End-to-end Testing of Asynchronous Systems
Estimated read time: 25 min
Technical level: Advanced
What you’ll learn: How to build hybrid automation plans in an asynchronous context.
Ideal profile(s): Tester, Developer, Automation Specialist
Author: Dorian Cransac (exense GmbH)
If you just google "asynchronous testing" or "event-driven testing", you'll find a lot of content teaching you how to test code that is inherently asynchronous, such as JavaScript applications and UIs in general. The purpose of this article, however, is different: what we're about to discuss goes far beyond developer test suites. Instead, we're going to look at asynchronous systems from an end-to-end perspective and draw on experience gained across a variety of IT environments to examine the needs and challenges these systems pose.
Context
Over the years, we’ve frequently come across scenarios in which a series of services are used by some sort of client, such as a web browser controlled by a human or a scheduled batch process, followed by the triggering of some asynchronous process. As explained in detail in our glossary, asynchronous processing introduces a set of issues which make building end-to-end simulations difficult.
When attempting to replicate the client’s behavior, end-to-end testers will want to actively poll some criterion or exit condition in order to detect the end of the scenario. For instance, testers will want to wait for a file to appear in a folder or for a new record to be present in a database before verifying certain hypotheses. In other cases, such as those involving callbacks, code on the server side will actively push information back to the client, so server and client roles will be “switched”, at least temporarily.
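Polling an exit condition, as described above, usually comes down to a loop with a check interval and an overall timeout. The following Python sketch shows that pattern. `wait_until` and `file_appeared` are hypothetical helper names and are not part of any particular tool.

```python
import time
from pathlib import Path
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll an exit condition until it holds or `timeout` seconds have elapsed.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep between checks, but never past the deadline.
        time.sleep(min(interval, remaining))


def file_appeared(path: Path) -> Callable[[], bool]:
    """Exit condition: the expected output file has been written to the folder."""
    return path.exists
```

For example, `wait_until(file_appeared(Path("/data/outbox/report.csv")), timeout=300)` would wait up to five minutes for a (hypothetical) output file. A database check would be another zero-argument callable that runs a query and returns whether the expected record exists.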
The callback case can be particularly difficult to deal with. Because testing tools impersonate the client, they generally do not provide semantics for receiving requests or events. Implementing service mocks can also be difficult, and the integration layer between the mock and the test plan can be very ugly.
In this article, we’re going to take an in-depth look at these different situations and make a case for a standard approach and an API that address these issues more effectively. If you’re not familiar with asynchronism, callbacks or events, you might want to take a quick look at the short glossary at the end of the article. It contains my own definition of each of these terms.
Real-world cases
Let’s look at a series of real-world asynchronous scenarios posing non-trivial challenges.
Server callback
The first case involves a callback. In this situation, client and server trade roles for a moment, meaning that at some point in the scenario, the initiator of the workflow is supposed to hang and wait for an incoming request from the server.
Illustrated in the figure above are a couple of synchronous calls made to a service bus (ESB); these could be HTTP calls or any other request-response protocol. After a response to sync_request_2 is sent, client and server roles are switched: the application server technically becomes a client and calls an endpoint on the client’s side (see the “Server endpoint” box in the figure).
This step of the scenario can be problematic for most E2E testing solutions, as it requires the simulated client:
- to wait for an unknown amount of time for the callback to take place
- to be able to receive the call (deploying a server context and waiting for incoming requests)
- to mock a specific business endpoint in order to return a meaningful response to the server
- to resume with the scenario upon reception of the request
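To make these four requirements concrete, here is a minimal Python sketch of a simulated client that opens its own server context. It answers the callback with a mocked business response and hands the request over to the waiting test thread. It uses only the standard library (`http.server`, `queue`, `threading`). The class name, the JSON payloads and the `ACCEPTED` status are illustrative assumptions, not the ESB's actual contract.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackReceiver:
    """Hypothetical server context embedded in the simulated client."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0):
        self._inbox = queue.Queue()
        inbox = self._inbox

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                payload = json.loads(self.rfile.read(length) or b"{}")
                inbox.put(payload)  # hand the callback over to the test thread
                # Mocked business response, so the server can carry on.
                body = json.dumps({"status": "ACCEPTED"}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):  # keep test output quiet
                pass

        # Port 0 lets the OS pick a free port; the chosen one is exposed below.
        self._server = HTTPServer((host, port), Handler)
        self.port = self._server.server_address[1]
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._server.shutdown()
        self._server.server_close()

    def wait_for_callback(self, timeout: float) -> dict:
        """Block until the server calls back; raises queue.Empty on timeout."""
        return self._inbox.get(timeout=timeout)
```

In a test, the client would start the receiver before sending the synchronous requests, then call `wait_for_callback(timeout=...)` and resume the scenario once the server has called back.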
Push notification
In this second situation, asynchronism is caused by the creation and reception of an event in the form of an Android Push notification. Notifications are pushed from the internal IT infrastructure to a third party in the Internet public zone (Google Firebase servers), and then received by emulated Android devices in a third-party cloud system, also reachable over the public zone.
To monitor the propagation time of notifications in real time during the tests, the most convenient way would be to build synchronicity back into the test plan by having each device forward the notifications it receives back to the initiator (Push Client) for validation and end-to-end performance measurements.
In addition to forwarding the notification message, a device identifier (or device token) needs to be sent from each device during the initialization phase of the test in order for the device to join the pool.
Again, this situation poses a certain number of technical challenges, as the simulation needs to:
- implement an Android mock or at least a Firebase mock to receive the actual notification
- forward the information that a notification has been received along with data
- allow measurement of the entire propagation time
- provide semantics for passing both uniquely identifiable messages and pools of data
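The last requirement distinguishes two kinds of semantics. A pool means any registered device token will do, while uniquely identifiable messages mean the test waits for one specific notification. The Python sketch below illustrates both with an in-process relay. `NotificationRelay` and its methods are hypothetical, and in the real setup the two sides would run on different hosts and communicate over the network.

```python
import queue
import threading
import time


class NotificationRelay:
    """Hypothetical relay between the emulated devices and the Push Client."""

    def __init__(self):
        self._device_pool = queue.Queue()  # pool semantics: any registered token will do
        self._forwarded = {}               # unique semantics: keyed by message id
        self._cond = threading.Condition()

    # Device side (Android or Firebase mock)
    def register_device(self, token: str) -> None:
        """Initialization phase: a device joins the pool."""
        self._device_pool.put(token)

    def forward_notification(self, message_id: str, data: dict) -> None:
        """Forward a received notification, stamped with the relay's clock."""
        with self._cond:
            self._forwarded[message_id] = {"data": data, "forwarded_at": time.time()}
            self._cond.notify_all()

    # Push Client side (test plan)
    def acquire_device(self, timeout: float) -> str:
        """Take any available device token from the pool."""
        return self._device_pool.get(timeout=timeout)

    def await_forwarded(self, message_id: str, timeout: float) -> dict:
        """Block until this specific notification has been forwarded back."""
        with self._cond:
            if not self._cond.wait_for(lambda: message_id in self._forwarded, timeout):
                raise TimeoutError(f"notification {message_id} was not forwarded in time")
            return self._forwarded.pop(message_id)
```

To measure the propagation time, the test records `time.time()` before pushing the notification and subtracts it from `forwarded_at`. That subtraction is only straightforward here because both timestamps come from the same clock.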
Note: If you’re interested in learning more on Android emulation in the cloud, FIDO2 authentication and Firebase, a comprehensive study covering this case has been published separately and is available here.
Two-way async protocol
This third situation revolves around bidirectional asynchronous communication using the FIX protocol. In the implementation we encountered, messages were received on both sides using distinct event listeners. Once a channel has been created between the Initiator and the Acceptor and a message has been sent, one must wait asynchronously to receive a response (or inbound message). This means that subsequent inbound messages are received in a different thread and method than the ones responsible for sending the outbound message.
The number of inbound messages received in response to a single emitted outbound message may vary and the order in which events are received is very important and needs to be validated as well. This makes building plans and measuring response times difficult. Deploying the code for testing is also a challenge.
The figure above illustrates the way two entities communicate using FIX. Ideally, an encompassing synthetic transaction would provide us with response times that make sense from a business perspective.
New problems arise in this case:
- messages may need to be queued upon reception to avoid stalling the system
- message order information needs to be kept for validation
- deploying and operating both the Initiator’s and Acceptor’s code can be difficult
- clear semantics are needed to match the expected response messages to a synthetic transaction
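One way to cover the first two points is to have the listener callback do nothing more than enqueue each inbound message with a sequence number, and leave validation to the test thread. The following Python sketch assumes that approach. `InboundRecorder`, `assert_sequence` and the message-type labels (`NEW_ACK`, `PARTIAL_FILL`, `FILL`) are hypothetical and do not correspond to any specific FIX engine's API.

```python
import queue
import threading
from itertools import count


class InboundRecorder:
    """Queues inbound messages from a listener thread, preserving arrival order."""

    def __init__(self):
        self._queue = queue.Queue()
        self._seq = count(1)
        self._lock = threading.Lock()

    def on_message(self, msg_type: str, payload: dict) -> None:
        # Invoked from the protocol library's listener thread: do as little as
        # possible here so that the session is never stalled. The lock keeps the
        # sequence number and the queue position consistent with each other.
        with self._lock:
            self._queue.put((next(self._seq), msg_type, payload))

    def collect(self, n: int, timeout: float) -> list:
        """Wait for the next n inbound messages, in arrival order."""
        return [self._queue.get(timeout=timeout) for _ in range(n)]


def assert_sequence(received: list, expected_types: list) -> None:
    """Validate the order of message types received for one outbound message."""
    types = [msg_type for _, msg_type, _ in received]
    if types != expected_types:
        raise AssertionError(f"expected {expected_types}, got {types}")
```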
SSE & Websockets
This fourth and last example relates to a recent evolution of web technology: Server-sent Events (SSE) and Websockets. Server-sent events allow servers to push arbitrary events to a web application client in the browser. They are more efficient at scale because they remove the need to constantly poll the server, which saves network traffic. SSE relies on traditional HTTP.
Websockets allow bidirectional communication between the browser and the server, but require full-duplex connections and the introduction of a new protocol in addition to HTTP.
Taking a look at the diagram above, we see the same pattern as in the previous cases. The client ends up having to potentially wait for multiple events and conceptually, it needs to behave as a server.
Trying to solve this scenario with a traditional HTTP-oriented toolkit would again prove to be difficult, due to the complexity of the request-response sequence and protocols involved.
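To illustrate why a request-response toolkit struggles here, the sketch below shows what an SSE client has to do: keep a connection open and turn a continuous text stream into discrete events. It is a simplified Python parser for the `text/event-stream` format. It handles only the `event` and `data` fields and comment lines, and ignores `id` and `retry`. It is meant as an illustration, not as a complete implementation of the specification.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from a text/event-stream, line by line."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with newlines
```

A test client would feed it the lines of a streaming HTTP response and then wait on the resulting events, which is exactly the "client behaving as a server" pattern described above.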
Solutions
We will now go through some of the functionality which we thought needed to be added to step, exense’s automation platform, in order to address these situations in an elegant way. Then, we’ll circle back to each and every case to see the resulting test architecture.
Key functionality
The main idea we came up with is the separation of concerns between the management of mocks (the implementation of business-specific service endpoints) and the responsibility of managing, coordinating and synchronizing events and threads. Central event management now takes place in a dedicated entity called EventBroker.
To enable interaction between service mocks and the EventBroker, semantics and controls also need to be created and made available to the tester for use in test plans.
Functionality Overview
Here’s a summary of the requirements retained in the design of the new async packages:
| Ordered collection with queue-like functionality and real-time monitoring | Events can be pushed, browsed, and retrieved via exactly-once semantics or via a “group” identifier | A server component running continuously and hosting services (REST / Webservice, Android app) |
| Events can be manipulated via a remote interface for access by third parties | Events carry a payload as well as their own timestamp information (submission, reception, etc.) | Deployment of service mocks in the same way that test code is deployed (if possible) |
| Event persistence is optional (nice-to-have) | Simplified Wait controls for synchronizing plans with the arrival of an Event | Scalable but single instance by default (this is essentially a bridge) |
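Several of these requirements can be illustrated with a small in-memory data structure. The following Python sketch is not step's EventBroker and does not reflect its API. `Event`, `EventBroker`, `put`, `browse` and `wait_for` are hypothetical names chosen to show an ordered collection, exactly-once retrieval by identifier or by group, per-event timestamps and a blocking wait control. Persistence, the remote interface and the server component are left out.

```python
import threading
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    group: str
    payload: Any
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    submitted_at: float = field(default_factory=time.time)
    consumed_at: Optional[float] = None


class EventBroker:
    """In-memory sketch of the requirements above; not step's actual API."""

    def __init__(self):
        self._events: Dict[str, Event] = {}  # dicts keep insertion (arrival) order
        self._cond = threading.Condition()

    def put(self, group: str, payload: Any) -> Event:
        with self._cond:
            event = Event(group, payload)
            self._events[event.id] = event
            self._cond.notify_all()  # wake up any plan waiting for an event
            return event

    def browse(self) -> List[Event]:
        """Monitoring view: pending events, oldest first, without consuming them."""
        with self._cond:
            return list(self._events.values())

    def _take(self, event_id: Optional[str], group: Optional[str]) -> Optional[Event]:
        match = next((e for e in self._events.values()
                      if (event_id is None or e.id == event_id)
                      and (group is None or e.group == group)), None)
        if match is not None:
            del self._events[match.id]  # removal guarantees exactly-once delivery
            match.consumed_at = time.time()
        return match

    def wait_for(self, timeout: float, event_id: Optional[str] = None,
                 group: Optional[str] = None) -> Event:
        """Wait control: block until a matching event is available, then consume it."""
        deadline = time.monotonic() + timeout
        with self._cond:
            while True:
                event = self._take(event_id, group)
                if event is not None:
                    return event
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError(f"no event (id={event_id}, group={group}) within {timeout}s")
                self._cond.wait(remaining)
```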
Implementation
If you’re interested in how the different requirements have been taken into account and implemented as functionality in step, please take a look at the following documentation page for our Async plugin.
Real-world solutions
Additional information as well as diagrams summarizing the architecture of the test setup in each situation are provided below.
Server callback
By developing and deploying an Adapter as a proxy between the server and the test platform, and by using the standard EventBroker API to convert incoming requests into step Events, we achieved a clear separation of concerns between the mock and the management of events.
Events can now be managed and monitored centrally through step’s controller, which eases the operational workload and simplifies analysis.
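The separation of concerns described above can be sketched as an adapter that only knows how to answer the server and how to publish an event, and leaves everything else to the broker. In this Python sketch, the correlation header name, the event fields and the injected `publish` callable (for instance a client for the broker's remote interface) are all assumptions. step's actual EventBroker API is not shown.

```python
import json
import time
from typing import Callable, Dict, Mapping


def request_to_event(path: str, headers: Mapping[str, str], body: bytes,
                     correlation_header: str = "X-Correlation-Id") -> Dict:
    """Convert an incoming callback request into a (hypothetical) broker event."""
    return {
        # Group by correlation id if the server sends one, else by endpoint path.
        "group": headers.get(correlation_header, path),
        "payload": json.loads(body) if body else None,
        "received_at": time.time(),
        "source": path,
    }


class CallbackAdapter:
    """Mock endpoint logic plus event publication, with no event management."""

    def __init__(self, publish: Callable[[Dict], None],
                 mock_response: Callable[[Dict], Dict]):
        self._publish = publish              # e.g. a client for the broker's remote interface
        self._mock_response = mock_response  # business-specific reply expected by the server

    def handle(self, path: str, headers: Mapping[str, str], body: bytes) -> Dict:
        event = request_to_event(path, headers, body)
        self._publish(event)
        return self._mock_response(event)
```

Because the adapter receives `publish` from outside, the same mock can run against an in-memory list in a unit test and against the real broker in a test campaign.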
Push notification
After we implemented the forwarding of the notification in the Android mock and added a proxy instance enabling micro-polling from within the client’s internal network towards our cloud, it became possible to measure the notification’s end-to-end propagation time. Computing the difference between the reception time of the actual notification and that of the forwarded message proved difficult because of host time synchronization, but it paid off in the end, as real-time monitoring was achieved.
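The host time synchronization issue arises because the device timestamps the reception with its own clock, while the send and forward timestamps come from the Push Client's clock. One common way to correct for this is an NTP-style offset estimate from a single request-reply exchange. This technique is an assumption here and is not taken from the campaign described above. The Python sketch below shows the arithmetic, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """NTP-style estimate of (remote clock - local clock).

    t0: request sent (local clock), t1: request received (remote clock),
    t2: reply sent (remote clock), t3: reply received (local clock).
    Assumes the network delay is the same in both directions.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


@dataclass
class NotificationTiming:
    sent_local: float       # Push Client clock: notification submitted
    received_remote: float  # device clock: notification received
    forwarded_local: float  # Push Client clock: forwarded message received

    def round_trip(self) -> float:
        """Single-clock measurement: needs no synchronization."""
        return self.forwarded_local - self.sent_local

    def propagation(self, offset: float) -> float:
        """Push leg only, with the device timestamp mapped onto the local clock."""
        return (self.received_remote - offset) - self.sent_local
```

The round trip measured on a single clock needs no correction but includes the forwarding leg. Isolating the push leg requires the offset, and the result is only as good as the symmetric-delay assumption.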
Note: more details presenting the results of this test campaign are provided here.
Two-way async protocol
Using the EventBroker as a facade between Initiator and Acceptor along with the new Async controls not only allows tracking communications accurately but also enables the design of clear concurrent test plans for validating the sequence of messages received.
No standalone adapter was used in this scenario because the server code could be run in short, repeated stints via step’s Keyword API.
With both the listener’s and adapter’s code packaged and deployed as Keywords, it’s easy to scale out on step’s agent grid and run multiple concurrent sessions.
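Running multiple concurrent sessions mainly requires that each session sees only its own inbound messages. The following Python sketch approximates this with one queue per session and uses a thread pool as a stand-in for the agent grid. `SessionRouter`, `fake_acceptor` and `run_session` are hypothetical, and the acceptor is simulated in-process instead of being a real FIX counterparty.

```python
import queue
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class SessionRouter:
    """Routes inbound messages to one queue per session (correlation group)."""

    def __init__(self):
        self._queues = defaultdict(queue.Queue)
        self._lock = threading.Lock()

    def _queue_for(self, session_id: str) -> queue.Queue:
        with self._lock:  # avoid two threads creating two queues for one session
            return self._queues[session_id]

    def on_inbound(self, session_id: str, msg_type: str) -> None:
        """Called from the listener thread for every inbound message."""
        self._queue_for(session_id).put(msg_type)

    def next_messages(self, session_id: str, n: int, timeout: float) -> list:
        q = self._queue_for(session_id)
        return [q.get(timeout=timeout) for _ in range(n)]


def fake_acceptor(router: SessionRouter, session_id: str) -> None:
    """Hypothetical counterparty: replies asynchronously on its own thread."""
    def reply():
        for msg_type in ("NEW_ACK", "FILL"):
            time.sleep(0.01)
            router.on_inbound(session_id, msg_type)
    threading.Thread(target=reply, daemon=True).start()


def run_session(router: SessionRouter, session_id: str):
    """One synthetic transaction: send, wait for the ordered replies, time it."""
    start = time.monotonic()
    fake_acceptor(router, session_id)  # a real plan would send the outbound order here
    received = router.next_messages(session_id, 2, timeout=5)
    if received != ["NEW_ACK", "FILL"]:
        raise AssertionError(f"{session_id}: unexpected sequence {received}")
    return session_id, time.monotonic() - start


if __name__ == "__main__":
    router = SessionRouter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda i: run_session(router, f"session-{i}"), range(8)))
    for session_id, elapsed in results:
        print(f"{session_id}: {elapsed * 1000:.1f} ms")
```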
SSE & Websockets
The SSE & Websocket cases are especially interesting, as server-to-client communication is currently trending in the web world. While a protocol-oriented approach is technically possible in this scenario, we recommend browser-based automation instead. As explained in this white paper on browser-based vs HTTP-based automation, scenarios involving such a complex stack are much easier to deal with when running an actual browser instead of a mock.
This approach allows testers to deal with asynchronous patterns through a “black box” (the actual browser) and helps reduce the additional complexity and errors that a protocol-oriented solution would otherwise introduce. This is what the simulation’s architecture ends up looking like in step:
In this case, with a real browser handling the reception of server events, there’s no need to use step’s EventBroker.
Although ending our series of real-world studies with a solution in which the Async controls and EventBroker were not used may seem like a contradiction, we believe it demonstrates the flexibility provided by step and highlights the fact that no single approach solves all problems. And while we like to make clear recommendations based on our experience, we’re pretty agnostic when it comes to E2E automation approaches. We believe it’s up to our users to decide what the best fit is in each of their projects.
Conclusion
Having worked on at least four separate asynchronous cases, we are glad to have compiled our experience in the form of this case study, and we hope the information released here will be helpful to you and to others. Beyond promoting our proprietary plugin, we hope this article draws attention to an area of concern that is often disregarded or at least underestimated.
We’re also currently evaluating new functionality for our Async & Event packages, such as Continuous Keywords for distributed mock management and server-style code deployment, a persistence option for the EventBroker’s queue, and additional out-of-the-box monitoring functionality and information on the queue’s state and the events’ lifecycle.
Make sure to keep checking our Knowledge Base’s front page for future announcements, as we’re also planning on releasing more information, tutorials and demos on this topic. Until then, happy async testing!
Glossary
For better clarity, I’ve laid out my own definitions of popular terms related to asynchronous behavior in software.
Asynchronism
Asynchronism is the notion, in interprocess communication (IPC) and concurrent programming, that two programs execute independently from each other, i.e., without waiting on one another (such waiting is known as blocking). If either of the two programs never finishes or encounters a major issue, the erroneous behavior will likely only be detected at a later point in the program’s execution, rather than at the moment the two programs were communicating.
This contrasts with synchronous execution, in which one program waits for another to reach the end of a routine, and the latter informs the former that it has completed (usually sending along other information such as a status and a payload). In such a scenario, if program 2 never completes, program 1 will eventually time out.
Asynchronism is used as a way to deal with uncertainty and build robustness into programs. The reasons for this uncertainty vary and, accordingly, so does the implementation of asynchronous mechanisms.
Following the client-server paradigm and the request-response model, many end-to-end automation tools are geared towards addressing the needs of synchronous workflows. This is why we thought it would be useful to cover the end-to-end testing of asynchronous systems in a dedicated article.
Callbacks and Promises
Asynchronous Callbacks and Promises, which come primarily from the JavaScript world, are similar in the sense that they are mechanisms used to schedule the execution of code in the future. A program or thread may defer the execution of a task to a later stage, with the guarantee that the proper action will be taken when execution eventually takes place.
Callbacks and Promises are a way to:
- deal with uncertainty (the duration of a given task)
- provide a chronological guarantee (the ordering of two tasks)
- pass bidirectional information between programs at the beginning and end of the communication
Program 1 does not know how long it will take Program 2 to do its work, so instead of waiting for it to complete, it tells Program 2 what to do once it is done. The execution of the callback routine can then be used to inform Program 1 about the status of the execution.
While the subtleties of Callbacks and Promises are out of scope for this article, let’s just note that they are frequently used as part of interprocess communication and are a typical way of introducing Asynchronism into a system.
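In Python, `concurrent.futures.Future` plays a role similar to a Promise, and `add_done_callback` registers a callback on it. The short sketch below maps the three points above onto these standard-library calls. The task and function names are illustrative.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def long_running_task(n: int) -> int:
    """'Program 2': finishes after a duration Program 1 cannot predict."""
    time.sleep(0.1)
    return n * n


def program_1() -> list:
    log = []
    with ThreadPoolExecutor(max_workers=1) as executor:
        # Information goes in at the beginning (the argument)...
        future: Future = executor.submit(long_running_task, 7)
        # ...and comes back at the end: the callback only runs once the task is done.
        future.add_done_callback(lambda f: log.append(f"callback: {f.result()}"))
        log.append("program 1 continues without blocking")
    # Leaving the with-block waits for the worker, so the callback has run by now.
    return log
```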
Events
Events are also used for communication between processes or threads and for the delegation of work. In an event-driven architecture, programs listen (i.e., wait) for events which may or may not ever be received (i.e., happen). Unlike with Callbacks, where the uncertainty relates to time and the ordering of tasks, the uncertainty here is due to the program’s inability to predict the behavior of the event source.
Events are a way to:
- deal with uncertainty (when a task should take place)
- pass information in a monodirectional way between programs
Program 2 lets Program 1 be in the driver’s seat and simply makes sure that the reception of each event is matched with the corresponding expected action. Without any additional means of communication, Program 2 won’t send information back to Program 1.
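The listener side of this description can be sketched in a few lines of Python. Program 2 blocks on a queue of events, maps each event type to an action and never replies. The `queue.Queue` stands in for whatever transport carries the events, and the `STOP` sentinel is an assumption added so the example can terminate.

```python
import queue
from typing import Callable, Dict


def listen(events: queue.Queue, handlers: Dict[str, Callable[[dict], None]],
           stop: str = "STOP") -> None:
    """'Program 2': waits for events that may or may not ever arrive."""
    while True:
        event_type, data = events.get()  # blocks until the source emits something
        if event_type == stop:
            return
        handler = handlers.get(event_type)
        if handler is not None:
            handler(data)  # nothing is sent back to the emitter
```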