Agent Sizing Guide
Agents are responsible for executing automation (automated tests, load tests, etc.) through Keywords. Their resource requirements can vary significantly depending on the type of automation, the automation tools used, and the nature of the application under test. This guide provides practical recommendations to help you size agents effectively for reliable, stable, and performant execution.
For the general platform requirements and sizing recommendations, refer to the Requirements page.
Why Agent Sizing Matters
Properly sizing your agents is critical for ensuring predictable and efficient automation execution. Insufficiently provisioned agents may lead to:
- Slow executions and inaccurate performance measurements
- Unstable executions (e.g., timeouts, element resolution failures)
- Agent crashes due to system overload
Right-sizing your agents helps ensure that your automations run smoothly and predictably.
Key Factors Influencing Agent Sizing
Type of Automation
The nature of the automation significantly influences resource consumption.
- API Automation: API automation typically involves executing API clients (e.g., HTTP calls) and is considered lightweight, demanding minimal CPU and memory.
- UI Automation (Web, Mobile, Desktop Clients): UI automation is resource-intensive because it requires launching and managing full-fledged clients (e.g., browsers, mobile device emulators, fat clients). The agent’s sizing is directly influenced by the client’s runtime behavior.
Recommendation: Use minimal sizing for pure API tests; allocate more generous resources for UI tests, especially if dealing with complex or heavy client applications.
Application Under Test (AUT)
The characteristics of the AUT, particularly its client, impact how much CPU and memory an agent consumes during execution.
- Web Applications: Heavy JavaScript-based SPAs (Single Page Applications) require significantly more CPU than simple static websites.
- Mobile/Desktop Clients: These often demand additional system-level resources (GPU, memory, disk I/O) depending on their type, complexity, and rendering needs.
Note: Step’s UI automation sizing recommendations assume a typical web application. Heavier clients may require custom adjustments.
Automation Tooling
Even with the same AUT and automation type, the automation tool or library used can affect agent sizing.
- Web Automation: For instance, Cypress and Selenium, and the way they are integrated into Step, lead to different runtime behaviors, resulting in different CPU/memory footprints.
- API Automation: For instance, using Step’s native HTTP Keywords (which execute within the agent process) is less demanding than invoking third-party tools like k6 or Postman CLI, which require spawning external processes.
Tip: Favor native integrations when possible to keep agent footprint smaller and easier to manage.
Parallelism
Number of Agent Tokens
Step agents support concurrent execution of Keywords, driven by the number of configured tokens. Each agent may define one or more token pools, each with n tokens.
- A pool with 5 tokens can run up to 5 Keywords concurrently.
- This concurrency directly affects the required resources: more tokens = higher CPU/RAM needs.
Rule of Thumb: Multiply the resource needs for one token by the number of tokens to estimate the required capacity.
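As a rough illustration of this rule of thumb, here is a minimal sketch that scales a single-token baseline by the number of tokens. The per-token figures are placeholders, not measured or recommended values.

```python
# Rough estimate of agent capacity based on the rule of thumb above.
# The per-token baseline is an illustrative placeholder, not a measurement.

PER_TOKEN_BASELINE = {"cpu_millicores": 250, "memory_mib": 512}  # hypothetical figures

def estimate_agent_resources(tokens: int) -> dict:
    """Multiply the per-token baseline by the number of configured tokens."""
    return {resource: value * tokens for resource, value in PER_TOKEN_BASELINE.items()}

# A pool with 5 tokens needs roughly 5x the single-token baseline:
print(estimate_agent_resources(5))  # {'cpu_millicores': 1250, 'memory_mib': 2560}
```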
Internal Parallelism in Keywords
Some automation tools (e.g., k6, JMeter) support internal parallelism, meaning they can execute multiple virtual users (VUs) or threads concurrently within a single Keyword.
For example, if an agent has 3 tokens, and each Keyword triggers a scenario configured with 10 VUs, the effective concurrency becomes: 3 tokens × 10 VUs = 30 virtual users
This combined parallelism (token-level × tool-level) significantly increases the load on the agent. However, the actual resource impact depends heavily on how the tool implements internal parallelism.
Important: When sizing agents, you must account for both token-based and tool-based parallelism. In general, tool-level parallelization is more efficient, as it reduces the overhead of managing multiple tool instances.
Example
In a load testing scenario using Grafana k6, each instance of the k6 CLI introduces overhead.
- Running 10 tokens × 10 VUs = 100 VUs spread across 10 separate k6 processes
- Running 1 token × 100 VUs = 100 VUs in a single k6 process
Although the total number of VUs is the same, the first scenario generates more system load due to the overhead of spawning and running 10 separate k6 processes.
This distinction is crucial when designing efficient load tests and is explicitly reflected in the recommended sizings provided below.
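To make the trade-off concrete, the following sketch recomputes the figures from the example above: the effective number of VUs and the number of k6 processes the agent has to manage for each configuration.

```python
# Effective concurrency is token-level parallelism multiplied by tool-level
# parallelism. With k6, each token running a scenario spawns its own k6
# process, so the token count also determines the process overhead.

def effective_load(tokens: int, vus_per_keyword: int) -> dict:
    return {
        "total_vus": tokens * vus_per_keyword,
        "tool_processes": tokens,
    }

# Same total number of VUs, very different process overhead:
print(effective_load(10, 10))  # {'total_vus': 100, 'tool_processes': 10}
print(effective_load(1, 100))  # {'total_vus': 100, 'tool_processes': 1}
```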
Sizing Guide
To size your agents properly, it is critical to understand the key factors described in the previous section.
To proceed with sizing, we recommend the following approach:
Step 1: Define the Values for the Four Key Factors
Before selecting a recommendation, identify the characteristics of your use case. The following table will help you collect the necessary information:
Key factor | Your Value | Example Values |
---|---|---|
Type of automation | | UI or API |
Client type | | Standard web application, Heavy web application, Mobile, Desktop |
Automation tool | | Selenium, Playwright, Cypress, k6, JMeter, etc. |
Parallelism | | 1 to n |
Step 2: Use the Reference Recommendations
Once you’ve identified your parameters, use the table below to find the closest sizing recommendation. These are based on the official Requirements and represent validated defaults for the most common scenarios.
Type of automation | Client type | Automation tool | Parallelism | Memory | CPU | Note | Corresponding flavour |
---|---|---|---|---|---|---|---|
UI | Standard web application | Selenium, Playwright | 1 | 1800Mi | 1750m | | ui-standard |
UI | Standard web application | Cypress | 1 | 1800Mi | 1750m | | ui-standard |
API | N/A | Grafana k6 | 100 | 1800Mi | 1750m | Tested with: | ui-standard |
API | N/A | Native HTTP Keywords | 100 | 1800Mi | 1750m | | api-standard |
Disclaimer: These values serve as a starting point. Depending on your environment and automation specifics, further tuning may be needed.
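If you want to keep these defaults close to your provisioning scripts, one option is to encode the table as a small lookup, as in the sketch below. The dictionary layout and key names are illustrative; only the memory, CPU, and flavour values come from the table above.

```python
# Reference sizing defaults keyed by (automation type, tool). The structure is
# illustrative; the figures are taken from the recommendations table above.

REFERENCE_SIZINGS = {
    ("UI", "Selenium"):              {"memory": "1800Mi", "cpu": "1750m", "flavour": "ui-standard"},
    ("UI", "Playwright"):            {"memory": "1800Mi", "cpu": "1750m", "flavour": "ui-standard"},
    ("UI", "Cypress"):               {"memory": "1800Mi", "cpu": "1750m", "flavour": "ui-standard"},
    ("API", "Grafana k6"):           {"memory": "1800Mi", "cpu": "1750m", "flavour": "ui-standard"},
    ("API", "Native HTTP Keywords"): {"memory": "1800Mi", "cpu": "1750m", "flavour": "api-standard"},
}

def recommended_sizing(automation_type: str, tool: str) -> dict:
    """Return the closest validated default for a given use case."""
    return REFERENCE_SIZINGS[(automation_type, tool)]

print(recommended_sizing("API", "Native HTTP Keywords"))
```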
Step 3: Adapt the Sizing to Your Context
If your parameters differ from the examples above, adjust accordingly:
- Heavier Clients (e.g., fat desktop apps or mobile emulators): Increase memory and CPU beyond the default UI recommendation.
- Higher Parallelism: Multiply the CPU and memory values proportionally. E.g., if the baseline is for 1 token, multiply by 5 for 5 tokens.
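For the parallelism adjustment, here is a minimal sketch of the proportional scaling, assuming the single-token UI baseline from the reference table and simple Kubernetes-style quantity strings. The helper functions are hypothetical, not part of Step.

```python
# Scale a single-token baseline to N tokens. The helpers parse simple
# Kubernetes-style quantities ("1750m" CPU, "1800Mi" memory) and are
# illustrative only; the baseline matches the UI row of the table above.

def scale_cpu(cpu: str, factor: int) -> str:
    return f"{int(cpu.removesuffix('m')) * factor}m"

def scale_memory(memory: str, factor: int) -> str:
    return f"{int(memory.removesuffix('Mi')) * factor}Mi"

BASELINE = {"cpu": "1750m", "memory": "1800Mi"}  # 1 token, standard UI automation

tokens = 5
print(scale_cpu(BASELINE["cpu"], tokens))        # 8750m
print(scale_memory(BASELINE["memory"], tokens))  # 9000Mi
```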
Step 4: Validate the Sizing in Practice
To ensure your sizing is adequate and resilient, run a sequence of baseline and stress tests:
- Baseline Test (oversized agent): Run your scenario without parallelism for ~100 iterations to establish a stable reference.
- Baseline Test (sized agent): Re-run the same scenario on your actual planned agent sizing.
- Parallelism Test (target setup): Run your scenario with the target parallelism on the planned agent sizing (tokens + VUs).
Goal: All tests should complete successfully, with consistent performance and without crashes or throttling.
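As a rough way of judging "consistent performance" across these runs, you could compare the iteration timings collected from the oversized baseline and the sized-agent execution, as in the hypothetical helper below. The 20% tolerance is an arbitrary example, not a recommendation.

```python
# Hypothetical helper for comparing the baseline run (oversized agent) with
# the run on the planned sizing. Flags the sizing if the median iteration
# time degrades by more than an arbitrary tolerance (20% here).
from statistics import median

def sizing_looks_adequate(baseline_ms: list[float],
                          sized_ms: list[float],
                          max_slowdown: float = 1.2) -> bool:
    """True if the sized agent stays within the allowed slowdown factor."""
    return median(sized_ms) / median(baseline_ms) <= max_slowdown

# Made-up per-iteration timings in milliseconds:
print(sizing_looks_adequate([820, 790, 805], [840, 860, 815]))  # True
```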
How to Monitor Agent Resource Usage
Once your agents are running in a production environment, continuous monitoring is essential to ensure they remain healthy, performant, and stable over time.
What to Monitor
Focus on the following key system metrics for each agent:
- CPU Usage: Track average and peak CPU usage over time. Sustained high usage may indicate under-provisioning or excessive parallelism.
- CPU Throttling (Kubernetes environments): Monitor for CPU throttling to detect when the agent is being limited because it exceeds its CPU quota. This can lead to degraded performance or timeouts.
- Memory Usage: Observe memory consumption trends. Sudden spikes or consistently high usage may result in OOM (Out-of-Memory) terminations or instability during test execution.
Tip: Use Kubernetes-native tools (e.g., Prometheus, Grafana, Lens) or your cloud provider’s monitoring suite to visualize and track these metrics.
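For example, if your agents run on Kubernetes and are scraped by Prometheus, queries along the following lines can surface these three metrics. The Prometheus address, the pod label selector, and the choice of cAdvisor metric names are assumptions to adapt to your own setup.

```python
# Minimal sketch: pull agent CPU/memory metrics from the Prometheus HTTP API.
# The Prometheus URL and the pod label selector are assumptions; the metric
# names are the usual cAdvisor ones and may differ in your environment.
import requests

PROMETHEUS_URL = "http://prometheus.monitoring:9090"  # assumed in-cluster address
AGENT_PODS = 'pod=~"step-agent-.*"'                   # assumed agent pod naming

QUERIES = {
    "cpu_usage_cores": f"rate(container_cpu_usage_seconds_total{{{AGENT_PODS}}}[5m])",
    "cpu_throttled_periods": f"rate(container_cpu_cfs_throttled_periods_total{{{AGENT_PODS}}}[5m])",
    "memory_working_set_bytes": f"container_memory_working_set_bytes{{{AGENT_PODS}}}",
}

def instant_query(expr: str) -> list:
    """Run an instant PromQL query and return the raw result vector."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["result"]

for name, expr in QUERIES.items():
    print(name, instant_query(expr))
```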
Configure Alerts
Set up alerts to proactively catch issues before they impact test runs. Suggested thresholds:
Metric | Suggested Alert Threshold |
---|---|
CPU usage | > 80% sustained usage over 5–10 min |
CPU throttling | Any throttling consistently observed |
Memory usage | > 80–90% of provisioned limit over time |
Note: Thresholds should be tuned based on your workloads and empirical performance data during validation and early production runs.
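A simple way to express the "sustained over a window" conditions above, for instance in a custom health check, is to require that every sample in the observation window exceed the threshold. The sketch below is purely illustrative.

```python
# Illustrative check for "sustained usage": fire only if every sample in the
# observation window (e.g., one sample per minute over 5-10 minutes) exceeds
# the threshold, expressed as a fraction of the provisioned limit.

def sustained_above(samples: list[float], threshold: float) -> bool:
    return bool(samples) and all(sample > threshold for sample in samples)

cpu_usage_ratio = [0.83, 0.86, 0.91, 0.88, 0.85]  # made-up samples, one per minute
print(sustained_above(cpu_usage_ratio, 0.80))     # True -> sustained > 80%
```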