AI-Assisted Testing with the Step MCP Server
End-to-end walkthrough: author a test, run it, and analyse the results using plain English prompts in Claude via the Step MCP Server.
This tutorial shows how to go from “I want to test my website” to a full test run and root-cause analysis using only plain English prompts in Claude. No YAML editing, no CLI commands, no code.
Scenario: Run an end-to-end guest-checkout flow on the Step-hosted OpenCart storefront (https://opencart-prf.stepcloud.ch/), then dig into the results if anything fails.
Prerequisites
- Step MCP Server installed and connected
- Playwright MCP installed and connected
- A Step cluster (cloud or on-premises) with a Node agent registered
Phase 1 — Authoring
Goal: Create an Automation Package that drives the OpenCart guest-checkout flow.
User’s input (prompt):
- Create a Step automation package that completes a guest checkout for a MacBook on https://opencart-prf.stepcloud.ch/
- Use Node.js keywords with Playwright
- Save it in ~/step-projects and call it opencart-checkout
AI’s output (example):
Created `opencart-checkout` at `~/step-projects/opencart-checkout`.
The package declares 10 Node keywords covering the full guest-checkout flow:
navigate → add to cart → guest checkout → fill billing form → confirm shipping →
place order → verify confirmation.
Validation passed — structure is correct. Ready to execute.
Tools used: step_initialize_project scaffolds the package, then step_validate_plan verifies that it is well-formed.
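To make the authoring result more concrete, here is a minimal sketch of the kind of form-filling logic one of the generated keywords might contain. It is not the actual output of the MCP server. The function name `fillBillingForm`, the shape of `details`, and every selector except `#input-country` are assumptions for illustration, and the wiring into a Step keyword is left out. The `page` object is passed in so the logic can be exercised with a stub instead of a real browser.

```javascript
// Hypothetical selectors; only #input-country appears in this tutorial.
const SELECTORS = {
  firstName: '#input-firstname',
  lastName: '#input-lastname',
  country: '#input-country',
};

// Fills the billing form through a Playwright-style `page` object.
// `details` is an assumed shape: { firstName, lastName, country }.
async function fillBillingForm(page, details, timeoutMs = 30000) {
  await page.fill(SELECTORS.firstName, details.firstName, { timeout: timeoutMs });
  await page.fill(SELECTORS.lastName, details.lastName, { timeout: timeoutMs });
  // The country field is assumed to be a <select>, hence selectOption.
  await page.selectOption(SELECTORS.country, { label: details.country }, { timeout: timeoutMs });
  return { filled: Object.keys(SELECTORS) };
}

// Stub page that records calls, so the sketch runs without a browser.
function createRecordingPage() {
  const calls = [];
  return {
    calls,
    fill: async (selector, value) => {
      calls.push(['fill', selector, value]);
    },
    selectOption: async (selector, option) => {
      calls.push(['selectOption', selector, option.label]);
    },
  };
}
```

With a real Playwright `page`, the same function would drive the browser; with `createRecordingPage()`, it only records which selectors were touched, which is enough to check the order of the steps.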
Phase 2 — Execution
Goal: Run the test on the Step server.
User’s input (prompt):
Run the opencart-checkout package on Step.
AI’s output (example):
Execution started. Step returned execution ID `6a2fd525eba249060fc398cd`.
Ask me "what happened on 6a2fd525eba249060fc398cd?" once you're ready to see the results.
Note: step_execute_automation_package requires the step CLI to be installed and available on your PATH. Automation packages with Node.js keywords must run on a Step cluster; local execution is not supported, and a Node agent must be registered on the cluster.
Tools used: step_execute_automation_package uploads the package and launches the run on the Step server.
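Behind the prompt, an MCP client sends the server a JSON-RPC `tools/call` request. The sketch below builds such a request for step_execute_automation_package. The envelope follows the Model Context Protocol, but the argument name `packagePath` is a guess, since this tutorial does not list the tool's parameters.

```javascript
// Builds a JSON-RPC 2.0 request for the MCP `tools/call` method.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

// `packagePath` is a hypothetical argument name, used for illustration only.
const request = buildToolCall(1, 'step_execute_automation_package', {
  packagePath: '~/step-projects/opencart-checkout',
});

console.log(JSON.stringify(request, null, 2));
```

In practice Claude assembles this request for you; the sketch is only meant to show that each prompt ends up as a named tool call with structured arguments.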
Phase 3 — Analysis
Quick overview
User’s input (prompt):
What happened on 6a2fd525eba249060fc398cd?
AI’s output (example — failure):
Execution `6a2fd525eba249060fc398cd` FAILED — step `fillBillingShippingForm` timed out.
Preview shows: `TimeoutError: locator.fill() exceeded 30 000 ms waiting for the country dropdown to be visible`. The billing form layout may have changed, or the zone/country selectors may be loading slowly.
Tools used: step_fetch_execution_overview retrieves a pass/fail summary with an inline preview of any failing nodes.
Drill into specific failures
If the overview isn’t enough (e.g. multiple keyword failures), ask for more detail:
User’s input (prompt):
Show me more details on the fillBillingShippingForm failures from that execution.
AI’s output (example):
All 3 failures hit the same `TimeoutError` on the country dropdown selector `#input-country`. The surrounding keywords passed, so the storefront itself is up — the selector likely changed in a recent OpenCart update.
Tools used: step_fetch_execution_report_nodes fetches the detailed error logs for the failing keyword.
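The "all failures hit the same error" conclusion comes down to grouping failed nodes by keyword and error message. Here is a rough sketch of that grouping. The record shape `{ keyword, status, error }` and the keyword names other than `fillBillingShippingForm` are assumptions, not the actual response format of step_fetch_execution_report_nodes.

```javascript
// Counts failed report nodes per "keyword: error" combination.
function groupFailures(nodes) {
  const groups = new Map();
  for (const node of nodes) {
    if (node.status !== 'FAILED') continue;
    const key = `${node.keyword}: ${node.error}`;
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return groups;
}

// Illustrative data only: three identical failures among passing keywords.
const sampleNodes = [
  { keyword: 'addToCart', status: 'PASSED', error: null },
  { keyword: 'fillBillingShippingForm', status: 'FAILED', error: 'TimeoutError on #input-country' },
  { keyword: 'fillBillingShippingForm', status: 'FAILED', error: 'TimeoutError on #input-country' },
  { keyword: 'fillBillingShippingForm', status: 'FAILED', error: 'TimeoutError on #input-country' },
  { keyword: 'navigateToStore', status: 'PASSED', error: null },
];

const failureGroups = groupFailures(sampleNodes);
for (const [key, count] of failureGroups) {
  console.log(`${count} x ${key}`);
}
```

A single group with a count of 3 is what supports reading the failures as one root cause rather than three separate problems.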
Download log attachments
User’s input (prompt):
Download the logs from execution 6a2fd525eba249060fc398cd.
AI’s output (example):
Downloaded 2 attachments from execution `6a2fd525eba249060fc398cd`:
- ~/Downloads/trace-fillBillingShippingForm.zip (Playwright trace bundle)
- ~/Downloads/execution-6a2fd525.log (text log)
Open the trace in Playwright Trace Viewer: `npx playwright show-trace ~/Downloads/trace-fillBillingShippingForm.zip`
Tools used: step_download_attachment_content retrieves the files from the execution report.
Search past executions
User’s input (prompt):
Have there been other failed opencart-checkout executions in the last 7 days?
AI’s output (example):
Found 3 failed executions of `opencart-checkout` in the last 7 days:
- 6a2fd525eba249060fc398cd — 2026-06-29 14:32 — fillBillingShippingForm timed out
- 4b1ec312dfa138050eb287bc — 2026-06-27 09:11 — fillBillingShippingForm timed out
- 9f3da401ceb026140dc174ab — 2026-06-25 16:48 — navigateToStore failed (connection reset)
Two of the three share the same billing form timeout, which points to a persistent selector change; the third was an unrelated connection reset.
Tools used: step_search_executions queries past runs filtered by name, status, and time range.
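The filter described here (name, status, time range) can be pictured as a plain function over execution records. This is a sketch under stated assumptions: the record shape `{ id, name, status, startedAt }` is invented for illustration, timestamps are treated as UTC, and the real search runs on the Step server rather than in client code.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns failed runs of `name` started within the last `days` days, newest first.
function findRecentFailures(executions, name, now, days = 7) {
  const cutoff = now.getTime() - days * DAY_MS;
  return executions
    .filter((e) => e.name === name && e.status === 'FAILED')
    .filter((e) => {
      const startedAt = Date.parse(e.startedAt);
      return startedAt >= cutoff && startedAt <= now.getTime();
    })
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
}

// The first three rows mirror the example output above; the last two are
// made-up rows that the filter should exclude.
const executions = [
  { id: '9f3da401ceb026140dc174ab', name: 'opencart-checkout', status: 'FAILED', startedAt: '2026-06-25T16:48:00Z' },
  { id: '6a2fd525eba249060fc398cd', name: 'opencart-checkout', status: 'FAILED', startedAt: '2026-06-29T14:32:00Z' },
  { id: '4b1ec312dfa138050eb287bc', name: 'opencart-checkout', status: 'FAILED', startedAt: '2026-06-27T09:11:00Z' },
  { id: 'example-passed-run', name: 'opencart-checkout', status: 'PASSED', startedAt: '2026-06-28T10:00:00Z' },
  { id: 'example-old-failure', name: 'opencart-checkout', status: 'FAILED', startedAt: '2026-06-01T10:00:00Z' },
];

const now = new Date('2026-06-30T00:00:00Z');
const recentFailures = findRecentFailures(executions, 'opencart-checkout', now);
console.log(recentFailures.map((e) => e.id));
```

Passing `now` in explicitly keeps the example deterministic; a live query would use the current time instead.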
Performance trends (load tests)
If you run a load test (useCase: load-testing) and want timing data:
User’s input (prompt):
Show me the response-time trend for execution 6a2fd525eba249060fc398cd.
AI’s output (example):
Response-time trend for execution `6a2fd525eba249060fc398cd` (30-minute load test):
- p50: 320 ms → 415 ms (gradual increase after minute 18)
- p95: 780 ms → 1 240 ms (spike starting at minute 22)
- p99: 1 100 ms → 2 850 ms
Latency starts to climb after minute 18 and p95 spikes from minute 22, which suggests the storefront degrades under sustained load, possibly because of a connection pool or database bottleneck.
Tools used: step_query_performance_metrics fetches response-time percentiles over the execution timeline.
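For readers less familiar with p50/p95/p99, the sketch below computes them from raw response times using the nearest-rank method. Which percentile method Step uses is not stated in this tutorial, so treat this as one common definition rather than a description of the server's calculation. The sample values are made up.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function summarize(samples) {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Made-up response times in milliseconds for one time bucket.
const bucket = [330, 1100, 300, 500, 320, 800, 310, 350, 400, 340];
const summary = summarize(bucket);
console.log(summary);
```

With only ten samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful over buckets with many measurements.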
Healing a broken test
If analysis reveals a broken selector or outdated step, Claude can fix and re-run the test without leaving the conversation:
User’s input (prompt):
The country dropdown selector changed. Find the correct one on the live page and fix the keyword.
AI’s output (example):
Navigated to the billing form on https://opencart-prf.stepcloud.ch/. Found the updated selector: `#input-zone` replaced `#input-country` for the country dropdown.
Updated `fillBillingShippingForm.js` accordingly. Re-running the package now — execution ID `7c3ae601fbc350170gd409de`.
Tools used: step_execute_automation_package re-runs the updated package.
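In the walkthrough, Claude inspects the live page through Playwright MCP to find the new selector. As a rough illustration of the underlying check, the sketch below probes a list of candidate selectors and returns the first one present on the page. The helper name `findFirstPresentSelector` and the candidate list are hypothetical, and the stub page stands in for a real Playwright `page` so the example runs without a browser.

```javascript
// Returns the first candidate selector that matches at least one element,
// or null if none of them match.
async function findFirstPresentSelector(page, candidates) {
  for (const selector of candidates) {
    const count = await page.locator(selector).count();
    if (count > 0) return selector;
  }
  return null;
}

// Stub page: reports one match for each selector in `presentSelectors`.
function createStubPage(presentSelectors) {
  return {
    locator: (selector) => ({
      count: async () => (presentSelectors.includes(selector) ? 1 : 0),
    }),
  };
}

async function demo() {
  // Simulates a page where the old selector is gone and the new one exists.
  const page = createStubPage(['#input-zone']);
  const found = await findFirstPresentSelector(page, ['#input-country', '#input-zone']);
  console.log(found);
  return found;
}
```

Calling `demo()` resolves to `#input-zone` for this stub, matching the selector change described in the example output.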
Full prompt sequence at a glance
1. Create a Step automation package that completes a guest checkout for a MacBook on
https://opencart-prf.stepcloud.ch/. Use Node.js keywords with Playwright.
Save it in ~/step-projects and call it opencart-checkout.
2. Run the opencart-checkout package on Step.
3. What happened on <execution-id>?
4. Show me more details on the failures from that execution.
5. Download the logs from execution <execution-id>.
6. Have there been other failed opencart-checkout executions in the last 7 days?
7. Show me the response-time trend for execution <execution-id>.
8. The country dropdown selector changed. Find the correct one on the live page and fix the keyword.
Tools used in this tutorial
| Phase | Tool | Purpose |
|---|---|---|
| Authoring | step_initialize_project | Scaffold the Automation Package |
| Authoring | step_validate_plan | Verify the package structure |
| Execution | step_execute_automation_package | Launch the run on Step |
| Analysis | step_fetch_execution_overview | High-level pass/fail summary |
| Analysis | step_fetch_execution_report_nodes | Details on a specific failure type |
| Analysis | step_download_attachment_content | Retrieve logs and attachments |
| Analysis | step_search_executions | Find past runs by status or description |
| Analysis | step_query_performance_metrics | Response-time trends for load tests |
For the full tool reference, see the Step MCP Server tools documentation.