AI-Assisted Testing with the Step MCP Server

End-to-end walkthrough: author a test, run it, and analyse the results using plain English prompts in Claude via the Step MCP Server.



This tutorial shows how to go from “I want to test my website” to a full test run and root-cause analysis using only plain English prompts in Claude. No YAML editing, no CLI commands, no code.

Scenario: Run an end-to-end guest-checkout flow on the Step-hosted OpenCart storefront (https://opencart-prf.stepcloud.ch/), then dig into the results if anything fails.

Prerequisites


Phase 1 — Authoring

Goal: Create an Automation Package that drives the OpenCart guest-checkout flow.

User’s input (prompt):

  - Create a Step automation package that completes a guest checkout for a MacBook on https://opencart-prf.stepcloud.ch/
- Use Node.js keywords with Playwright
- Save it in ~/step-projects and call it opencart-checkout
  

AI’s output (example):

  Created `opencart-checkout` at `~/step-projects/opencart-checkout`.
The package declares 10 Node keywords covering the full guest-checkout flow:
navigate → add to cart → guest checkout → fill billing form → confirm shipping →
place order → verify confirmation.
Validation passed — structure is correct. Ready to execute.
  

Behind the scenes Claude calls step_initialize_project to scaffold the package, then step_validate_plan to verify it is well-formed.
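Each "behind the scenes" call in this tutorial is an MCP tool invocation. Under the Model Context Protocol, the client sends a JSON-RPC 2.0 request with the method `tools/call`, the tool name, and an `arguments` object. The sketch below builds such a request for step_validate_plan. The argument name `projectPath` is a hypothetical placeholder: the real input schema of each Step tool is described in the Step MCP Server tools documentation (and is also returned by the protocol's `tools/list` method).

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0), per the MCP specification.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Build a request for a named tool. The argument keys are whatever the
// tool's input schema defines; nothing in this helper is specific to Step.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical example: "projectPath" is an illustrative argument name,
// not the documented schema of step_validate_plan.
const request = buildToolCall(1, "step_validate_plan", {
  projectPath: "~/step-projects/opencart-checkout",
});

console.log(JSON.stringify(request, null, 2));
```

Claude issues these requests for you; the sketch is only meant to show what travels over the wire when a prompt triggers a tool.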

Phase 2 — Execution

Goal: Run the test on the Step server.

User’s input (prompt):

  Run the opencart-checkout package on Step.
  

AI’s output (example):

  Execution started. Step returned execution ID `6a2fd525eba249060fc398cd`.
Ask me "what happened on 6a2fd525eba249060fc398cd?" once you're ready to see the results.
  

Note: step_execute_automation_package requires the step CLI to be installed and available on your PATH. Automation packages with Node.js keywords must run on a Step cluster, since local execution is not supported, and a Node agent must be registered on that cluster.
Behind the scenes Claude calls step_execute_automation_package to upload and launch the run on the Step server.

Phase 3 — Analysis

Quick overview

User’s input (prompt):

  What happened on 6a2fd525eba249060fc398cd?
  

AI’s output (example — failure):

  Execution `6a2fd525eba249060fc398cd` FAILED — step `fillBillingShippingForm` timed out.
Preview shows: `TimeoutError: locator.fill() exceeded 30 000 ms waiting for the country dropdown to be visible`. The billing form may have changed layout or the zone/country selectors are loading slowly.
  

Note: If everything passed, Claude reports success and you’re done. If there are failures, the overview returns the failing branches and an inline preview of up to 2 failed nodes — often enough to explain the problem right away.
Behind the scenes Claude calls step_fetch_execution_overview to retrieve a pass/fail summary with an inline preview of any failing nodes.
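To make the "inline preview of up to 2 failed nodes" concrete, here is a minimal sketch of that kind of summary over a flat list of report nodes. The `ReportNode` shape, the two status values, and the keyword name `addToCart` are assumptions for illustration, not Step's report model; the real tool also groups failures by branch, which is omitted here.

```typescript
// Hypothetical, simplified report node; Step's actual report model is richer.
type Status = "PASSED" | "FAILED";

interface ReportNode {
  name: string;
  status: Status;
  error?: string;
}

interface Overview {
  status: Status;
  failedCount: number;
  preview: ReportNode[]; // at most `maxPreview` failed nodes
}

// The overall status is FAILED as soon as one node failed; the preview keeps
// only the first few failures so the summary stays short.
function summarize(nodes: ReportNode[], maxPreview = 2): Overview {
  const failed = nodes.filter((n) => n.status === "FAILED");
  return {
    status: failed.length > 0 ? "FAILED" : "PASSED",
    failedCount: failed.length,
    preview: failed.slice(0, maxPreview),
  };
}

const overview = summarize([
  { name: "navigateToStore", status: "PASSED" },
  { name: "addToCart", status: "PASSED" },
  {
    name: "fillBillingShippingForm",
    status: "FAILED",
    error: "TimeoutError: locator.fill() exceeded 30000 ms",
  },
]);
console.log(overview.status, overview.failedCount); // FAILED 1
```

Capping the preview keeps the first answer compact; when more failures exist than the preview shows, the next prompt in this tutorial fetches the full details.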

Drill into specific failures

If the overview isn’t enough (e.g. multiple keyword failures), ask for more detail:

User’s input (prompt):

  Show me more details on the fillBillingShippingForm failures from that execution.
  

AI’s output (example):

  All 3 failures hit the same `TimeoutError` on the country dropdown selector `#input-country`. The surrounding keywords passed, so the storefront itself is up — the selector likely changed in a recent OpenCart update.
  

Behind the scenes Claude calls step_fetch_execution_report_nodes to fetch the detailed error logs for the failing keyword.
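The conclusion "all 3 failures hit the same TimeoutError" comes from grouping failures by their error message. A minimal sketch of that grouping, assuming a hypothetical `FailedNode` record (the field names and the shortened error text are illustrative, not Step's API):

```typescript
// Hypothetical failed-node record; the field names are illustrative only.
interface FailedNode {
  keyword: string;
  error: string;
}

// Count failures per distinct error message, most frequent first. A single
// dominant message suggests one shared root cause rather than flaky runs.
function groupByError(nodes: FailedNode[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const node of nodes) {
    counts.set(node.error, (counts.get(node.error) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

const timeout = "TimeoutError: waiting for #input-country";
const groups = groupByError([
  { keyword: "fillBillingShippingForm", error: timeout },
  { keyword: "fillBillingShippingForm", error: timeout },
  { keyword: "fillBillingShippingForm", error: timeout },
]);
console.log(groups.length, groups[0][1]); // 1 3
```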

Download log attachments

User’s input (prompt):

  Download the logs from execution 6a2fd525eba249060fc398cd.
  

AI’s output (example):

  Downloaded 2 attachments from execution `6a2fd525eba249060fc398cd`:
- ~/Downloads/trace-fillBillingShippingForm.zip (Playwright trace bundle)
- ~/Downloads/execution-6a2fd525.log (text log)
Open the trace in Playwright Trace Viewer: npx playwright show-trace ~/Downloads/trace-fillBillingShippingForm.zip
  

Note: Attachments of any type (Playwright trace bundles, HAR files, …) are downloaded from Step and saved to disk, and the tool returns the local file path.
Behind the scenes Claude calls step_download_attachment_content to retrieve the file from the execution report.

Search past executions

User’s input (prompt):

  Have there been other failed opencart-checkout executions in the last 7 days?
  

AI’s output (example):

  Found 3 failed executions of `opencart-checkout` in the last 7 days:
- 6a2fd525eba249060fc398cd — 2026-06-29 14:32 — fillBillingShippingForm timed out
- 4b1ec312dfa138050eb287bc — 2026-06-27 09:11 — fillBillingShippingForm timed out
- 9f3da401ceb026140dc174ab — 2026-06-25 16:48 — navigateToStore failed (connection reset)
Two of the three share the same billing form timeout, which points to a persistent selector change; the navigateToStore connection reset appears unrelated.
  

Behind the scenes Claude calls step_search_executions to query past runs filtered by name, status, and time range.
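The search combines three filters: package name, status, and a time window. The sketch below applies the same filters to an in-memory list. The `ExecutionRecord` shape is hypothetical, the three real-looking IDs and timestamps are reused from the example answer above (interpreted as UTC), and the `older-run` and `passing-run` entries are made up to show what gets filtered out.

```typescript
// Hypothetical execution record; Step's real execution data has more fields.
interface ExecutionRecord {
  id: string;
  description: string;
  result: "PASSED" | "FAILED";
  startTime: number; // epoch milliseconds (UTC)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Failed runs of one package that started within the last `days` days,
// newest first. `now` is a parameter so the result is deterministic.
function recentFailures(
  runs: ExecutionRecord[],
  description: string,
  days: number,
  now: number,
): ExecutionRecord[] {
  const since = now - days * DAY_MS;
  return runs
    .filter(
      (r) =>
        r.description === description &&
        r.result === "FAILED" &&
        r.startTime >= since &&
        r.startTime <= now,
    )
    .sort((a, b) => b.startTime - a.startTime);
}

// Months are 0-based in Date.UTC: 5 = June.
const runs: ExecutionRecord[] = [
  { id: "4b1ec312dfa138050eb287bc", description: "opencart-checkout", result: "FAILED", startTime: Date.UTC(2026, 5, 27, 9, 11) },
  { id: "6a2fd525eba249060fc398cd", description: "opencart-checkout", result: "FAILED", startTime: Date.UTC(2026, 5, 29, 14, 32) },
  { id: "9f3da401ceb026140dc174ab", description: "opencart-checkout", result: "FAILED", startTime: Date.UTC(2026, 5, 25, 16, 48) },
  { id: "older-run", description: "opencart-checkout", result: "FAILED", startTime: Date.UTC(2026, 5, 10, 8, 0) },
  { id: "passing-run", description: "opencart-checkout", result: "PASSED", startTime: Date.UTC(2026, 5, 28, 12, 0) },
];

const now = Date.UTC(2026, 5, 30, 12, 0);
console.log(recentFailures(runs, "opencart-checkout", 7, now).map((r) => r.id));
```

Passing `now` explicitly rather than reading the clock inside the function keeps the window reproducible, which matters when comparing answers across prompts.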

Query performance metrics

If you run a load test (useCase: load-testing) and want timing data:

User’s input (prompt):

  Show me the response-time trend for execution <execution-id>.
  

AI’s output (example):

  Response-time trend for execution `<execution-id>` (30-minute load test):
- p50: 320 ms → 415 ms (gradual increase after minute 18)
- p95: 780 ms → 1 240 ms (spike starting at minute 22)
- p99: 1 100 ms → 2 850 ms
The gradual p50 increase from minute 18 and the p95/p99 spike from minute 22 suggest the storefront begins to degrade under sustained load, likely due to a connection pool or database bottleneck.
  

Note: This tool returns data only for executions that emit time-series metrics (load tests, monitoring plans). For single-shot functional tests it returns no data — this is expected, not an error.
Behind the scenes Claude calls step_query_performance_metrics to fetch response-time percentiles over the execution timeline.
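p50, p95 and p99 are percentiles of the response times observed in each time window: p95 is the value that 95 % of requests stayed at or below. Step computes these server-side; the sketch below only illustrates one common way to derive such a trend from raw samples (nearest-rank percentiles over fixed windows). The `Sample` and `TrendPoint` shapes are hypothetical, and this is not Step's aggregation algorithm.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. Expects 0 < p <= 100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical raw measurement: offset from execution start plus duration.
interface Sample {
  offsetMs: number;
  responseMs: number;
}

interface TrendPoint {
  window: number; // window index, e.g. minute number when windowMs = 60000
  p50: number;
  p95: number;
  p99: number;
}

// Bucket samples into fixed windows and compute p50/p95/p99 per window,
// which is the data a response-time trend line plots.
function trend(samples: Sample[], windowMs: number): TrendPoint[] {
  const windows = new Map<number, number[]>();
  for (const s of samples) {
    const w = Math.floor(s.offsetMs / windowMs);
    const bucket = windows.get(w) ?? [];
    bucket.push(s.responseMs);
    windows.set(w, bucket);
  }
  return Array.from(windows.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([w, values]) => ({
      window: w,
      p50: percentile(values, 50),
      p95: percentile(values, 95),
      p99: percentile(values, 99),
    }));
}

// Made-up samples spread over two one-minute windows.
const samples: Sample[] = [
  { offsetMs: 5_000, responseMs: 310 },
  { offsetMs: 20_000, responseMs: 330 },
  { offsetMs: 65_000, responseMs: 420 },
  { offsetMs: 90_000, responseMs: 1_250 },
];
console.log(trend(samples, 60_000));
```

Nearest-rank always returns an observed value, which keeps the numbers easy to trace back to individual requests; interpolating methods give smoother curves but can report latencies no request actually had.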

Healing a broken test

If analysis reveals a broken selector or outdated step, Claude can fix and re-run the test without leaving the conversation:

User’s input (prompt):

  The country dropdown selector changed. Find the correct one on the live page and fix the keyword.
  

AI’s output (example):

  Navigated to the billing form on https://opencart-prf.stepcloud.ch/. Found the updated selector: `#input-zone` replaced `#input-country` for the country dropdown.
Updated `fillBillingShippingForm.js` accordingly. Re-running the package now — execution ID `7c3ae601fbc350170gd409de`.
  

Behind the scenes Claude uses Playwright MCP to browse the live page, locate the updated element, and patch the keyword file directly. It then calls step_execute_automation_package to re-run.

Full prompt sequence at a glance

  1. Create a Step automation package that completes a guest checkout for a MacBook on
   https://opencart-prf.stepcloud.ch/. Use Node.js keywords with Playwright.
   Save it in ~/step-projects and call it opencart-checkout.

2. Run the opencart-checkout package on Step.

3. What happened on <execution-id>?

4. Show me more details on the failures from that execution.

5. Download the logs from execution <execution-id>.

6. Have there been other failed opencart-checkout executions in the last 7 days?

7. Show me the response-time trend for execution <execution-id>.

8. The country dropdown selector changed. Find the correct one on the live page and fix the keyword.
  

Tools used in this tutorial

| Phase | Tool | Purpose |
|---|---|---|
| Authoring | step_initialize_project | Scaffold the Automation Package |
| Authoring | step_validate_plan | Verify the package structure |
| Execution | step_execute_automation_package | Launch the run on Step |
| Analysis | step_fetch_execution_overview | High-level pass/fail summary |
| Analysis | step_fetch_execution_report_nodes | Details on a specific failure type |
| Analysis | step_download_attachment_content | Retrieve logs and attachments |
| Analysis | step_search_executions | Find past runs by status or description |
| Analysis | step_query_performance_metrics | Response-time trends for load tests |

For the full tool reference, see the Step MCP Server tools documentation.
