AI-Assisted Testing with the Step MCP Server

End-to-end walkthrough: author a test, run it, and analyse the results using plain English prompts in Claude via the Step MCP Server.

This tutorial shows how to go from “I want to test my website” to a full test run and root-cause analysis using only plain English prompts in Claude. No YAML editing, no CLI commands, no code.

Scenario: Run an end-to-end guest-checkout flow on the Step-hosted OpenCart storefront (https://opencart-prf.stepcloud.ch/), then dig into the results if anything fails.

Prerequisites

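- Step MCP Server installed and connected
- Playwright MCP installed and connected
- A Step cluster (cloud or on-premises) with a Node agent registered
- The step CLI installed and available on your PATH (required by step_execute_automation_package in Phase 2)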

Phase 1 — Authoring

Goal: Create an Automation Package that drives the OpenCart guest-checkout flow.

User’s input (prompt):

  - Create a Step automation package that completes a guest checkout for a MacBook on https://opencart-prf.stepcloud.ch/
  - Use Node.js keywords with Playwright
  - Save it in ~/step-projects and call it opencart-checkout

AI’s output (example):

  Created `opencart-checkout` at `~/step-projects/opencart-checkout`.
The package declares 10 Node keywords covering the full guest-checkout flow:
navigate → add to cart → guest checkout → fill billing form → confirm shipping →
place order → verify confirmation.
Validation passed — structure is correct. Ready to execute.

Behind the scenes Claude calls step_initialize_project to scaffold the package, then step_validate_plan to verify it is well-formed.
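
For readers who want to peek inside the package, one of the generated keywords might look roughly like the sketch below. It is an illustration under stated assumptions, not the scaffolded output: `PageLike`, `KeywordOutput`, the `selectCountry` name and its signature are hypothetical stand-ins (a real keyword would work with a Playwright page and Step's own output object), and `#input-country` is simply the selector that appears in the analysis later in this tutorial.

```typescript
// Hypothetical minimal surface of a browser page; a real keyword would use Playwright's Page.
interface PageLike {
  selectOption(selector: string, value: string): Promise<void>;
}

// Hypothetical output collector standing in for Step's keyword output object.
interface KeywordOutput {
  payload: Record<string, string>;
  error?: string;
}

// Sketch of one keyword: pick the billing country and report the outcome.
async function selectCountry(
  input: { country: string },
  output: KeywordOutput,
  page: PageLike
): Promise<void> {
  try {
    await page.selectOption("#input-country", input.country);
    output.payload["country"] = input.country;
  } catch (err) {
    // Record the failure so the caller can see why the keyword failed.
    output.error = err instanceof Error ? err.message : String(err);
  }
}
```

Keeping the page behind a small interface means the keyword logic can be exercised with a fake page, without launching a browser.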

Phase 2 — Execution

Goal: Run the test on the Step server.

User’s input (prompt):

  Run the opencart-checkout package on Step.

AI’s output (example):

  Execution started. Step returned execution ID `6a2fd525eba249060fc398cd`.
Ask me "what happened on 6a2fd525eba249060fc398cd?" once you're ready to see the results.

Note: step_execute_automation_package requires the step CLI to be installed and available on your PATH. Automation packages with Node.js keywords must run on a Step cluster — local execution is not supported. A Node agent must be registered on the cluster.

Behind the scenes Claude calls step_execute_automation_package to upload and launch the run on the Step server.
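
Under MCP, a tool invocation such as this one travels as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request to show the shape of the message. The `buildToolCall` helper and the `packagePath` argument name are assumptions made for illustration, since the tool's actual input schema is not shown in this tutorial.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// `packagePath` is an assumed argument name; the real tool schema may differ.
const request = buildToolCall(1, "step_execute_automation_package", {
  packagePath: "~/step-projects/opencart-checkout",
});
```

Claude assembles and sends these messages itself; nothing here needs to be typed by hand.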

Phase 3 — Analysis

Quick overview

User’s input (prompt):

  What happened on 6a2fd525eba249060fc398cd?

AI’s output (example — failure):

  Execution `6a2fd525eba249060fc398cd` FAILED — step `fillBillingShippingForm` timed out.
Preview shows: `TimeoutError: locator.fill() exceeded 30 000 ms waiting for the country dropdown to be visible`. The billing form may have changed layout or the zone/country selectors are loading slowly.

Note: If everything passed, Claude reports success and you’re done. If there are failures, the overview returns the failing branches and an inline preview of up to 2 failed nodes — often enough to explain the problem right away.

Behind the scenes Claude calls step_fetch_execution_overview to retrieve a pass/fail summary with an inline preview of any failing nodes.
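
The "up to 2 failed nodes" preview can be pictured as a walk over the execution's report tree. The following is a minimal sketch of that idea, not Step's implementation: the `ReportNode` shape, the `summarize` function and the `previewLimit` parameter are all assumed for illustration.

```typescript
type Status = "PASSED" | "FAILED";

// Hypothetical report node: either a keyword call (leaf) or a container with children.
interface ReportNode {
  name: string;
  status: Status;
  error?: string;
  children?: ReportNode[];
}

interface Overview {
  status: Status;
  failedCount: number;
  preview: { name: string; error: string }[];
}

// Collect failing leaf nodes depth-first, in execution order.
function collectFailedLeaves(node: ReportNode, acc: ReportNode[] = []): ReportNode[] {
  const children = node.children ?? [];
  if (children.length === 0) {
    if (node.status === "FAILED") acc.push(node);
    return acc;
  }
  for (const child of children) collectFailedLeaves(child, acc);
  return acc;
}

// Build a pass/fail summary with a capped preview of the first failures.
function summarize(root: ReportNode, previewLimit = 2): Overview {
  const failures = collectFailedLeaves(root);
  return {
    status: failures.length > 0 ? "FAILED" : "PASSED",
    failedCount: failures.length,
    preview: failures.slice(0, previewLimit).map((n) => ({
      name: n.name,
      error: n.error ?? "no error message recorded",
    })),
  };
}
```

Capping the preview keeps the first answer short; the full list of failures is what the drill-down prompt in the next section is for.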

Drill into specific failures

If the overview isn’t enough (e.g. multiple keyword failures), ask for more detail:

User’s input (prompt):

  Show me more details on the fillBillingShippingForm failures from that execution.

AI’s output (example):

  All 3 failures hit the same `TimeoutError` on the country dropdown selector `#input-country`. The surrounding keywords passed, so the storefront itself is up — the selector likely changed in a recent OpenCart update.

Behind the scenes Claude calls step_fetch_execution_report_nodes to fetch the detailed error logs for the failing keyword.
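
Spotting that several failures share one root cause amounts to grouping the failed nodes by their error type. Here is a small sketch of that reasoning, assuming a hypothetical `FailedNode` shape and error messages that start with an error class followed by a colon, as in the `TimeoutError` example above.

```typescript
// Hypothetical shape of a failed report node.
interface FailedNode {
  keyword: string;
  error: string;
}

// Extract the leading error class from a message such as "TimeoutError: ...".
function errorType(message: string): string {
  const match = /^([A-Za-z_]\w*):/.exec(message);
  return match ? match[1] : "UnknownError";
}

// Group failed nodes by error class so repeated failures collapse into one bucket.
function groupByErrorType(nodes: FailedNode[]): Map<string, FailedNode[]> {
  const groups = new Map<string, FailedNode[]>();
  for (const node of nodes) {
    const key = errorType(node.error);
    const bucket = groups.get(key) ?? [];
    bucket.push(node);
    groups.set(key, bucket);
  }
  return groups;
}
```

A single bucket holding every failure suggests one underlying cause; several buckets suggest unrelated problems worth investigating separately.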

Download log attachments

User’s input (prompt):

  Download the logs from execution 6a2fd525eba249060fc398cd.

AI’s output (example):

  Downloaded 2 attachments from execution `6a2fd525eba249060fc398cd`:
- ~/Downloads/trace-fillBillingShippingForm.zip (Playwright trace bundle)
- ~/Downloads/execution-6a2fd525.log (text log)
Open the trace in Playwright Trace Viewer: npx playwright show-trace ~/Downloads/trace-fillBillingShippingForm.zip

Note: Attachments of any type (Playwright trace bundles, HAR files, …) are downloaded from Step and saved to disk, and the path of each saved file is returned.

Behind the scenes Claude calls step_download_attachment_content to retrieve the file from the execution report.

Search past executions

User’s input (prompt):

  Have there been other failed opencart-checkout executions in the last 7 days?

AI’s output (example):

  Found 3 failed executions of `opencart-checkout` in the last 7 days:
- 6a2fd525eba249060fc398cd — 2026-06-29 14:32 — fillBillingShippingForm timed out
- 4b1ec312dfa138050eb287bc — 2026-06-27 09:11 — fillBillingShippingForm timed out
- 9f3da401ceb026140dc174ab — 2026-06-25 16:48 — navigateToStore failed (connection reset)
Two of the three hit the same billing form timeout, which points to a persistent selector change. The third was a connection reset while loading the store.

Behind the scenes Claude calls step_search_executions to query past runs filtered by name, status, and time range.
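
Conceptually, the search is a filter on name, status, and start time. The sketch below shows that filter over an in-memory list. The `ExecutionRecord` shape and the `findRecentFailures` function are hypothetical and say nothing about how Step stores or queries executions.

```typescript
// Hypothetical record of a past execution.
interface ExecutionRecord {
  id: string;
  name: string;
  status: "PASSED" | "FAILED";
  startedAt: Date;
}

// Return failed executions of `name` started within the last `days` days, newest first.
function findRecentFailures(
  records: ExecutionRecord[],
  name: string,
  days: number,
  now: Date
): ExecutionRecord[] {
  const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
  return records
    .filter(
      (r) => r.name === name && r.status === "FAILED" && r.startedAt.getTime() >= cutoff
    )
    .sort((a, b) => b.startedAt.getTime() - a.startedAt.getTime());
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the time window reproducible when the same question is asked again later.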

Performance trends (load tests)

If you run a load test (useCase: load-testing) and want timing data:

User’s input (prompt):

  Show me the response-time trend for execution 6a2fd525eba249060fc398cd.

AI’s output (example):

  Response-time trend for execution `6a2fd525eba249060fc398cd` (30-minute load test):
- p50: 320 ms → 415 ms (gradual increase after minute 18)
- p95: 780 ms → 1 240 ms (spike starting at minute 22)
- p99: 1 100 ms → 2 850 ms
Latency starts climbing around minute 18 and p95 spikes from minute 22, which suggests the storefront begins to degrade under sustained load, likely because of a connection pool or database bottleneck.

Note: This tool returns data only for executions that emit time-series metrics (load tests, monitoring plans). For single-shot functional tests it returns no data — this is expected, not an error.

Behind the scenes Claude calls step_query_performance_metrics to fetch response-time percentiles over the execution timeline.
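
To make the p50/p95/p99 figures concrete, here is a minimal sketch of how percentiles over a timeline can be computed from raw response times. It uses the nearest-rank method and fixed time windows. The `Sample` shape, the function names and the choice of method are assumptions for illustration, and Step may aggregate its metrics differently.

```typescript
// Hypothetical raw measurement: when it was taken and how long the call took.
interface Sample {
  minute: number;
  durationMs: number;
}

interface WindowStats {
  windowStart: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty sample set");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Bucket samples into fixed windows and compute percentiles per window, in time order.
function percentilesByWindow(samples: Sample[], windowMinutes: number): WindowStats[] {
  const buckets = new Map<number, number[]>();
  for (const s of samples) {
    const start = Math.floor(s.minute / windowMinutes) * windowMinutes;
    const bucket = buckets.get(start) ?? [];
    bucket.push(s.durationMs);
    buckets.set(start, bucket);
  }
  return Array.from(buckets.keys())
    .sort((a, b) => a - b)
    .map((start) => {
      const values = buckets.get(start) ?? [];
      return {
        windowStart: start,
        p50: percentile(values, 50),
        p95: percentile(values, 95),
        p99: percentile(values, 99),
      };
    });
}
```

Comparing the first and last windows of such a series gives the kind of "320 ms → 415 ms" trend shown in the example output.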

Healing a broken test

If analysis reveals a broken selector or outdated step, Claude can fix and re-run the test without leaving the conversation:

User’s input (prompt):

  The country dropdown selector changed. Find the correct one on the live page and fix the keyword.

AI’s output (example):

  Navigated to the billing form on https://opencart-prf.stepcloud.ch/. Found the updated selector: `#input-zone` replaced `#input-country` for the country dropdown.
Updated `fillBillingShippingForm.js` accordingly. Re-running the package now: execution ID `7c3ae601fbc350170fd409de`.

Behind the scenes Claude uses Playwright MCP to browse the live page, locate the updated element, and patch the keyword file directly. It then calls step_execute_automation_package to re-run.
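
The patch itself boils down to swapping one selector string for another in the keyword source. A minimal sketch of that step follows, assuming a plain substring replacement. `replaceSelector` is a hypothetical helper, not something the Step or Playwright MCP servers expose.

```typescript
interface PatchResult {
  patched: string;
  replacements: number;
}

// Replace every occurrence of oldSelector in the source and report how many were changed.
// This is a plain substring match, so it would also rewrite a longer selector that merely
// starts with oldSelector; check the count and review the diff before re-running.
function replaceSelector(
  source: string,
  oldSelector: string,
  newSelector: string
): PatchResult {
  if (oldSelector.length === 0) throw new Error("oldSelector must not be empty");
  const parts = source.split(oldSelector);
  return { patched: parts.join(newSelector), replacements: parts.length - 1 };
}
```

Returning the replacement count makes it easy to notice when nothing matched, or when more places changed than expected.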

Full prompt sequence at a glance

  1. Create a Step automation package that completes a guest checkout for a MacBook on
     https://opencart-prf.stepcloud.ch/. Use Node.js keywords with Playwright.
     Save it in ~/step-projects and call it opencart-checkout.

  2. Run the opencart-checkout package on Step.

  3. What happened on <execution-id>?

  4. Show me more details on the failures from that execution.

  5. Download the logs from execution <execution-id>.

  6. Have there been other failed opencart-checkout executions in the last 7 days?

  7. Show me the response-time trend for execution <execution-id>.

  8. The country dropdown selector changed. Find the correct one on the live page and fix the keyword.

Tools used in this tutorial

Phase	Tool	Purpose
Authoring	step_initialize_project	Scaffold the Automation Package
Authoring	step_validate_plan	Verify the package structure
Execution	step_execute_automation_package	Launch the run on Step
Analysis	step_fetch_execution_overview	High-level pass/fail summary
Analysis	step_fetch_execution_report_nodes	Details on a specific failure type
Analysis	step_download_attachment_content	Retrieve logs and attachments
Analysis	step_search_executions	Find past runs by status or description
Analysis	step_query_performance_metrics	Response-time trends for load tests

For the full tool reference, see the Step MCP Server tools documentation.
