In short

The Step AI Ecosystem extends the Software Automation as Code approach with autonomous software testing capabilities powered by large language models (LLMs) and AI agents. The Step platform unifies AI-driven functional testing, load testing, and monitoring while integrating seamlessly into existing CI/CD and DevOps workflows.

The Step AI Ecosystem is built on three layers:

Step MCP Server for connecting AI assistants and coding agents to Step
Reusable Testing Skills for packaging Step testing expertise into reusable workflows
Step Testing Agent for generating, executing, healing, and validating tests on Step

Together, these capabilities enable agentic testing workflows that accelerate test creation, reduce maintenance effort, and improve automation reliability.

Our current initiatives focus on three complementary contributions, as summarized below.

Figure: Step AI Ecosystem architecture showing the MCP Server, Testing Skills, and Testing Agent for autonomous software testing.

Step MCP/CLI

The Step MCP Server enables AI assistants and coding agents like Claude Desktop and Claude Code to interact directly with the Step platform, bringing AI-powered software testing into existing automation workflows through the Model Context Protocol (MCP).

📹 Watch the demo

Built on top of the Step CLI and APIs, the MCP server exposes the tools required across the full testing lifecycle to AI assistants. Agents can design tests, execute them on remote Step agents, analyse results, inspect artefacts, and update automation. This enables self-healing workflows while leveraging the same Step execution engine used by traditional automation workloads.
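To make the MCP interaction described above more concrete, the following is a minimal sketch of the JSON-RPC 2.0 messages an MCP client sends to discover and invoke tools. The methods `tools/list` and `tools/call` come from the Model Context Protocol; the tool name `execute_plan` and its arguments are hypothetical placeholders, not the actual tools exposed by the Step MCP Server.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC request ids only need to be unique within a session.
_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request, the message format MCP is based on."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools() -> dict:
    # "tools/list" asks an MCP server which tools it exposes.
    return jsonrpc_request("tools/list")


def call_tool(name: str, arguments: dict) -> dict:
    # "tools/call" invokes one tool by name with JSON arguments.
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    request = call_tool("execute_plan", {"plan": "checkout-smoke-test"})
    print(json.dumps(request, indent=2))
```

In practice an AI assistant builds these messages itself; the sketch only shows what travels between the assistant and an MCP server when a tool is discovered and called.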

Reusable Testing Skills

Reusable Testing Skills package the expertise of experienced automation engineers into standardized building blocks for AI test automation. AI agents can invoke these skills on demand to apply proven testing practices consistently across functional testing, load testing, and monitoring.

Each skill combines general testing know-how with Step-specific automation patterns and encodes a complete, opinionated workflow, from capturing requirements through implementation, validation, and packaging. This helps agents produce correct, consistent, and reliable automation.

By enforcing the same engineering discipline across every framework and system under test, Testing Skills turn hard-won automation and Step expertise into a reusable catalogue of building blocks anyone can apply.
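One way to picture an opinionated workflow like the one described above is as an ordered sequence of stages that cannot be skipped or reordered. The sketch below is purely illustrative: the stage names are taken from the description above, while the `SkillRun` class is a hypothetical model and not the format Step uses to define skills.

```python
from dataclasses import dataclass, field
from typing import List

# Stage order taken from the workflow described above.
STAGES = ("requirements", "implementation", "validation", "packaging")


@dataclass
class SkillRun:
    """Hypothetical tracker that only accepts stages in the prescribed order."""

    completed: List[str] = field(default_factory=list)

    def complete(self, stage: str) -> None:
        if self.done:
            raise ValueError("all stages are already complete")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.completed.append(stage)

    @property
    def done(self) -> bool:
        return len(self.completed) == len(STAGES)
```

Rejecting out-of-order stages is what makes the workflow "opinionated" in this sketch: an agent cannot package automation that it has not validated first.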

Step Testing Agent

The Step Testing Agent is an autonomous AI testing agent that converts natural-language requirements into executable, validated, and self-healing test automation through a reproducible agentic workflow, producing a deployable Step Automation Package.

Give the agent a System Under Test (SUT), such as a browser-based application, plus a plain-English scenario, and it runs end to end unattended: it explores the app in a real browser, writes a structured test plan, discovers and verifies the UI selectors for each step, generates a runnable test suite, executes it, repairs failures, and proves the result is stable before handing it over.

The result is automation you can trust (see the Benchmarking & Validation of Agentic Testing section below). When the application changes and a test breaks, the agent reads the real browser failure, finds the cause, and heals the automation with no manual debugging.
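The heal-on-failure behaviour described above can be sketched as a bounded execute-and-repair loop. This is a simplified illustration under stated assumptions: `execute` and `repair` are hypothetical callables standing in for the test run and the agent's repair step, and the limit of three repairs is an arbitrary choice, not a documented setting of the Step Testing Agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    passed: bool
    failure: Optional[str] = None  # e.g. the browser error of the failing step


def run_with_healing(
    execute: Callable[[], RunResult],
    repair: Callable[[str], None],
    max_repairs: int = 3,  # assumed bound, for illustration
) -> Tuple[bool, int]:
    """Execute the test; on failure, repair and re-execute up to max_repairs times.

    Returns (passed, repairs_used).
    """
    repairs = 0
    result = execute()
    while not result.passed and repairs < max_repairs:
        # The repair step receives the real failure message as its input.
        repair(result.failure or "unknown failure")
        repairs += 1
        result = execute()
    return result.passed, repairs
```

Bounding the number of repairs keeps the loop from running indefinitely when a failure reflects a genuine defect in the application rather than outdated automation.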

Tests are not accepted on a single green run: the agent repeats the suite and only marks it stable when it passes consistently, filtering out flakiness before it reaches you. Every run delivers:

A readable test plan
An implemented keyword library
A coverage map linking each requirement to its test
Full metrics on cost, duration, and pass rates
A deployable Step Automation Package
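The stability check described above, where a suite is repeated and accepted only if it passes consistently, can be sketched as a simple gate over repeated runs. The function name, the default of five repetitions, and the 100% pass-rate threshold are assumptions made for illustration; the source does not state the agent's actual criteria.

```python
from typing import Callable, Tuple


def is_stable(
    run_suite: Callable[[], bool],
    repetitions: int = 5,             # assumed number of repeated runs
    required_pass_rate: float = 1.0,  # assumed threshold: every run must pass
) -> Tuple[bool, float]:
    """Run the suite repeatedly and report (stable, observed pass rate)."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    passes = sum(1 for _ in range(repetitions) if run_suite())
    pass_rate = passes / repetitions
    return pass_rate >= required_pass_rate, pass_rate
```

With these assumed defaults, a suite that fails even once in five runs is reported as unstable, which is how a single green run is prevented from hiding a flaky test.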

Running on the Step automation platform, the current implementation combines the Claude Code SDK for reasoning, Playwright MCP for browser interaction, and Step for execution and orchestration. This approach accelerates test creation, supports self-healing, and reduces maintenance effort. Because each run is fully specified, results are reproducible and comparable, making them suitable for both rapid test authoring and ongoing regression coverage.

Benchmarking & Validation of Agentic Testing

Our AI initiatives are evaluated through reproducibility benchmarks, internal validation frameworks, partner demonstrations, and customer feedback.

📊 Read “Agentic Testing - Part 1: Reproducibility”

Current benchmarking results show reproducible outcomes alongside significant productivity gains. Generated test code has consistently shown high execution stability, strong fidelity to the intended specifications, and predictable operational characteristics, which supports the practical applicability of agentic testing workflows.

Frequently Asked Questions

What is agentic testing?

Agentic testing uses AI agents to autonomously design, execute, validate, and maintain automated tests with minimal human intervention.

What is self-healing test automation?

Self-healing test automation automatically detects application changes, identifies failing selectors or workflows, and repairs test automation without manual debugging.

How does Step support AI-powered testing?

Step combines MCP integrations, reusable testing skills, AI agents, and its automation platform to support autonomous software testing across functional testing, load testing, and monitoring.

Can Step integrate with existing CI/CD pipelines?

Yes. Step's AI capabilities are designed to integrate directly into existing CI/CD and DevOps workflows while using the same execution infrastructure as traditional automation.

Early Access

Interested in an early preview?

See Also