Prometheus
Prometheus is a popular open-source monitoring and alerting system. The Step Prometheus plugin exposes measurements and metrics at a scrape endpoint that Prometheus polls at regular intervals.
In addition to Step execution data, the plugin exposes JVM and Jetty metrics for monitoring the controller itself. Other Step components, such as the Grid proxy and Java agents, can also expose these metrics to Prometheus through their own configuration, independently of this plugin.
Enabling the plugin
The Prometheus plugin is enabled by default. No step.properties entry is required to activate it.
Configuring Prometheus
Install Prometheus following the official instructions. Add the Step scrape job to prometheus.yml:
scrape_configs:
  - job_name: 'YOUR_STEP_JOB_NAME'
    static_configs:
      - targets: ['CONTROLLER_URL']
Replace CONTROLLER_URL with the host and port of your Step controller (e.g. localhost:8080); the targets entry takes host:port only, without a scheme such as http://. The plugin exposes data at CONTROLLER_URL/metrics, so the default metrics_path does not need to be changed.
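The scrape interval is set with scrape_interval in prometheus.yml, either globally or per job. The sample configuration shipped with Prometheus uses 15 seconds; if no value is set, Prometheus falls back to 1 minute. See the Prometheus documentation for details.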
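As an illustration of the settings described above, a complete prometheus.yml could look like the following sketch. The job name and the 30-second override are hypothetical values chosen for the example, and the target reuses the localhost:8080 example from the text; adapt all three to your environment.

```yaml
global:
  scrape_interval: 15s                # applies to every job unless overridden

scrape_configs:
  - job_name: 'step-controller'       # hypothetical job name
    scrape_interval: 30s              # optional per-job override (illustrative value)
    metrics_path: /metrics            # Prometheus default, shown only for clarity
    static_configs:
      - targets: ['localhost:8080']   # controller host:port, without a scheme
```

Setting the interval at the job level leaves other scrape jobs on the global value, which is usually preferable to changing the global default for the sake of a single target.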
Native histograms
Measurements and metrics are exported as native Prometheus histograms in addition to the legacy (classic) histograms. Predefined bucket configuration is not supported: bucket boundaries are determined natively by Prometheus when native histograms are used.
Native histogram support must be enabled on the Prometheus server to use this feature. Refer to the Prometheus feature flags documentation for the required server-side configuration.
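Because the plugin exposes each histogram in both forms, it may help to know that, according to the Prometheus configuration reference, a server with native histogram ingestion enabled keeps only the native version of such a histogram by default. If existing dashboards or alerts still rely on the classic bucket series, the scrape job can be asked to keep both. The following is a sketch under assumptions: the job name and target are hypothetical, and the option name depends on the Prometheus version, so check the reference for your release.

```yaml
scrape_configs:
  - job_name: 'step-controller'              # hypothetical job name
    # Keep classic (legacy) bucket series alongside native histograms.
    # Option name as used in Prometheus 3.x; 2.x releases call it
    # scrape_classic_histograms instead.
    always_scrape_classic_histograms: true
    static_configs:
      - targets: ['localhost:8080']          # example controller host:port
```

Keeping both representations increases the number of stored series, so this is best treated as a transitional setting while queries are migrated to native histograms.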
Exported labels
The plugin exports two label sets per metric to balance query ergonomics with cardinality:
- Standard labels: low-cardinality fields suitable for grouping and alerting, such as plan name, task name, measurement name, and status.
- High-cardinality labels: additional context for drill-downs, such as execution ID and execution URL. These are exposed in a separate label set so that they do not affect Prometheus storage and query performance when they are not needed.
Once Step is running executions, the data is available at CONTROLLER_URL/metrics and visible in the Prometheus UI at PROMETHEUS_URL/graph.
Time-series cleanup
To prevent unbounded memory growth as execution data accumulates, the plugin periodically removes stale time-series from the in-memory metrics registry. The cleanup delay is measured from the last update to a given series:
| Label set | Default cleanup delay | step.properties key |
|---|---|---|
| Standard labels (no execution ID) | 600 s | plugins.measurements.prometheus.base.cleanup.delay |
| High-cardinality labels (per execution ID) | 70 s | plugins.measurements.prometheus.byExecutionId.cleanup.delay |
The shorter delay for high-cardinality series is intentional: once an execution ends, its per-execution series are no longer updated and can be removed quickly. The longer delay for standard series preserves aggregated data across executions for the duration of a typical Prometheus scrape cycle.
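To override either value, add the corresponding property to step.properties with the desired delay in seconds. For example, plugins.measurements.prometheus.base.cleanup.delay=1200 (an illustrative value) keeps standard series for 20 minutes after their last update.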
Grafana dashboards
The following dashboards have been designed to use the Step metrics exposed to Prometheus:
- Step — Execution report: dashboard designed to analyze the results of a single execution (keyword response times, throughput, etc.)
- Step — Executions & Grid Overview: dashboard designed for high-level monitoring of Step usage (execution durations and statuses, grid usage, etc.)
- Step — Execution Drilldown: dashboard designed to drill down from the executions overview, providing single-execution results and direct links to individual executions