Prometheus | Step Documentation

Prometheus is a popular open-source monitoring and alerting system. The Step Prometheus plugin exposes measurements and metrics at a scrape endpoint that Prometheus polls at regular intervals.

In addition to Step execution data, the plugin exposes JVM and Jetty metrics for monitoring the controller itself. Independently of this plugin, other Step components such as the Grid proxy and Java agents can also expose these metrics to Prometheus through their own configuration.

Enabling the plugin

The Prometheus plugin is enabled by default. No step.properties entry is required to activate it.

Configuring Prometheus

Install Prometheus following the official instructions. Add the Step scrape job to prometheus.yml:

scrape_configs:
  - job_name: 'YOUR_STEP_JOB_NAME'
    static_configs:
      - targets: ['CONTROLLER_URL']

Replace CONTROLLER_URL with the host and port of your Step controller (e.g. localhost:8080). The plugin exposes data at CONTROLLER_URL/metrics — the default metrics_path does not need to be changed.

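The scrape interval is set in prometheus.yml. The sample configuration shipped with Prometheus uses 15 seconds, while Prometheus falls back to 1 minute when no interval is configured. Adjust the value following the Prometheus documentation.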
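
As an illustration, the scrape job could set its own interval and spell out the metrics path explicitly. This is a sketch only: the job name step-controller, the target localhost:8080, and the 30-second interval are placeholder values chosen for the example, not Step recommendations.

```yaml
# prometheus.yml (fragment): illustrative values only
global:
  scrape_interval: 15s               # applies to every job that does not override it

scrape_configs:
  - job_name: 'step-controller'      # placeholder job name
    scrape_interval: 30s             # per-job override of the global interval
    metrics_path: /metrics           # Prometheus default, shown here for clarity
    static_configs:
      - targets: ['localhost:8080']  # host:port of the Step controller, without scheme
```

Setting the interval at job level leaves other scrape jobs on the same Prometheus server unaffected.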

Native histograms

Measurements and metrics are exported as native Prometheus histograms in addition to the legacy (classic) histograms. Predefined bucket configuration is not supported: native histograms use an exponential bucketing scheme, so bucket boundaries are computed automatically rather than configured.

Native histogram support must be enabled on the Prometheus server to use this feature. Refer to the Prometheus feature flags documentation for the required server-side configuration.

Exported labels

The plugin exports two label sets per metric to balance query ergonomics with cardinality:

Standard labels — low-cardinality fields suitable for grouping and alerting: plan name, task name, measurement name, status, etc.
High-cardinality labels — additional context for drill-downs: execution ID, execution URL. These are available on a separate label set to avoid impacting Prometheus storage and query performance when not needed.

Once Step is running executions, the data is available at CONTROLLER_URL/metrics and visible in the Prometheus UI at PROMETHEUS_URL/graph.

Time-series cleanup

To prevent unbounded memory growth as execution data accumulates, the plugin periodically removes stale time-series from the in-memory metrics registry. The cleanup delay is measured from the last update to a given series:

Label set	Default cleanup delay	`step.properties` key
Standard labels (no execution ID)	600 s	`plugins.measurements.prometheus.base.cleanup.delay`
High-cardinality labels (per execution ID)	70 s	`plugins.measurements.prometheus.byExecutionId.cleanup.delay`

The shorter delay for high-cardinality series is intentional: once an execution ends, its per-execution series are no longer updated and can be removed quickly. The longer delay for standard series keeps aggregated data available across executions, well beyond the length of a typical Prometheus scrape cycle.

To override either value, add the corresponding property to step.properties with the desired delay in seconds.
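
For instance, the following step.properties fragment would lengthen both delays. The values 1200 and 120 are arbitrary examples chosen for illustration, not recommended settings. Only the property keys come from the table above.

```properties
# step.properties (fragment): example values, not recommendations
# Keep standard (aggregated) series for 20 minutes after their last update
plugins.measurements.prometheus.base.cleanup.delay=1200
# Keep per-execution series for 2 minutes after their last update
plugins.measurements.prometheus.byExecutionId.cleanup.delay=120
```

Longer delays keep more series in the in-memory registry at once, so raising them trades memory for retention.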

Grafana dashboards

The following dashboards have been designed to use the Step metrics exposed to Prometheus:

Step — Execution report: dashboard designed to analyze the results of a single execution (keyword response times, throughput, ...)
Step — Executions & Grid Overview: dashboard designed for high-level monitoring of Step usage (execution durations and statuses, grid usage, ...)
Step — Execution Drilldown: dashboard designed to drill down from the executions overview, providing single-execution results and direct links to individual executions
