Analytics and external DBs
Introduction
If you want to perform advanced analytics or attach a dashboarding product (such as Grafana or Splunk) to visualize data generated by step, the enterprise version of step provides plugins that write the data to databases other than MongoDB.
At this moment, the following plugins are available:
- JDBC (SQL)
- Prometheus
- Logback
In addition to these 3 plugins, step also contains an RTM plugin using the RTM measurements accessor and an Abstract Measurements plugin, which is responsible for processing generated measurements and sending them to the relevant exposing plugin.
In all cases, the step user is responsible for setting up and maintaining the external database of their choice. Below we provide details on the required configuration on the step side.
Installation and usage
Plugins can be enabled by uncommenting the relevant lines in the step.properties file of the step controller (CONTROLLER_DIR\conf\step.properties). By default, the RTM plugin is enabled and the other plugins have the following configuration:
plugins.JdbcControllerPlugin.enabled=false
plugins.PrometheusControllerPlugin.enabled=true
plugins.LogMeasurementControllerPlugin.enabled=false
JDBC (SQL)
With the JDBC plugin, step can write data to SQL databases such as PostgreSQL or MySQL.
A note on exporting NoSQL-like data into SQL databases: since the keywords used in step can be highly customized, each measurement generated by step contains a set of required fields (begin, eId, name, planId, rnStatus, taskId, type, value) and may contain an arbitrary number of extra fields:
{ begin: ... , eId: ... , name: ... , planId: ... , rnStatus: ... , taskId: ... , type: ... , value: ... , customField1: ... , customField2: ... , customField3: ... , customField4: ... , ... }
Since SQL databases are not designed for dynamic schema changes, the JDBC plugin deals with this by exposing the required fields as separate columns and writing the custom fields as JSON into a json-type column. PostgreSQL and MySQL support json as a valid column type, but other SQL databases may not; therefore the option to export the custom measurement data as JSON can be enabled or disabled separately.
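For illustration (this is not part of the required configuration), a custom field such as customField1 from the example above could be read back from the json column roughly as follows, assuming the PostgreSQL measurements table defined later on this page; the field name is purely an example:
-- sketch: read one custom field (example name) from the json column
SELECT name, value, info ->> 'customField1' AS customField1
FROM measurements
WHERE info ->> 'customField1' IS NOT NULL;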
The plugin has been tested with PostgreSQL 12 and MySQL 8.
Enabling the JDBC plugin
To use the plugin:
- the user must add the JDBC driver library corresponding to their database to the CONTROLLER_DIR\lib folder
- e.g. for PostgreSQL, download the relevant .jar to the CONTROLLER_DIR\lib folder
- enable the plugin in step.properties by setting plugins.JdbcControllerPlugin.enabled=true
- configure the database URL, user and password in step.properties, for example:
plugins.measurements.jdbc.url=jdbc:postgresql://localhost/step
plugins.measurements.jdbc.user=postgres
plugins.measurements.jdbc.password=test
Providing and configuring the database with proper indexing is the user's responsibility; the table schema described below must be respected.
- optionally, enable writing of the custom fields into a json column:
plugins.measurements.jdbc.jsonColumn=true
PostgreSQL with TimescaleDB
For best performance, we recommend setting up PostgreSQL together with TimescaleDB. TimescaleDB is an open-source time-series database, available as an extension to PostgreSQL, and can be turned on and off for individual tables in the database. TimescaleDB tables are called “hypertables”. Writing and reading data is done using normal SQL, and in addition TimescaleDB provides extra functions that facilitate working with time-series data. While it is not mandatory to have TimescaleDB installed in order to ingest and use step data, time-series data stored in a TimescaleDB hypertable can be processed many times faster than in a normal PostgreSQL table.
The community edition of TimescaleDB is available under 2 different open-source licenses; please consult the official TimescaleDB documentation to pick the best one for your needs.
Installation of TimescaleDB (optional)
Install PostgreSQL and TimescaleDB following the official instructions. We recommend running the automatic tuning tool provided by TimescaleDB.
After the installation, TimescaleDB needs to be separately enabled in the database by running a CREATE EXTENSION command (see below).
Installation of t-digest (optional)
In order to calculate percentiles quickly, we recommend using the t-digest algorithm. Its SQL implementation by Tomas Vondra is available as an open-source extension to PostgreSQL.
To install the t-digest extension on a Linux machine, simply clone the repository, go to the cloned directory, then run:
sudo apt-get install postgresql-server-dev-12
make
sudo make install
If you are using a different Postgres version than v12, adjust the first command accordingly.
To install the t-digest extension on a Windows machine, the code first needs to be compiled into a dll library; detailed build instructions are available online if you need to compile it yourself. For convenience we provide a pre-compiled dll for a 64-bit Windows 10 machine, although we cannot guarantee that it will work on your particular system. Once you have a compiled dll, install the extension manually as follows:
- copy tdigest.dll to POSTGRES_DIR\lib
- copy tdigest.control, tdigest--1.0.0.sql, tdigest--1.0.0--1.0.1.sql to POSTGRES_DIR\share\extension
On all systems, after the installation the extension needs to be separately enabled in the database by invoking the CREATE EXTENSION command (see below).
Required schema
Create the database and enable extensions if needed:
CREATE database step;
\c step
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS tdigest;
Create the table which will hold measurements generated by step:
CREATE TABLE measurements (
begin timestamptz NOT NULL,
eId text,
status text,
planId text,
taskId text,
name text,
type text,
value integer,
info jsonb
);
For TimescaleDB to be used, this table needs to be explicitly converted to a hypertable:
SELECT create_hypertable('measurements', 'begin');
Depending on the type of queries being executed, the user is free to set up additional indexes on this table.
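To give an idea of the kind of queries this schema supports, below is a rough sketch of aggregating values per keyword in 1-minute buckets with the TimescaleDB time_bucket function and of computing an approximate 90th percentile with the t-digest extension. YOUR_EXECUTION_ID is a placeholder for an eId value, and both queries assume the extensions have been installed as described above:
-- average value per keyword in 1-minute buckets (TimescaleDB time_bucket)
SELECT time_bucket('1 minute', begin) AS bucket,
       name,
       avg(value) AS avg_value
FROM measurements
WHERE eId = 'YOUR_EXECUTION_ID'
GROUP BY bucket, name
ORDER BY bucket;

-- approximate 90th percentile per keyword (t-digest extension)
SELECT name,
       tdigest_percentile(value::double precision, 100, 0.9) AS p90_value
FROM measurements
WHERE eId = 'YOUR_EXECUTION_ID'
GROUP BY name;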
MySQL
Create the database:
CREATE database step;
USE step;
Create the table which will hold measurements generated by step:
CREATE TABLE measurements (
begin timestamp NOT NULL,
eId text,
status text,
planId text,
taskId text,
name text,
type text,
value integer,
info json
);
CREATE INDEX begin on measurements (begin);
CREATE INDEX begin_eId on measurements (eId(24),begin);
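As with PostgreSQL, here is a rough sketch of a query that makes use of the indexes above, averaging values per keyword in 1-minute buckets; YOUR_EXECUTION_ID is a placeholder for an eId value:
-- average value per keyword in 1-minute buckets for one execution
SELECT DATE_FORMAT(begin, '%Y-%m-%d %H:%i:00') AS bucket,
       name,
       AVG(value) AS avg_value
FROM measurements
WHERE eId = 'YOUR_EXECUTION_ID'
GROUP BY bucket, name
ORDER BY bucket;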
Prometheus
Prometheus is a popular open-source monitoring system. The step Prometheus plugin exposes the generated data at an endpoint which is then scraped by Prometheus at regular intervals. Both step and Prometheus need to be configured for the data export to work (see below).
In addition to the response time metrics, the plugin also exposes the controller JVM and Jetty metrics for internal/technical monitoring.
Installing and setting up Prometheus
Install Prometheus following the official instructions.
In order to configure Prometheus to scrape step data, add the scraping job in prometheus.yml in the scrape_configs section:
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'YOUR_STEP_JOB_NAME'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ['CONTROLLER_URL']
Where CONTROLLER_URL is the URL where your step controller is deployed, e.g. localhost:8080. The URL needs to be enclosed in single quotes. The step plugin exposes the data at the CONTROLLER_URL/metrics endpoint, so you don't have to change the metrics_path default.
By default, Prometheus scrapes the step endpoint every 15s. This setting may be changed in the prometheus.yml file following the Prometheus documentation. The step plugin is not affected by this setting.
Enabling the plugin
Enable the plugin in step.properties by setting
plugins.PrometheusControllerPlugin.enabled=true
To make Prometheus able to calculate percentiles, a time series of histograms is created for each keyword. The histograms are built according to a list of predefined buckets for the measurement values. This list is controlled in step.properties by the following parameter:
# use following option to overwrite the default buckets (duration are in seconds)
# default is 0.005D, 0.01D, 0.025D, 0.05D, 0.075D, 0.1D, 0.25D, 0.5D, 0.75D, 1.0D, 2.5D, 5.0D, 7.5D, 10.0D
plugins.measurements.prometheus.buckets=0.01D,0.025D,0.05D,0.1D,0.25D,0.5D,1.0D,2.5D,5D,10D,25D,50D
In this example, the first bucket collects all measurement values below 10ms, the second values between 10ms and 25ms, and so on. Bucket granularity can be adjusted by the user as needed.
Once a new execution is running, you will be able to see the data exposed by step at the CONTROLLER_URL/metrics endpoint, as well as the ingested data at the Prometheus PROMETHEUS_URL/graph endpoint.
Log
step can generate data in a log format compatible with Splunk. To enable the Logback plugin, set the following property in step.properties:
plugins.LogMeasurementControllerPlugin.enabled=true
The log configuration is controlled by the CONTROLLER_DIR\bin\logback.xml file, which by default looks like the following:
<appender name="MEASUREMENTS" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>../log/measurement-logger.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>../log/measurement-logger-%i.log
</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>2</maxIndex>
</rollingPolicy>
<triggeringPolicy
class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>1GB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>{"timestamp":"%date","method":"%method","msg":%msg}%n</pattern>
</encoder>
</appender>
Log files are stored at the location set inside the <file> tag, by default CONTROLLER_DIR\log\measurement-logger.log. This configuration may be tuned by the user as required.
Analyzing and visualizing data using Grafana
Data generated by step and written to external databases may be attached as data sources to other analytical or monitoring tools. We provide example Grafana dashboards for visualizing step data stored in MongoDB, PostgreSQL and Prometheus. Each of these dashboards contains a set of standard MongoDB / SQL / PromQL queries, which the user can reuse as a starting point for writing more advanced queries.
See the tutorial for instructions on connecting Grafana to step data sources.