Step version 24 introduces a rule-based mechanism that allows users to define flexible reactions to various events.

In a nutshell, rules allow you to define when to react to specific events (conditions), and how (actions).

The following illustration presents the process in a simplified, informal form:

Events

Events are automatically generated by the Step environment and fed into the alerting rules evaluation component. Events contain further details about what happened, in the form of Bindings. These bindings can be used during rule evaluation, e.g. to express conditions based on the value of individual bindings. Furthermore, the bindings are also available when actions are executed; for instance, a mail or webhook sent as a result of a rule can define a template of its content with placeholders that will be appropriately substituted with the concrete binding values.

Events class hierarchy and bindings

Events are structured in a class hierarchy, where individual subclasses add more specialized bindings depending on their context.

Only the leaf classes are actually instantiated (emitted) by the system, but they can conceptually be treated as any of their respective superclasses, and contain all their superclasses’ bindings.

Please note that for simplicity, only three types of bindings are supported: strings, lists of strings, and maps with string keys and string values.

Event class	Binding (Type)	Description
AlertingEvent		Base class of all alerting events
	eventClass (String)	Concrete class of the event
	eventClasses (List)	All classes of the event (including superclasses)
	controllerUrl (String)	URL of the Step controller
	projectId (String)	ID of the Step project in which the event occurred
	projectName (String)	Name of the Step project in which the event occurred
	eventSummary (String)	Short human-readable summary/title of the event
ExecutionEvent	(extends AlertingEvent)	Base class of execution events
	executionId (String)	ID of the execution
	executionDescription (String)	Description of the execution, e.g. plan name
	executionUrl (String)	URL of the execution
	executionUserName (String)	Step user who performed the execution
	executionParameters (Map)	Parameters of the execution
AbstractExecutionEndedEvent	(extends ExecutionEvent)	Base class of execution ended events
	executionStatus (String)	Execution status (e.g. PASSED, TECHNICAL_ERROR)
	errorSummary (String)	Human-readable error summary (if applicable)
	errorCodes (List)	Error codes (if applicable)
ExecutionEndedEvent	(extends AbstractExecutionEndedEvent)	Regular execution ended (e.g. plan was executed). Triggered after the execution of a plan.
ScheduledExecutionEndedEvent	(extends AbstractExecutionEndedEvent)	Scheduled execution ended. Triggered after each execution of a schedule.
	scheduleId (String)	ID of the schedule that triggered the execution
	scheduleName (String)	Name of the schedule that triggered the execution
	scheduleStatus (String)	Status of the schedule (after this execution)
	scheduleSucceeded (String)	Boolean indicating whether schedule is considered successful (true/false)
	assertionPlanExecutionStatus (String)	Status of the assertion plan execution (if applicable)
	assertionPlanExecutionUrl (String)	URL of the assertion plan execution (if applicable)
	assertionPlanErrorSummary (String)	Error summary of the assertion plan execution (if applicable)
	assertionPlanErrorCodes (List)	Error codes of the assertion plan execution (if applicable)
IncidentEvent	(extends AlertingEvent)	Base class of incident events
	incidentId (String)	ID of the incident
	incidentUrl (String)	URL of the incident
	incidentStatus (String)	Status of the incident (OPEN/CLOSED)
	incidentTitle (String)	Incident title
	incidentCauseEventClass (String)	Concrete class of the event that caused the incident event
	incidentCauseEventClasses (List)	All classes of the event that caused the incident event
	See further notes in description below
IncidentOpenedEvent	(extends IncidentEvent)	An incident was opened. Triggered after the opening of an incident.
IncidentClosedEvent	(extends IncidentEvent)	An incident was closed. Triggered after the closing of an incident.
IncidentRecordedEvent	(extends IncidentEvent)	An already-open incident reoccurred. Triggered after re-observing a related event.

Notes

IncidentEvent instances will also copy most of the bindings of their causing event. For instance, if an incident was opened in response to a ScheduledExecutionEndedEvent, it will also contain the scheduleName etc. bindings.
IncidentRecordedEvents are only informational and do not change the status of incidents. They are emitted when an incident would have been opened, but such an incident is already open. In this case, the occurrence is added as an informational entry to the existing incident.
After a schedule is executed, the system emits two events: one ExecutionEndedEvent related to the actual plan that was executed, and one ScheduledExecutionEndedEvent related to the schedule itself. These events do not necessarily have the same result status, because the schedule may be subject to the evaluation of an Assertion Plan which determines its status, independently of the status of the underlying plan.
Bindings can be evaluated in both conditions and actions. See the section on Binding evaluation for more details.

Rules

As mentioned above, Step generates events automatically as they occur in the system. All events are fed into the alerting rules subsystem, but actions are only taken for events which specifically match the defined rules.

Here is an example of a project configured with two rules – one for automatically managing (i.e. opening/closing) incidents based on events emitted from scheduled executions, and one for sending notifications if incidents are opened.

The definition of the first rule is as shown below – as can be seen, it will react to ScheduledExecutionEndedEvents, and perform the action “Open/close incident automatically” as appropriate in response to these events.

Once the first rule is processed, it may in turn itself generate events related to incidents. These events are captured by the second rule, which reacts to IncidentEvents (except if the event is an IncidentRecordedEvent, see the description of conditions below), and actually sends a notification by mail.

Conditions

Every rule must be associated with an event class for which it will be evaluated. This can be any class in the hierarchy, but it is recommended to be as specific as possible; in other words, use the most specific subclass suitable for the task.

For example, a rule where the event class is set to the top-level AlertingEvent class will be evaluated for every single event that occurs. If the class is set to IncidentEvent, the rule will only be evaluated for incident-related events (remember that subclasses will also be matched); finally, if it is set to IncidentOpenedEvent, the rule will only be triggered on events where incidents were opened.
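The class matching described above can be pictured with a small Python sketch. It is an illustration only, not Step's implementation: it assumes an event is represented as a plain dictionary of bindings, and the function name `rule_applies_to` is hypothetical. It relies on the `eventClasses` binding, which lists the concrete class together with all of its superclasses.

```python
def rule_applies_to(event_bindings: dict, rule_event_class: str) -> bool:
    """Return True if the event is an instance of the rule's event class.

    Assumption: the event is a dict of bindings, and eventClasses holds the
    concrete class plus all superclasses, as described in the bindings table.
    """
    return rule_event_class in event_bindings.get("eventClasses", [])


# Sample event, shaped like the bindings of an IncidentOpenedEvent.
opened = {
    "eventClass": "IncidentOpenedEvent",
    "eventClasses": ["IncidentOpenedEvent", "IncidentEvent", "AlertingEvent"],
}

print(rule_applies_to(opened, "AlertingEvent"))        # True
print(rule_applies_to(opened, "IncidentEvent"))        # True
print(rule_applies_to(opened, "IncidentClosedEvent"))  # False
```

A rule on a superclass therefore sees every event of its subclasses, which is why a narrow class keeps the number of evaluations low.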

In addition to this required condition which matches the event class, an arbitrary number of further conditions may be specified.

For the time being, only one type of further condition is supported, but we plan to support more varied definitions in the future.

Binding conditions

In order to restrict a rule to matching only specific events (apart from the broad filtering by event class), the content of the bindings present in the event can be evaluated. To give a few examples:

If one only wishes to react to (regular) executions that were not successful, and ran in the “PROD” environment (which was specified using an execution parameter env), the conditions would (logically) be: rule.eventClass == 'ExecutionEndedEvent', executionStatus != 'PASSED', and executionParameters[env] == 'PROD'.
To further restrict rules to only match incident events which occur when an incident was opened or closed (but not when an open incident recorded a recurrence), a condition on the eventClass binding can be employed, specifying that this binding must not match IncidentRecordedEvent. This is what was done in the second rule shown above. Note that this is functionally equivalent to creating two separate rules with the same action: one for IncidentOpenedEvent and one for IncidentClosedEvent.
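To make the first example concrete, here is a minimal Python sketch of how such conditions could be evaluated against an event's bindings. It is a sketch under assumptions rather than Step's actual evaluator: the `(reference, operator, expected)` tuple form, the restriction to `==` and `!=`, and the helper names `resolve` and `matches` are all hypothetical. Only the binding names and the `map[key]` reference syntax come from this document.

```python
import re

# Matches a map access such as executionParameters[env].
_MAP_REF = re.compile(r"^(\w+)\[([^\]]+)\]$")


def resolve(bindings: dict, ref: str):
    """Look up a binding reference; returns None if it is not present."""
    ref = ref.strip()
    m = _MAP_REF.match(ref)
    if m:
        container = bindings.get(m.group(1))
        return container.get(m.group(2)) if isinstance(container, dict) else None
    return bindings.get(ref)


def matches(bindings: dict, event_class: str, conditions: list) -> bool:
    """Check the event class first, then every binding condition.

    conditions is a list of (reference, operator, expected) tuples, where
    operator is '==' or '!=' (an assumption made for this sketch).
    """
    if event_class not in bindings.get("eventClasses", []):
        return False
    for ref, op, expected in conditions:
        actual = resolve(bindings, ref)
        ok = (actual == expected) if op == "==" else (actual != expected)
        if not ok:
            return False
    return True


event = {
    "eventClass": "ExecutionEndedEvent",
    "eventClasses": ["ExecutionEndedEvent", "AbstractExecutionEndedEvent",
                     "ExecutionEvent", "AlertingEvent"],
    "executionStatus": "TECHNICAL_ERROR",
    "executionParameters": {"env": "PROD"},
}
conditions = [
    ("executionStatus", "!=", "PASSED"),
    ("executionParameters[env]", "==", "PROD"),
]
print(matches(event, "ExecutionEndedEvent", conditions))  # True
```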

Actions

Once a rule has been evaluated and all its conditions were found to apply to the respective event, the final stage in the rule processing is the execution of the defined actions. A rule can contain an arbitrary number of actions, all of which will be performed in the specified order.
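The overall flow (match the event class, check all conditions, then run the actions in order) can be sketched in Python as follows. This is an informal model, not Step's implementation: the `Rule` dataclass, the `process` function, and the use of plain callables for conditions and actions are assumptions made for illustration, and the event summaries are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Bindings = Dict[str, object]


@dataclass
class Rule:
    event_class: str
    conditions: List[Callable[[Bindings], bool]] = field(default_factory=list)
    actions: List[Callable[[Bindings], None]] = field(default_factory=list)


def process(event: Bindings, rules: List[Rule]) -> None:
    """Run the actions of every rule whose class and conditions match."""
    for rule in rules:
        if rule.event_class not in event.get("eventClasses", []):
            continue
        if all(condition(event) for condition in rule.conditions):
            for action in rule.actions:  # executed in the specified order
                action(event)


sent = []  # stands in for real notification delivery
rules = [
    Rule(
        event_class="IncidentEvent",
        conditions=[lambda e: e["eventClass"] != "IncidentRecordedEvent"],
        actions=[
            lambda e: sent.append("mail: " + e["eventSummary"]),
            lambda e: sent.append("webhook: " + e["eventSummary"]),
        ],
    )
]

opened = {
    "eventClass": "IncidentOpenedEvent",
    "eventClasses": ["IncidentOpenedEvent", "IncidentEvent", "AlertingEvent"],
    "eventSummary": "Incident opened: Sleep",
}
recorded = {
    "eventClass": "IncidentRecordedEvent",
    "eventClasses": ["IncidentRecordedEvent", "IncidentEvent", "AlertingEvent"],
    "eventSummary": "Incident recorded: Sleep",
}

process(opened, rules)    # both actions run, in order
process(recorded, rules)  # filtered out by the condition
print(sent)  # ['mail: Incident opened: Sleep', 'webhook: Incident opened: Sleep']
```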

Initially, two kinds of actions are supported; other actions are expected to become available in the future.

Open/close incidents automatically

This action will automatically manage incidents, opening or closing them as needed, based on the outcome of the incoming event. While it is technically possible to associate this action with any incoming event, it can only properly derive the required information from “execution ended” events (AbstractExecutionEndedEvent or its subclasses in the event hierarchy), and therefore will not have an effect when applied to other events.
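The open, record and close behavior described here and in the notes on incident events can be pictured as a small state transition. The Python sketch below is an assumption-laden illustration, not Step's implementation: the function `manage_incident`, the set of open incident keys, and the `succeeded` flag are hypothetical, and this document does not specify how success is derived from an event, so the caller supplies it.

```python
from typing import Optional, Set


def manage_incident(open_incidents: Set[str], key: str, succeeded: bool) -> Optional[str]:
    """Update the set of open incidents and return the incident event to emit.

    key identifies the incident group (hypothetical in this sketch).
    Returns None when nothing needs to happen.
    """
    if not succeeded:
        if key in open_incidents:
            # An incident would have been opened, but one is already open.
            return "IncidentRecordedEvent"
        open_incidents.add(key)
        return "IncidentOpenedEvent"
    if key in open_incidents:
        open_incidents.remove(key)
        return "IncidentClosedEvent"
    return None


open_incidents: Set[str] = set()
results = [
    manage_incident(open_incidents, "Sleep", succeeded=False),
    manage_incident(open_incidents, "Sleep", succeeded=False),
    manage_incident(open_incidents, "Sleep", succeeded=True),
    manage_incident(open_incidents, "Sleep", succeeded=True),
]
print(results)
# ['IncidentOpenedEvent', 'IncidentRecordedEvent', 'IncidentClosedEvent', None]
```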

Compound Key

This action has an optional parameter named Compound Key. It is recommended to leave this empty in normal circumstances, as it influences the way that incidents are “grouped” by the system, and the default implementation should be suitable for most use cases. The default behavior is as follows:

“Regular” execution ended events will use the executionDescription binding as the key, thus managing incidents by plan (name).
Scheduled execution ended events will use the scheduleId binding.

In some cases, you may want to deviate from this default grouping. For example, if you have multiple environments (TEST and PROD), which can be identified via the executionParameters key env, the default grouping would create an incident whenever a plan fails, regardless of the environment. Consider the case where a single plan consistently fails in one environment but consistently succeeds in another: incidents would constantly be opened and closed.

One solution would be to add a condition on the environment to the rules, so only specific environments are even considered for auto-managing incidents. Another option, which this parameter allows, is to specify a compound key for the incident grouping, which for this example would be executionDescription, executionParameters[env] (see below for the syntax). This has the effect of using both bindings as a compound key for identifying incidents, and will effectively manage incidents separately by plan and environment.
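The effect of the compound key on grouping can be illustrated with a short Python sketch. It is a sketch under assumptions, not Step's implementation: the function `incident_key`, the tuple representation of the key, and the comma-splitting of the parameter are hypothetical, inferred only from the example value given above. The default behavior follows the two defaults listed earlier in this section.

```python
import re

# Matches a map access such as executionParameters[env].
_MAP_REF = re.compile(r"^(\w+)\[([^\]]+)\]$")


def resolve(bindings: dict, ref: str):
    """Look up a binding reference; returns None if it is not present."""
    m = _MAP_REF.match(ref)
    if m:
        container = bindings.get(m.group(1))
        return container.get(m.group(2)) if isinstance(container, dict) else None
    return bindings.get(ref)


def incident_key(bindings: dict, compound_key: str = "") -> tuple:
    """Derive the grouping key for an 'execution ended' event."""
    if compound_key.strip():
        # Assumption: references are separated by commas.
        refs = [part.strip() for part in compound_key.split(",")]
    elif "ScheduledExecutionEndedEvent" in bindings.get("eventClasses", []):
        refs = ["scheduleId"]
    else:
        refs = ["executionDescription"]
    return tuple(resolve(bindings, ref) for ref in refs)


classes = ["ExecutionEndedEvent", "AbstractExecutionEndedEvent",
           "ExecutionEvent", "AlertingEvent"]
test_run = {"eventClasses": classes, "executionDescription": "Sleep",
            "executionParameters": {"env": "TEST"}}
prod_run = {"eventClasses": classes, "executionDescription": "Sleep",
            "executionParameters": {"env": "PROD"}}

# Default grouping: both environments share one incident key.
print(incident_key(test_run), incident_key(prod_run))  # ('Sleep',) ('Sleep',)

# Compound key: incidents are tracked per plan and environment.
key = "executionDescription, executionParameters[env]"
print(incident_key(test_run, key))  # ('Sleep', 'TEST')
print(incident_key(prod_run, key))  # ('Sleep', 'PROD')
```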

Send notification via gateway

This action sends a notification via a gateway defined using the Notifications mechanism. Please note that this is an interim solution until a more flexible mechanism is introduced (expected for Step version 25).

You will first need to define a suitable gateway (either Mail or Custom Webhook) in the system settings. For mail gateways (only), the definition of this action requires the list of mail recipients to be specified. Also note that legacy Step Webhook gateways are not supported here.

Binding evaluation

As mentioned, all events contain one or more bindings providing more detailed information about the event. These bindings can be used to define more specific conditions for rules, and they can be used in rule actions.

In rule conditions

In rule conditions where bindings are referenced, it is as simple as directly using the binding name, verbatim. The only more complicated case is when accessing the content of a map (e.g. the executionParameters binding), in which case the key is directly suffixed in brackets, with no further formatting or escaping – e.g. executionParameters[env] or executionParameters[cluster] or similar.

In rule actions

For the compound key definition of incident actions, the same syntax as for the conditions applies.

However, for notifications, where the data (e.g. mail content, or webhook payload) is generally user-defined, a simple and familiar syntax for string interpolation (also widely employed by other integration solutions) is used:

The string ${someBinding} will be replaced by the (serialized) content of the binding named someBinding.
For map values, ${mapBinding[someKey]} will be replaced by the value of the key someKey in binding mapBinding.

These substitutions are performed as simple textual substitutions; in other words, no full expression evaluation is performed. In particular, if the binding someBinding is not present, occurrences of the string ${someBinding} will simply remain unchanged and not be replaced by anything.
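The substitution rules above can be approximated with a regular expression, as in the following Python sketch. It is an illustration, not Step's implementation: the function `interpolate` is hypothetical, the use of `str()` for non-string values is a stand-in because the exact serialized form is not specified here, and leaving a placeholder untouched when a map key is missing is an assumption that extends the documented behavior for missing bindings.

```python
import re

# ${name} or ${name[key]}
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?:\[([^\]}]+)\])?\}")


def interpolate(template: str, bindings: dict) -> str:
    """Replace ${binding} and ${map[key]} placeholders textually."""

    def replace(match: re.Match) -> str:
        name, key = match.group(1), match.group(2)
        if name not in bindings:
            return match.group(0)  # unknown binding: leave text unchanged
        value = bindings[name]
        if key is not None:
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # assumption: missing key stays as-is
            value = value[key]
        return value if isinstance(value, str) else str(value)

    return _PLACEHOLDER.sub(replace, template)


bindings = {
    "scheduleName": "Sleep every minute",
    "executionParameters": {"env": "TEST"},
}
template = "Schedule ${scheduleName} in ${executionParameters[env]}: ${unknown}"
print(interpolate(template, bindings))
# Schedule Sleep every minute in TEST: ${unknown}
```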

This syntax should be suitable for most output targeted for humans, like mail notifications. For integration with webhooks, it may be more suitable to directly use a machine-readable format like JSON. For this purpose, the following replacements are performed in addition:

The string %{someBinding} will be replaced by the JSON representation of the (serialized) content of the binding named someBinding.
For map values, %{mapBinding[someKey]} will be replaced by the JSON representation of the value of the key someKey in binding mapBinding.

Finally, two special values are available: ${bindings} produces a (more or less) human-readable representation of all bindings present, and %{bindings} produces a JSON-formatted, machine-readable one.

For example, using a mail gateway with the template Hello, here are all the bindings: %{bindings} and sending a notification in reaction to an incident event gives the following result (only formatted for readability):

Hello, here are all the bindings: {
  "eventClass": "IncidentClosedEvent",
  "eventClasses": [
    "IncidentClosedEvent",
    "IncidentEvent",
    "AlertingEvent"
  ],
  "controllerUrl": "http://localhost:4201",
  "incidentTitle": "Assertion for schedule 'Sleep every minute' failed",
  "incidentId": "655dd559d60ac6051820d11b",
  "incidentUrl": "http://localhost:4201/#/root/incidents/655dd559d60ac6051820d11b?tenant=Common",
  "incidentStatus": "CLOSED",
  "incidentCauseEventClasses": [
    "ScheduledExecutionEndedEvent",
    "AbstractExecutionEndedEvent",
    "ExecutionEvent",
    "AlertingEvent"
  ],
  "incidentCauseEventClass": "ScheduledExecutionEndedEvent",
  "projectId": "654b0e3329deb95b5c828e3c",
  "projectName": "Common",
  "executionId": "655dd594d60ac6051820d122",
  "executionDescription": "Sleep",
  "executionUrl": "http://localhost:4201/#/root/executions/655dd594d60ac6051820d122?tenant=Common",
  "executionUserName": "admin",
  "executionParameters": {
    "env": "TEST"
  },
  "executionStatus": "PASSED",
  "errorSummary": "",
  "errorCodes": [],
  "assertionPlanExecutionStatus": "PASSED",
  "assertionPlanErrorSummary": "",
  "assertionPlanErrorCodes": [],
  "assertionPlanExecutionUrl": "http://localhost:4201/#/root/executions/655dd594d60ac6051820d19e?tenant=Common",
  "scheduleId": "654e4a880a142b3f2af5e052",
  "scheduleName": "Sleep every minute",
  "scheduleStatus": "PASSED",
  "eventSummary": "Incident closed: Assertion for schedule 'Sleep every minute' failed",
  "scheduleSucceeded": "true"
}
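For webhook payloads, the %{...} form can be used to build valid JSON directly from a template. The Python sketch below shows the idea; it is not Step's implementation. The function `interpolate_json` and the sample payload shape are hypothetical, and leaving unknown placeholders unchanged is an assumption carried over from the ${...} behavior.

```python
import json
import re

# %{name} or %{name[key]}
_JSON_PLACEHOLDER = re.compile(r"%\{(\w+)(?:\[([^\]}]+)\])?\}")


def interpolate_json(template: str, bindings: dict) -> str:
    """Replace %{binding} and %{map[key]} with JSON-encoded values."""

    def replace(match: re.Match) -> str:
        name, key = match.group(1), match.group(2)
        if name == "bindings" and key is None:
            return json.dumps(bindings)  # special value: all bindings
        if name not in bindings:
            return match.group(0)  # assumption: unknown binding stays as-is
        value = bindings[name]
        if key is not None:
            if not isinstance(value, dict) or key not in value:
                return match.group(0)
            value = value[key]
        return json.dumps(value)

    return _JSON_PLACEHOLDER.sub(replace, template)


bindings = {
    "eventSummary": "Incident closed: Assertion for schedule 'Sleep every minute' failed",
    "eventClasses": ["IncidentClosedEvent", "IncidentEvent", "AlertingEvent"],
    "executionParameters": {"env": "TEST"},
}
# Note that the placeholders are not wrapped in quotes: the JSON encoding
# of a string already includes them.
template = '{"text": %{eventSummary}, "classes": %{eventClasses}, "env": %{executionParameters[env]}}'
payload = interpolate_json(template, bindings)
print(json.loads(payload)["env"])  # TEST
```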
