Categories: ADMIN GUIDE
This article references one of our previous releases; see the latest version of the documentation instead.

Cross-cluster agent connectivity (Grid Proxy)

The Grid Proxy connects secondary Kubernetes clusters or namespaces that host agents (but not the controller or other Step services) to the primary Step grid. It forwards agent registration and execution traffic, and, when enabled, also manages dynamic agent provisioning.

What is the Grid Proxy?

The Grid Proxy is a Step component deployed in each secondary cluster (or namespace). It has two primary roles:

1. Agent connectivity

  • The Grid Proxy forwards registration requests from the agents in the secondary cluster to the agent grid of the primary cluster (running in the controller).
  • During Plan execution, execution requests from the primary controller are forwarded through the Grid Proxy to the agents in the secondary cluster.

2. Agent provisioning (optional)

When agent provisioning is enabled, the Grid Proxy:

  • Synchronizes agent pool templates from the secondary cluster to the registry running in the controller of the primary cluster.
  • Executes local agent provisioning within the secondary cluster (create/update/scale/delete agent pools).

The primary cluster does not directly connect to or provision agents in secondary clusters; it delegates via the Grid Proxy.

When to use it

  • You operate more than one Kubernetes cluster (regions, AZs, on-prem + cloud) or you want to connect isolated namespaces in the same cluster.
  • You want central plan orchestration in a single instance of Step (the primary cluster/controller) while:
    • Connecting agents already running permanently in the secondary cluster or namespace, or
    • Provisioning agents dynamically in secondary clusters.

Note: The primary cluster/controller must be available for the solution to operate. The Grid Proxy does not provide continued orchestration during a primary outage.

Architecture & data flow

  1. Agent registration: Permanent or provisioned agents register via the Grid Proxy; the Grid Proxy forwards to the primary agent grid.
  2. Plan orchestration: Plans are orchestrated centrally in the primary controller. When provisioning is enabled, Plans can explicitly define a target agent pool template using a global pool name (<cluster>.<template>). Otherwise, selection depends on Keyword routing.
  3. Provisioning (if enabled): Grid Proxy provisions/scales the agent pool in its own cluster.
  4. Execution: Controller execution requests flow through the Grid Proxy to local agents.
[Figure: agent-provisioning-architecture-multicluster.svg]
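
To make the global pool name format (<cluster>.<template>) concrete, here is a minimal shell sketch. It assumes that the <cluster> part is the value of gridProxyName and that this name contains no dot; the template name small-pool is purely hypothetical.

```shell
# Compose a global pool name from a Grid Proxy name and a local template name.
# Assumption: the <cluster> part equals gridproxy.config.gridProxyName.
global_pool_name() {
  printf '%s.%s\n' "$1" "$2"
}

# Split a global pool name at the first dot.
# Assumption: the Grid Proxy name itself contains no dot.
pool_cluster()  { printf '%s\n' "${1%%.*}"; }
pool_template() { printf '%s\n' "${1#*.}"; }

# Example: global_pool_name MyProxy1 small-pool
```

Splitting at the first dot keeps any further dots in the template part, which is why the proxy name is assumed to be dot-free.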

Key responsibilities

  • Connectivity bridge: Forward agent registrations and execution requests between secondary agents and the primary agent grid.
  • Catalog sync (when provisioning enabled): Publish/refresh local pool templates to the primary registry.
  • Provisioning executor (when enabled): Apply desired state (replicas, labels, images, resources) in its own cluster.
  • Health reporting: Expose readiness endpoint and propagate agent/pool status.

Deployment topologies

  • One Grid Proxy per secondary cluster or namespace (supported)
  • Primary cluster: no Grid Proxy needed (controller, agent grid, and registry live here).
  • Namespace isolation: It is possible to connect namespaces within the same cluster using the Grid Proxy.

Prerequisites

  • The services of the agent grid in the primary cluster must be exposed externally to be reachable by secondary clusters/namespaces:
grid:
  expose: true
  • Grid authentication (optional): If enabled, must be configured consistently across all namespaces (external secret reference recommended for cross-namespace deployments). See Security/Authentication for details:
grid:
  authentication:
    enabled: true  # Set consistently across all clusters / namespaces
    gridSecretRef:
      name: grid-auth-secret
      key: jwt-key
  • For provisioning mode: RBAC as described in Agent Provisioning configuration.

Configuration reference

Example Helm values (simplified):

gridproxy:
  enabled: true                # enable Grid Proxy
  agentProvisioning:
    enabled: true              # enable provisioning (optional)
  config:
    gridProxyName: MyProxy1
    gridUrl: https://step-primary.example.com   # URL of the primary grid
    port: 8081                 # port of the grid proxy endpoint

Important settings

  • gridproxy.enabled: deploys the Grid Proxy.
  • gridproxy.agentProvisioning.enabled: enables the provisioning features.
  • gridproxy.config.gridProxyName: unique name of the Grid Proxy. This name is also used for the global naming of agent pools (see Identification of agent pools).
  • gridproxy.config.gridUrl: URL of the agent grid in the primary cluster.
  • gridproxy.config.port: port on which the Grid Proxy listens.

Networking & ports

Listener

  • Grid Proxy listens on the port configured in gridproxy.config.port.
  • Services running on Grid Proxy = agent services + provisioning services (if enabled).

Connectivity directions (required)

  • Secondary → Primary: for agent registration and (if enabled) agent pool template sync.
  • Primary → Secondary (→ Grid Proxy): for execution of Keywords (automation scripts) on agents, forwarded through the proxy.

Exposure

  • Primary cluster: expose the agent grid externally so secondaries can reach it:
grid:
  expose: true
  • Secondary cluster/namespace: expose the Grid Proxy endpoint (gridproxy.config.port) so the primary can reach it, for example via a Service or Ingress that resolves from the primary's network. Make sure the Grid Proxy URL (gridProxyUrl) and port are routable from the primary cluster.
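
One way to check each direction is to send a request from inside the cluster that has to make the call. The sketch below starts a throwaway curl pod in a given namespace and prints the HTTP status it receives. The pod name net-probe and the curlimages/curl image are assumptions, not part of Step; use whatever probe image your clusters allow.

```shell
# Probe a URL from inside a namespace by running a throwaway curl pod.
# Assumptions: the curlimages/curl image can be pulled, and the pod name
# "net-probe" is free in that namespace.
probe_from_namespace() {
  kubectl run net-probe --namespace "$1" --rm -i --restart=Never \
    --image=curlimages/curl --quiet -- \
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "$2"
}

# Secondary -> Primary (run against the secondary cluster):
#   probe_from_namespace step-gridproxy https://step-primary.example.com
# Primary -> Secondary (run against the primary cluster, hypothetical proxy URL):
#   probe_from_namespace step-controller https://grid-proxy.example.com:8081/ready
```

Any HTTP status shows that the network path is open, while a timeout or DNS error points to an exposure or routing problem.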

Security

Authentication

Grid authentication is optional but provides secure communication between grid components. When enabled, it uses a shared JWT secret key that must be configured consistently across all namespaces and clusters.

Important: Either enable authentication everywhere or disable it everywhere. Mixed configurations (some components with authentication, others without) will not work.

Configuration Options

There are three ways to configure the JWT secret:

  • External reference: Reads from an existing Kubernetes secret you manage (works across namespaces when secret is replicated)
  • Manual value: Uses the provided secret directly from values configuration (not recommended for production)
  • Auto-generated: Creates a Kubernetes secret named grid-secret-key that persists across deployments (⚠️ single namespace only - cannot be shared across namespaces)

1. External Kubernetes secret reference (recommended)

grid:
  authentication:
    enabled: true
    gridSecretRef:
      name: my-grid-secret
      key: jwt-key

2. Manual secret value

grid:
  authentication:
    enabled: true
    gridSecretKey: "your-secret-key-here"

3. Auto-generated secret (single namespace only)

grid:
  authentication:
    enabled: true
    # gridSecretKey and gridSecretRef left empty - auto-generates 16-character secret

4. Disabled authentication

grid:
  authentication:
    enabled: false  # Must be set consistently across all components

Cross-Namespace Setup (Controller + Grid Proxy)

Option A: With Authentication Enabled

When the controller and Grid Proxy are deployed in different clusters or namespaces, use the external secret reference approach:

  1. Create the JWT secret in both namespaces:
# Generate a random secret
SECRET_VALUE=$(openssl rand -base64 32)

# Create secret in primary namespace
kubectl create secret generic grid-auth-secret \
  --from-literal=jwt-key="$SECRET_VALUE" \
  --namespace=step-controller

# Create secret in secondary namespace  
kubectl create secret generic grid-auth-secret \
  --from-literal=jwt-key="$SECRET_VALUE" \
  --namespace=step-gridproxy
  2. Configure both deployments to reference the same secret:
grid:
  authentication:
    enabled: true
    gridSecretRef:
      name: grid-auth-secret
      key: jwt-key
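
Because a mismatch between the two secrets is easy to introduce, a quick comparison can help before deploying. This is a minimal sketch that compares the stored (base64-encoded) values without printing them. The defaults are the secret name and key from the steps above. For namespaces in different clusters you would additionally pass a kubectl --context for each side, which this sketch leaves out.

```shell
# Print "match" if the same key holds the same value in two namespaces,
# otherwise "MISMATCH". The secret value itself is never printed.
# Note: a key containing dots would need escaping in the jsonpath expression.
compare_grid_secret() {
  name=${3:-grid-auth-secret}
  key=${4:-jwt-key}
  a=$(kubectl get secret "$name" --namespace "$1" -o "jsonpath={.data.$key}") || return 2
  b=$(kubectl get secret "$name" --namespace "$2" -o "jsonpath={.data.$key}") || return 2
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "match"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Example: compare_grid_secret step-controller step-gridproxy
```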

Option B: With Authentication Disabled

Ensure authentication is disabled consistently in both namespaces:

grid:
  authentication:
    enabled: false

Components Affected

  • Controller (Enterprise Edition only): Configures grid.security.jwtSecretKey in step.properties
  • Grid Proxy: Configures gridSecurity.jwtSecretKey in GridProxyConf.yaml
  • Agents: Configures gridSecurity.jwtSecretKey in AgentConf.yaml

RBAC

  • Connectivity-only mode: No additional RBAC needed beyond basic grid proxy deployment
  • Provisioning mode: Requires RBAC as documented in Agent Provisioning Configuration

Observability

  • Health endpoint: GET <gridProxyUrl>/ready
  • Logs: The Grid Proxy pod logs contain information about agent registration and provisioning.
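
The health endpoint can be polled from a script, for example after a deployment. The sketch below assumes that a ready Grid Proxy answers /ready with HTTP 200; the exact status code is not stated in this guide. The URL in the example is hypothetical.

```shell
# Poll <base-url>/ready until it answers HTTP 200 or the attempts run out.
# Assumption: a ready Grid Proxy answers with status 200.
# Arguments: base URL, number of attempts (default 30), delay in seconds (default 2).
wait_ready() {
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1/ready") || code=000
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}

# Example: wait_ready https://grid-proxy.example.com:8081
```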

Scaling & limits

  • Stateless; only one Grid Proxy per namespace is supported.
  • Operational limits: Same as for the primary cluster.

Failure modes & recovery

  • Primary unreachable from secondary: registrations and template sync fail; existing agents may continue running but are not globally visible. The sync is re-established once the connection recovers.
  • Secondary (Grid Proxy) unreachable from primary: Plan orchestration runs, but Keyword execution on secondary agents fails.
  • Auth errors: the shared JWT is expired or invalid.
  • Name collisions: a global pool name conflict results in an event and an error.

Operations

Install (Helm)

helm repo add exense-charts https://nexus-enterprise.exense.ch/repository/exense-charts/
helm upgrade --install grid-proxy exense-charts/step \
  --namespace step-system \
  --values values.yaml \
  --version 1.3.0

Upgrade / rollback

  • Use helm upgrade with --atomic.
  • Validate /ready before resuming traffic.

Secret rotation

  • Update the shared JWT secret in Kubernetes.
  • Restart Grid Proxy deployment: kubectl rollout restart deploy/grid-proxy.
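
The two rotation steps can be scripted. This is a sketch only, assuming the secret name and key from the Security section (grid-auth-secret / jwt-key) and a Deployment named grid-proxy. Other components that read the same secret, such as the controller and agents, probably need to pick up the new value as well; this sketch does not handle that.

```shell
# Rotate the shared JWT secret in every namespace given, then restart the proxy.
# Assumptions: secret name/key as in the Security section, and a Deployment
# named grid-proxy in the first namespace argument.
rotate_grid_secret() {
  proxy_ns=$1
  shift
  new_value=$(openssl rand -base64 32) || return 1
  for ns in "$proxy_ns" "$@"; do
    # Render the Secret client-side and apply it, so an existing one is updated.
    kubectl create secret generic grid-auth-secret \
      --from-literal=jwt-key="$new_value" \
      --namespace "$ns" --dry-run=client -o yaml |
      kubectl apply -f - || return 1
  done
  kubectl rollout restart deploy/grid-proxy --namespace "$proxy_ns" &&
    kubectl rollout status deploy/grid-proxy --namespace "$proxy_ns" --timeout=120s
}

# Example: rotate_grid_secret step-gridproxy step-controller
```

Piping the client-side dry run into kubectl apply updates an existing secret instead of failing because it already exists.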

Troubleshooting

Authentication Issues

  • “JWT authentication failed” or connectivity issues
    • Check consistency: Verify grid.authentication.enabled is set to the same value (true/false) across all clusters / namespaces
    • Mixed configuration: Ensure you’re not mixing authenticated and non-authenticated components
    • If authentication is enabled, ensure the grid.authentication settings are consistent across all clusters / namespaces
      • If using external secret reference, ensure the referenced secret exists and contains the correct key in all relevant clusters / namespaces
      • If using a manual secret value, ensure the same value was used during deployment in all clusters / namespaces
  • Auto-generated secret not working
    • Check if the grid-secret-key Kubernetes secret was created
    • Verify the secret contains a valid gridSecretKey data field
    • Ensure both controller and grid proxy pods have restarted after secret creation

Connectivity Issues

  • Agents not appearing in primary grid
    • Check /ready endpoint on grid proxy
    • Verify shared JWT secret matches between primary and secondary (see Authentication Issues above)
    • Confirm config.gridUrl is reachable from secondary cluster
    • Check grid proxy logs for authentication or connection errors
  • Execution requests failing
    • Verify bidirectional connectivity (primary can reach grid proxy endpoint)
    • Check authentication is working (see Authentication Issues above)
    • Inspect grid proxy logs for forwarding errors
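
When inspecting the Grid Proxy logs, filtering for likely error terms can shorten the search. The sketch below assumes a Deployment named grid-proxy. The search terms are a guess at typical wording, not the proxy's documented log messages, so extend them as needed.

```shell
# Show recent Grid Proxy log lines that mention authentication or connection
# problems. Assumptions: a Deployment named grid-proxy; the search terms are
# a guess, not the proxy's documented log messages.
# Arguments: namespace, time window (default 15m).
proxy_errors() {
  kubectl logs deploy/grid-proxy --namespace "$1" --since="${2:-15m}" |
    grep -iE 'jwt|auth|refused|timed? ?out|unreachable' ||
    echo "no matching log lines"
}

# Example: proxy_errors step-gridproxy 1h
```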

Provisioning Issues

  • Provisioning stuck (if enabled)
    • Confirm RBAC matches Agent Provisioning Configuration
    • Inspect Kubernetes events for rate limits or denied API calls
    • Check authentication is working for template synchronization

FAQ

Is Grid Proxy required in the primary cluster? No.

Can I run the Grid Proxy without provisioning? Yes. In that case, agents have to run permanently in the secondary cluster. The Grid Proxy will only forward agent registration and execution requests.

Do I need connectivity from primary ↔ secondary? Yes, both directions are required.

  • Secondary → Primary: agent registration and (if enabled) agent pool template sync.
  • Primary → Secondary (to Grid Proxy): dispatch and forwarding of Keywords to agents.

See Also

  • Create a KeyStore in JKS format
  • Encryption Manager
  • Controller installation
  • Agent Runtime Image Naming Convention Migration
  • Agent Sizing Guide