Cross-cluster agent connectivity (Grid Proxy)
The Grid Proxy connects secondary Kubernetes clusters or namespaces that host agents (but not the controller or other Step services) to the primary Step grid. It forwards agent registration and execution traffic, and, when enabled, also manages dynamic agent provisioning.
What is the Grid Proxy?
The Grid Proxy is a Step component deployed in each secondary cluster (or namespace). It has two primary roles:
1. Agent connectivity
- The Grid Proxy forwards registration requests from the agents in the secondary cluster to the agent grid of the primary cluster (running in the controller).
- During Plan execution, execution requests from the primary controller are forwarded through the Grid Proxy to the agents in the secondary cluster.
2. Agent provisioning (optional)
When agent provisioning is enabled, the Grid Proxy:
- Synchronizes agent pool templates from the secondary cluster to the registry running in the controller of the primary cluster.
- Executes local agent provisioning within the secondary cluster (create/update/scale/delete agent pools).
The primary cluster does not directly connect to or provision agents in secondary clusters; it delegates via the Grid Proxy.
When to use it
- You operate more than one Kubernetes cluster (regions, AZs, on-prem + cloud) or you want to connect isolated namespaces in the same cluster.
- You want central Plan orchestration in a single instance of Step (the primary cluster/controller) while:
  - Connecting agents already running permanently in the secondary cluster or namespace, or
  - Provisioning agents dynamically in secondary clusters.
Note: The primary cluster/controller must be available for the solution to operate. The Grid Proxy does not provide continued orchestration during a primary outage.
Architecture & data flow
- Agent registration: Permanent or provisioned agents register via the Grid Proxy; the Grid Proxy forwards to the primary agent grid.
- Plan orchestration: Plans are orchestrated centrally in the primary controller. When provisioning is enabled, Plans can explicitly define a target agent pool template using a global pool name (<cluster>.<template>). Otherwise, selection depends on Keyword routing.
- Provisioning (if enabled): Grid Proxy provisions/scales the agent pool in its own cluster.
- Execution: Controller execution requests flow through the Grid Proxy to local agents.
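The global pool name format mentioned above can be illustrated with a small shell helper. This is only a sketch: split_pool_name is a hypothetical function, the template name small-pool is made up, and the split at the first dot assumes that the cluster part (the Grid Proxy name) contains no dots itself.

```shell
# Hypothetical helper: split "<cluster>.<template>" at the first dot.
# Assumes the cluster part (the Grid Proxy name) contains no dots itself.
split_pool_name() {
  local global_name="$1"
  local cluster="${global_name%%.*}"    # everything before the first dot
  local template="${global_name#*.}"    # everything after the first dot
  # Reject names without a dot or with an empty part.
  if [ "$cluster" = "$global_name" ] || [ -z "$cluster" ] || [ -z "$template" ]; then
    echo "not a global pool name: $global_name" >&2
    return 1
  fi
  printf '%s %s\n' "$cluster" "$template"
}
```

For example, split_pool_name MyProxy1.small-pool prints "MyProxy1 small-pool", while a name without a dot is rejected with a non-zero exit status.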
Key responsibilities
- Connectivity bridge: Forward agent registrations and execution requests between secondary agents and the primary agent grid.
- Catalog sync (when provisioning enabled): Publish/refresh local pool templates to the primary registry.
- Provisioning executor (when enabled): Apply desired state (replicas, labels, images, resources) in its own cluster.
- Health reporting: Expose readiness endpoint and propagate agent/pool status.
Deployment topologies
- One Grid Proxy per secondary cluster or namespace (supported)
- Primary cluster: no Grid Proxy needed (controller, agent grid, and registry live here).
- Namespace isolation: It is possible to connect namespaces within the same cluster using the Grid Proxy.
Prerequisites
- The services of the agent grid in the primary cluster must be exposed externally to be reachable by secondary clusters/namespaces:
grid:
  expose: true
- Grid authentication (optional): If enabled, must be configured consistently across all namespaces (external secret reference recommended for cross-namespace deployments). See Security/Authentication for details:
grid:
  authentication:
    enabled: true # Set consistently across all clusters / namespaces
    gridSecretRef:
      name: grid-auth-secret
      key: jwt-key
- For provisioning mode: RBAC as described in Agent Provisioning configuration.
Configuration reference
Example Helm values (simplified):
gridproxy:
  enabled: true # enable Grid Proxy
  agentProvisioning:
    enabled: true # enable provisioning (optional)
  config:
    gridProxyName: MyProxy1
    gridUrl: https://step-primary.example.com # URL of the primary grid
    port: 8081 # port of the grid proxy endpoint
Important settings
- gridproxy.enabled: deploy the Grid Proxy.
- agentProvisioning.enabled: enable provisioning features.
- config.gridProxyName: unique name of the Grid Proxy. This name is also used as the <cluster> prefix for the global naming of agent pools (see Identification of agent pools).
- config.gridUrl: URL of the agent grid of the primary cluster.
- config.port: proxy port. // TODO: is this enough? how is the host determined in K8s?
Networking & ports
Listener
- Grid Proxy listens on the port configured in gridproxy.config.port.
- Services running on the Grid Proxy: agent services plus provisioning services (if enabled).
Connectivity directions (required)
- Secondary → Primary: for agent registration and (if enabled) agent pool template sync.
- Primary → Secondary (→ Grid Proxy): for execution of Keywords (automation scripts) on agents, forwarded through the proxy.
Exposure
- Primary cluster: expose the agent grid externally so secondaries can reach it:
grid:
  expose: true
- Secondary cluster/namespace: expose the Grid Proxy endpoint (config.port) so the primary can reach it (via Service/Ingress that resolves from the primary’s network). Ensure the gridProxyUrl/gridproxy.config.port corresponds to a routable address from the primary cluster.
Security
Authentication
Grid authentication is optional but provides secure communication between grid components. When enabled, it uses a shared JWT secret key that must be configured consistently across all namespaces and clusters.
Important: Either enable authentication everywhere or disable it everywhere. Mixed configurations (some components with authentication, others without) will not work.
Configuration Options
There are three ways to configure the JWT secret:
- External reference: Reads from an existing Kubernetes secret you manage (works across namespaces when secret is replicated)
- Manual value: Uses the provided secret directly from values configuration (not recommended for production)
- Auto-generated: Creates a Kubernetes secret named grid-secret-key that persists across deployments (⚠️ single namespace only; cannot be shared across namespaces)
1. External Kubernetes secret reference (recommended)
grid:
  authentication:
    enabled: true
    gridSecretRef:
      name: my-grid-secret
      key: jwt-key
2. Manual secret value
grid:
  authentication:
    enabled: true
    gridSecretKey: "your-secret-key-here"
3. Auto-generated secret (single namespace only)
grid:
  authentication:
    enabled: true
    # gridSecretKey and gridSecretRef left empty: a 16-character secret is auto-generated
4. Disabled authentication
grid:
  authentication:
    enabled: false # Must be set consistently across all components
Cross-Namespace Setup (Controller + Grid Proxy)
Option A: With Authentication Enabled
When the controller and grid proxy are deployed in different clusters / namespaces, use the external secret reference approach:
- Create the JWT secret in both namespaces:
# Generate a random secret
SECRET_VALUE=$(openssl rand -base64 32)
# Create secret in primary namespace
kubectl create secret generic grid-auth-secret \
--from-literal=jwt-key="$SECRET_VALUE" \
--namespace=step-controller
# Create secret in secondary namespace
kubectl create secret generic grid-auth-secret \
--from-literal=jwt-key="$SECRET_VALUE" \
--namespace=step-gridproxy
- Configure both deployments to reference the same secret:
grid:
  authentication:
    enabled: true
    gridSecretRef:
      name: grid-auth-secret
      key: jwt-key
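Before deploying, it may help to confirm that both namespaces really hold the same secret value. The following is a minimal sketch, assuming both namespaces are reachable through the current kubectl context (for two separate clusters, the context would have to be switched between the two reads). secrets_match is a hypothetical helper name; the namespace, secret, and key names in the usage note are the ones from the example above.

```shell
# Hypothetical helper: succeed only if the given secret key holds the same
# (non-empty) value in both namespaces. Values are compared in memory and
# never printed.
secrets_match() {
  local ns_a="$1" ns_b="$2" name="$3" key="$4"
  local a b
  # jsonpath returns the base64-encoded value stored under .data.<key>
  a="$(kubectl get secret "$name" --namespace "$ns_a" -o "jsonpath={.data.${key}}")" || return 2
  b="$(kubectl get secret "$name" --namespace "$ns_b" -o "jsonpath={.data.${key}}")" || return 2
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A call such as secrets_match step-controller step-gridproxy grid-auth-secret jwt-key returns 0 when the values match, 1 when they differ or are empty, and 2 when a secret cannot be read.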
Option B: With Authentication Disabled
Simply ensure authentication is disabled consistently in both namespaces:
grid:
  authentication:
    enabled: false
Components Affected
- Controller (Enterprise Edition only): Configures grid.security.jwtSecretKey in step.properties
- Grid Proxy: Configures gridSecurity.jwtSecretKey in GridProxyConf.yaml
- Agents: Configures gridSecurity.jwtSecretKey in AgentConf.yaml
RBAC
- Connectivity-only mode: No additional RBAC needed beyond basic grid proxy deployment
- Provisioning mode: Requires RBAC as documented in Agent Provisioning Configuration
Observability
- Health endpoint: GET <gridProxyUrl>/ready
- Logs: The Grid Proxy pod logs contain information about agent registration and provisioning.
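The health endpoint can be polled from a script, for example after a deployment. The sketch below assumes that readiness is signalled by a 2xx HTTP status on /ready, which this page does not state explicitly. wait_for_ready and its attempt and delay parameters are hypothetical names.

```shell
# Hypothetical helper: poll <gridProxyUrl>/ready until it answers with a
# successful HTTP status, or give up after a number of attempts.
wait_for_ready() {
  local base_url="$1"
  local attempts="${2:-30}"   # maximum number of polls
  local delay="${3:-2}"       # seconds to wait between polls
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error responses (4xx/5xx).
    if curl --fail --silent --output /dev/null --max-time 5 "${base_url}/ready"; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}
```

The first argument is the base URL under which the Grid Proxy is reachable; the function returns non-zero if the endpoint never answers successfully.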
Scaling & limits
- Stateless; only one Grid Proxy per namespace is supported.
- Operational limits: Same as for the primary cluster.
Failure modes & recovery
- Primary unreachable from secondary: registrations/template sync fail; existing agents may continue running but are not globally visible; the sync is re-established once the connection recovers.
- Secondary (Grid Proxy) unreachable from primary: Plan orchestration runs, but Keyword execution on secondary agents fails.
- Auth errors: Shared JWT expired/invalid; // TODO check error
- Name collisions: Global pool name conflict → event + error. // TODO check error
Operations
Install (Helm)
# Register the exense chart repository
helm repo add exense-charts https://nexus-enterprise.exense.ch/repository/exense-charts/
# Install or upgrade the release "grid-proxy" from the chart exense-charts/step
helm upgrade --install grid-proxy exense-charts/step \
  --namespace step-system \
  --values values.yaml --version 1.3.0
Upgrade / rollback
- Use helm upgrade with --atomic.
- Validate /ready before resuming traffic.
Secret rotation
- Update the shared JWT secret in Kubernetes.
- Restart the Grid Proxy deployment: kubectl rollout restart deploy/grid-proxy
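The two rotation steps can be combined in a script. The sketch below is an illustration under several assumptions: the secret is named grid-auth-secret with key jwt-key as in the cross-namespace example, all listed namespaces are reachable through the current kubectl context, and rotate_grid_secret is a hypothetical helper name. Other components that read the secret (controller, agents) may also need a restart; this page only lists the Grid Proxy.

```shell
# Hypothetical helper: write a new shared secret to every listed namespace,
# then restart the Grid Proxy so it picks up the new value.
# Usage: rotate_grid_secret <grid-proxy-namespace> <namespace>...
rotate_grid_secret() {
  local proxy_ns="$1"; shift
  local new_value ns
  new_value="$(openssl rand -base64 32)" || return 1
  for ns in "$@"; do
    # --dry-run=client only renders the manifest; apply creates or updates it.
    kubectl create secret generic grid-auth-secret \
      --from-literal=jwt-key="$new_value" \
      --namespace "$ns" --dry-run=client -o yaml \
      | kubectl apply -f - || return 1
  done
  kubectl rollout restart deploy/grid-proxy --namespace "$proxy_ns"
}
```

For the example namespaces used on this page, the call would be rotate_grid_secret step-gridproxy step-controller step-gridproxy.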
Troubleshooting
Authentication Issues
- “JWT authentication failed” or connectivity issues
  - Check consistency: Verify grid.authentication.enabled is set to the same value (true/false) across all clusters / namespaces
  - Mixed configuration: Ensure you’re not mixing authenticated and non-authenticated components
  - If authentication is enabled, ensure the grid.authentication settings are consistent across all clusters / namespaces
    - If using external secret reference, ensure the referenced secret exists and contains the correct key in all relevant clusters / namespaces
    - If using manual secret value, ensure the same value has been used during deployment in all clusters / namespaces
- Auto-generated secret not working
  - Check if the grid-secret-key Kubernetes secret was created
  - Verify the secret contains a valid gridSecretKey data field
  - Ensure both controller and grid proxy pods have restarted after secret creation
Connectivity Issues
- Agents not appearing in primary grid
  - Check /ready endpoint on grid proxy
  - Verify shared JWT secret matches between primary and secondary (see Authentication Issues above)
  - Confirm config.gridUrl is reachable from secondary cluster
  - Check grid proxy logs for authentication or connection errors
- Execution requests failing
  - Verify bidirectional connectivity (primary can reach grid proxy endpoint)
  - Check authentication is working (see Authentication Issues above)
  - Inspect grid proxy logs for forwarding errors
Provisioning Issues
- Provisioning stuck (if enabled)
  - Confirm RBAC matches Agent Provisioning Configuration
  - Inspect Kubernetes events for rate limits or denied API calls
  - Check authentication is working for template synchronization
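To inspect Kubernetes events as suggested above, a filter on Warning events is often a reasonable starting point. This is a sketch only: recent_warnings is a hypothetical helper name, and the namespace in the usage note is the example namespace used earlier on this page.

```shell
# Hypothetical helper: list Warning events in the given namespace, sorted by
# timestamp so the most recent ones appear at the bottom.
recent_warnings() {
  local ns="$1"
  kubectl get events --namespace "$ns" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

Running recent_warnings step-gridproxy lists the warnings of the Grid Proxy namespace; denied or throttled API calls from provisioning would be expected to show up there if Kubernetes reports them as events.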
FAQ
Is Grid Proxy required in the primary cluster? No.
Can I run the Grid Proxy without provisioning? Yes. In that case, agents have to run permanently in the secondary cluster. The Grid Proxy will only forward agent registration and execution requests.
Do I need connectivity from primary ↔ secondary? Yes, both directions are required.
- Secondary → Primary: agent registration and (if enabled) agent pool template sync.
- Primary → Secondary (to Grid Proxy): dispatch and forwarding of Keywords to agents.