Troubleshooting
General troubleshooting guidelines
Controller
Startup issues
If you encounter any startup issues, check the following:
- An instance is not already running (address already bound to port)
- Nothing is running on ports you defined in the step.properties configuration file (default port 8080 for the http application and 8081 for the grid endpoint)
- A MongoDB instance is running on the server and port you defined in the step.properties configuration file (default port 27017 and host is localhost)
- You have sufficient memory to start the JVM
- You’re starting the controller from its bin/ folder
- You’ve checked your logs for other typical errors (log/ folder)
- You haven’t edited the classpath or otherwise made any mistake while editing the start script
- Java is installed on the system, or you’ve set the JDK variable inside the start script
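The port-related items in this checklist can be scripted. The following is a minimal sketch in Python, assuming the default ports mentioned above (8080, 8081 and 27017 on localhost). The helper names `port_is_free` and `is_listening` are hypothetical and are not part of step.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind to host:port, i.e. no process holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP server accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Defaults taken from the checklist above; adjust to your step.properties.
    for port in (8080, 8081):
        print(f"port {port} free: {port_is_free(port)}")
    print(f"MongoDB listening: {is_listening('localhost', 27017)}")
```

Before starting the controller, both controller ports should be reported as free and the MongoDB port as listening.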
Agent
Startup issues
If you encounter any startup issues, check the following:
- You have sufficient memory to start the agent
- You are starting the agent from its bin/ folder
- You haven’t edited the classpath or otherwise made any mistake while editing the start script.
- You’ve checked your logs for other typical errors (log/ folder)
- Java is installed on the system, or you’ve set the JDK variable inside the start script
Grid Port opening
The agent may start but fail to connect to the controller for one of the following reasons:
- On the controller host, the CONTROLLER_GRID_PORT does not accept incoming connections
- On the agent host, the CONTROLLER_GRID_PORT does not accept outgoing connections
In either case, the agent log will display the following error message:
2018-05-25 15:10:52,362 ERROR [Timer-0] s.g.a.RegistrationClient [RegistrationClient.java:74] while registering tokens to http://CONTROLLER_HOST:CONTROLLER_GRID_PORT
javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: connect timed out
To resolve this issue:
- Make sure that no other process is already using the CONTROLLER_GRID_PORT on the machines.
- Open the necessary ports on the machines. To do so, open a command prompt and execute the following commands:
On the controller host:
- Windows:
netsh advfirewall firewall add rule name="CONTROLLER_GRID_PORT" dir=in action=allow protocol=TCP localport=CONTROLLER_GRID_PORT
- Linux:
iptables -A INPUT -p tcp --dport CONTROLLER_GRID_PORT -j ACCEPT
On the agent host (the grid port is the remote, destination port of the outgoing connection):
- Windows:
netsh advfirewall firewall add rule name="CONTROLLER_GRID_PORT" dir=out action=allow protocol=TCP remoteport=CONTROLLER_GRID_PORT
- Linux:
iptables -A OUTPUT -p tcp --dport CONTROLLER_GRID_PORT -j ACCEPT
If the connection has been successfully established, you should see the agent in the Controller “GRID” tab.
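To tell a blocked port apart from a port on which nothing is listening, you can attempt a plain TCP connection from the agent host to the controller. The following Python sketch is one way to do this. The function name `diagnose_tcp` is hypothetical, and interpreting a timeout as a firewall silently dropping packets is a common heuristic rather than a certainty.

```python
import socket


def diagnose_tcp(host, port, timeout=5.0):
    """Try a TCP connection and classify the outcome.

    Returns "open", "refused" (the host answered but nothing listens on
    the port), "timeout" (packets are probably dropped on the way) or
    "error: ..." for any other failure such as an unknown host name.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example, run on the agent host with your own values:
#   print(diagnose_tcp("CONTROLLER_HOST", CONTROLLER_GRID_PORT))
```

A "timeout" result is consistent with the `connect timed out` message shown in the agent log above, whereas "refused" suggests that the controller is not listening on that port at all.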
Agent Port opening
Once a connection has been successfully initiated between the Controller and the Agent using the CONTROLLER_GRID_PORT, a second connection is established between them for future test executions. By default, this second connection uses a random port and can run into the same connectivity issue described in the Grid Port opening section.
If the controller is then unable to communicate with the agent on the AGENT_PORT, the following error message will appear in the Controller logs during a test execution:
2018-05-25 16:06:37,312 WARN [QuartzScheduler_Worker-3] s.g.c.GridClient [GridClient.java:141] Error while reserving session for token 07f5e5d8-4d37-40f0-a47f-68578c9624f4. Returning token to pool. Subsequent call to this token may fail or leaks may appear on the agent side.
step.grid.client.GridClient$AgentCommunicationException: Error while calling agent AgentRef [agentId=bc6bbe07-a28e-4d52-8ebe-54e6781db834, agentUrl=http://AGENT_HOST:AGENT_PORT] to execute /reserve on token Token [id=07f5e5d8-4d37-40f0-a47f-68578c9624f4]: java.net.SocketTimeoutException: connect timed out
To resolve this issue:
- Choose a fixed port for the AGENT_PORT. You can do this by adding the “agentPort” property to the agent configuration file AgentConf.json:
{ "gridHost":"http://CONTROLLER_HOST:CONTROLLER_PORT", "agentPort":AGENT_PORT, "registrationPeriod":1000, .... }
- Open the necessary ports on the machines. To do so, open a command prompt and execute the following commands:
On the agent host:
- Windows:
netsh advfirewall firewall add rule name="AGENT_PORT" dir=in action=allow protocol=TCP localport=AGENT_PORT
- Linux:
iptables -A INPUT -p tcp --dport AGENT_PORT -j ACCEPT
On the controller host (the agent port is the remote, destination port of the outgoing connection):
- Windows:
netsh advfirewall firewall add rule name="AGENT_PORT" dir=out action=allow protocol=TCP remoteport=AGENT_PORT
- Linux:
iptables -A OUTPUT -p tcp --dport AGENT_PORT -j ACCEPT
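Firewall rules for the AGENT_PORT only help if the agent really uses a fixed port. The sketch below checks whether an agent configuration defines “agentPort”. It assumes that the content of AgentConf.json is valid JSON with the property at the top level, as in the excerpt above. The function name `fixed_agent_port` is hypothetical.

```python
import json


def fixed_agent_port(conf_text):
    """Return the configured agentPort as an int, or None if it is not set.

    conf_text is the content of the agent configuration file as a string.
    """
    conf = json.loads(conf_text)
    port = conf.get("agentPort")
    return int(port) if port is not None else None
```

If the function returns None, the agent falls back to a random port, as described above, and a firewall rule for a specific port will not match.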
Specific error messages
Timeout while processing request
This message suggests that a keyword execution lasted longer than the authorized duration and that, as a result, a timeout occurred on the controller’s side.
By default, the maximum execution time is set to 180 seconds when a Keyword is created or configured. If you expect your Keyword execution to take longer than 180 seconds, you may want to adjust this value by opening the configuration pane of your Keyword and modifying the “Call timeout” parameter.
However, if you expect your keyword to always finish within this time limit, you’re most likely running into an inner timeout within your script. This has to be addressed by the person responsible for the code of the Keyword (i.e. the developer). The best way to troubleshoot such an issue is to provide the developer with the inputs and execution context of the keyword so that they can reproduce it in their development environment (via JUnit or NUnit tests).
If it is not immediately clear to the developer what the root cause of the extended duration is, further investigation can be done directly on the agent side after adding traces and redeploying the Keyword. Such traces can take the following forms:
- Screenshots (in the case of Selenium) or additional output information and attachments can be taken and added at the beginning or end of each step and interpreted at the end of the keyword’s execution. Make sure to increase the timeout value first; otherwise the timeout will prevent the information from being reported due to the interruption on the controller’s side.
- Step’s Measurement API can be used to measure the duration of every section of code within the keyword and pinpoint the exact test step causing the delay.
Using an additional “debug” parameter within the keyword’s logic may be a good option for switching these traces on and off. Installing a dedicated debug agent, to which debug executions can be specifically routed, is also a good idea. That way, the developer can follow the execution of a keyword in real time (provided there are some non-headless events to watch and follow) and as part of a complete test scenario.
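The idea of timing each section of a keyword behind a “debug” switch can be illustrated generically. The following Python sketch is not Step’s Measurement API. `SectionTimer` and its methods are hypothetical names used only to show the pattern.

```python
import time
from contextlib import contextmanager


class SectionTimer:
    """Collects the duration of named code sections when debug is enabled."""

    def __init__(self, debug=False):
        self.debug = debug
        self.durations = {}  # section name -> elapsed seconds

    @contextmanager
    def section(self, name):
        if not self.debug:
            # Tracing switched off: run the section without measuring it.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the section raises, so failures are timed too.
            self.durations[name] = time.perf_counter() - start

    def slowest(self):
        """Return the name of the slowest recorded section, or None."""
        if not self.durations:
            return None
        return max(self.durations, key=self.durations.get)
```

Each logical step of the keyword would be wrapped in `with timer.section("step name"):`, and the collected durations could then be added to the keyword output when the debug parameter is set.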
Not able to find any agent token matching selection criteria
If more keywords are executing concurrently than the number of agent tokens available at that time, you will most likely run into the following error message:
Error: Not able to find any agent token matching selection criteria $agenttype=default and #THREADID#=^30$ (optional) and accepting attributes {} . Check the attachments for more details.
To resolve this issue, you can:
- Install new agents or increase the token capacity of one or more of the existing agents
- Reduce the number of active threads originating from your tests
- Reduce the number of concurrent tests altogether
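The underlying arithmetic is simple: the number of keywords running at the same time must not exceed the total number of tokens offered by the agents that match the selection criteria. A minimal sketch, with the hypothetical function name `missing_tokens` and made-up example capacities:

```python
def missing_tokens(concurrent_keywords, agent_token_capacities):
    """Return how many tokens are missing to serve all concurrent keywords.

    agent_token_capacities lists the token capacity of each matching agent.
    A result of 0 means the grid has enough tokens.
    """
    total_tokens = sum(agent_token_capacities)
    return max(0, concurrent_keywords - total_tokens)
```

For instance, 30 concurrent keywords against two agents with 10 tokens each leave 10 keywords without a token, which can be fixed either by adding capacity or by reducing concurrency.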
Error while calling agent AgentRef to execute /release on token
The following message can occur when making use of Session objects:
step.grid.client.GridClient$AgentCommunicationException: Error while calling agent AgentRef [...] to execute /release on token Token [...]: java.net.SocketTimeoutException: Read timed out
Possible causes include:
- The agent host is overloaded and is unable to complete the release process in time
- The session cleanup phase takes unusually long or stalls entirely (i.e. calling the close() method on every object stored in the session)
- The agent is suddenly unreachable at the time of release (general socket timeout issue)
After making sure that the agent is not overloaded (CPU, memory, I/O), you will want to investigate the close() method of the objects you’ve stored in the session (for instance, the wrapper of your Selenium driver).
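One way to find out which close() call is slow is to time each of them individually. The sketch below shows the pattern in Python for illustration only. It does not use Step’s session API, and `close_all_timed` and the `warn_after` threshold are hypothetical.

```python
import time


def close_all_timed(resources, warn_after=5.0):
    """Close each (name, resource) pair and report how long it took.

    Returns a list of (name, elapsed_seconds, error) tuples, where error
    is None when close() succeeded.
    """
    report = []
    for name, resource in resources:
        start = time.perf_counter()
        error = None
        try:
            resource.close()
        except Exception as exc:  # keep closing the remaining resources
            error = exc
        elapsed = time.perf_counter() - start
        if elapsed > warn_after:
            print(f"close() of {name} took {elapsed:.1f}s")
        report.append((name, elapsed, error))
    return report
```

Logging these durations from within the wrapper objects stored in the session should make it visible whether one particular resource is responsible for the read timeout.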