Troubleshoot a self-hosted step runner

Updated on 31 Jan 2023

When a step run by a self-hosted step runner fails, verify the connectivity between the runner and the Torq cloud, as well as the internal connectivity between the runner and the end user app. The suggestions below help you gather more information, either to diagnose the problem yourself or to pass on when contacting support.

Analyze the error message

Check the error in the step Execution Log.

The error might indicate a problem with the connectivity between the runner and the Torq cloud.

The error might indicate a different connectivity issue or a timeout. In these cases, we recommend you check the internal connectivity and ensure there's no firewall blocking the runner's access to the end user app.
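One way to check internal connectivity is to attempt a TCP connection from the runner host to the end user app. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the host. The function name check_tcp and the host and port in the usage comment are hypothetical placeholders, not Torq-provided values.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [TIMEOUT_SECONDS]
# Prints "open" if a TCP connection to HOST:PORT succeeds within the
# timeout, and "blocked-or-unreachable" otherwise.
# Hypothetical helper for illustration; it is not part of the Torq runner.
check_tcp() {
  local host="$1" port="$2" timeout_s="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT.
  if timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "blocked-or-unreachable"
  fi
}

# Example (hypothetical host and port):
#   check_tcp app.internal.example 443
```

A "blocked-or-unreachable" result does not distinguish between a firewall, a DNS failure, and a service that is down, so treat it as a prompt to investigate further rather than a diagnosis. If the runner is Docker-based, the container network can differ from the host network, so a check from inside the container may give a different result.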

Torq UI indications

Go to Integrations > Runner.
Select Step Runner.
Find the relevant runner in the list.
Check the runner status. Unhealthy (red) means the runner isn't connected to the Torq cloud.
Check the Last Seen column to see when the connection was lost.

Get more information

To make sure the runner host has sufficient resources for the step execution, check for CPU and memory usage spikes. The process differs depending on whether the runner is Docker-based or Kubernetes-based.

Check your cloud provider health monitoring metrics for the runner host/cluster performance history.
Check the local runner host/cluster for resource-consumption-related errors, for example, with the dmesg command or in the system log (syslog). You can attach the list of errors if you end up contacting Torq support.
Run the command docker ps (Kubernetes: kubectl --namespace torq get pods) to see whether the runner is currently up and running.
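Run the command docker stats (Kubernetes: kubectl --namespace torq top pods) to see the memory and CPU usage of the runner containers (Kubernetes: pods). The kubectl top command requires the Metrics Server to be installed in the cluster.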
Run the command docker logs <CONTAINER ID> >& myFile.log (Kubernetes: kubectl --namespace torq logs <pod id> >& myFile.log) to get additional information about the runner activity. The <CONTAINER ID> (Kubernetes: <pod id>) is available in the output of the docker ps (Kubernetes: kubectl --namespace torq get pods) command. In bash, the >& redirect writes both the standard output and the standard error of the command to the file, which you can send to the support representative.
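The commands above can be gathered into a small script so that all the output lands in one directory. This is a sketch under stated assumptions, not an official Torq tool: the function names and output file names are hypothetical, the torq namespace is taken from the commands above, and docker stats is run with --no-stream so that it prints one snapshot and exits instead of refreshing continuously.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that save runner diagnostics to a directory.

# collect_docker CONTAINER_ID OUT_DIR
collect_docker() {
  local container_id="$1" out_dir="$2"
  mkdir -p "$out_dir"
  # Is the runner container up?
  docker ps > "$out_dir/docker-ps.txt" 2>&1
  # One-shot CPU/memory snapshot for running containers.
  docker stats --no-stream > "$out_dir/docker-stats.txt" 2>&1
  # Runner activity log (stdout and stderr).
  docker logs "$container_id" > "$out_dir/runner.log" 2>&1
}

# collect_kubernetes POD_ID OUT_DIR
collect_kubernetes() {
  local pod_id="$1" out_dir="$2"
  mkdir -p "$out_dir"
  # Is the runner pod up?
  kubectl --namespace torq get pods > "$out_dir/get-pods.txt" 2>&1
  # CPU/memory per pod; needs the Metrics Server in the cluster.
  kubectl --namespace torq top pods > "$out_dir/top-pods.txt" 2>&1
  # Runner activity log.
  kubectl --namespace torq logs "$pod_id" > "$out_dir/runner.log" 2>&1
}

# Example (replace the placeholders with values from docker ps / get pods):
#   collect_docker <CONTAINER ID> ./runner-diagnostics
#   collect_kubernetes <pod id> ./runner-diagnostics
```

Each command writes its error output to the same file as its normal output, so a failing command (for example, kubectl top without the Metrics Server) still leaves a record that can be attached to a support request.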

Conclusions

Once you've checked the Torq UI indications and collected the additional information described in the previous section, you can decide what to do next.

If the runner host CPU or memory usage is high, allocate more resources to the host/cluster that is running the runner and rerun the failing step.
If the runner is healthy (green), there are no connectivity issues with the runner itself. In this case, the problem may be in the internal connectivity with the service the runner is trying to communicate with (not related to Torq). We recommend you check the internal connectivity.
If the runner is not operational, we recommend you regenerate the runner install command and run it to deploy a new service. Contact your support representative with the extracted logs to understand the problem with the original malfunctioning service.
If there are no connectivity or resource limitation issues, contact your support representative with the extracted logs.
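The decision flow above can be summarized as a short shell function. This is only an illustrative sketch: the function name next_step, its yes/no and healthy/unhealthy arguments, and the order in which the conditions are checked are assumptions made for the example, not behavior defined by Torq.

```shell
#!/usr/bin/env bash
# next_step RESOURCES_HIGH RUNNER_STATUS
#   RESOURCES_HIGH: "yes" if CPU or memory usage is high, otherwise "no"
#   RUNNER_STATUS:  "healthy" (green) or "unhealthy" (red) in the Torq UI
# Hypothetical helper that prints the suggested next action.
next_step() {
  local resources_high="$1" runner_status="$2"
  if [ "$resources_high" = "yes" ]; then
    echo "Allocate more resources to the host/cluster and rerun the failing step."
  elif [ "$runner_status" = "unhealthy" ]; then
    echo "Regenerate the runner install command, redeploy, and send the extracted logs to support."
  else
    echo "Check internal connectivity to the target service; if it is fine, contact support with the extracted logs."
  fi
}

# Example:
#   next_step no unhealthy
```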
