Create a step runner

Use a step runner to execute steps requiring access to components hosted in private VPCs or data centers.

Configure a self-hosted step runner

  1. Go to Integrations > Runner. Select Step Runner and click Add.

  2. Enter a meaningful name for the step runner. The name should represent the type (Kubernetes or Docker) and the environment where the runner will be deployed.

  3. Select either Kubernetes or Docker.

  4. Click Add.

  5. Copy the runner install command. Torq automatically generates a deployment configuration file in YAML format, which is downloaded and executed automatically when running the install command.

    Note

    The runner install command, new or regenerated, is valid for 24 hours.

  6. Deploy the self-hosted step runner.

Regenerate a runner install command

Regenerate the install command for any runner to reinstall it (for example, if it is unhealthy) or to create an additional service that can execute runner jobs.

  1. Go to Integrations > Runner. Select Step Runner.

  2. Locate the runner you want to regenerate the install command for.

  3. Open the runner's three-dot menu and select Regenerate install command.

  4. Select the platform on which you want to deploy the runner: Docker or Kubernetes.

  5. Copy the new install command and run it. The command is valid for 24 hours.

Deploy a self-hosted step runner

After you finish defining a new remote step runner, Torq automatically generates a deployment configuration file in YAML format (the file is downloaded automatically) paired with deployment instructions.

Deploy using Docker

To deploy a runner using Docker, run the install command in a terminal. When done, the runner should be ready to use.

Advanced options

To customize the runner deployment instructions (which shouldn't be required for most use cases), you can add flags to the automatically generated deployment configuration file or change the default values.

  1. Retrieve the content of the automatically generated file by running the install command without the trailing | sh. For example, for the install command curl -H "Content-Type: application/x-sh" -s -L "https://link.torq.io/***z5v5qBRg21otM8" | sh, run: curl -H "Content-Type: application/x-sh" -s -L "https://link.torq.io/***z5v5qBRg21otM8"

  2. Copy the command output and paste it on a fresh line where you can edit it.

  3. Add flags or change values according to your needs. These are two examples (a sketch showing where such flags fit appears after this list):

    • Specify a proxy server: -e https_proxy=http://XXXXXXX:PORT. Make sure you replace XXXXXXX:PORT with your actual proxy address and port.

    • Connect to a bridge network: -e DOCKER_NETWORK_ID='XXXXXXXXXXX' -e DOCKER_NETWORK_NAME='XXXXXXXXXX'. A bridge network uses a software bridge that allows containers connected to the same bridge network to communicate while providing isolation from containers that are not connected to that bridge network.

  4. Run the edited deployment configuration file.

  5. Run docker ps to confirm that the service is running.

  6. To retrieve the edited deployment configuration file at any time, use the command: docker inspect --format "$(curl -Ls https://link.torq.io/7Yoh)" $(docker ps | grep spd | awk '{print $1}') The retrieved configuration file can be used to deploy the runner on a new machine.
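
For orientation only, here is a hedged sketch of where the proxy and bridge-network flags from the examples above might be added to the deployment command. The image name and network values are placeholders, and the script actually generated by Torq may be structured differently:

# Hypothetical excerpt of an edited deployment command; all values below are placeholders
docker run -d \
  -e https_proxy=http://proxy.internal.example:3128 \
  -e DOCKER_NETWORK_ID='<NETWORK_ID>' \
  -e DOCKER_NETWORK_NAME='<NETWORK_NAME>' \
  <RUNNER_IMAGE>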

Deploy using Kubernetes 

Allow Kubernetes cluster management actions 

If you select the Allow Kubernetes cluster management actions option, workflow steps executed using this runner can take actions on the Kubernetes cluster itself. The auto-generated Kubernetes resources will include a ClusterRole resource allowing Get and List access to resources, such as Pods, Pod Logs, Events, Nodes, Configmaps, Deployments, and Horizontal Pod Autoscalers. Another automatically generated ClusterRoleBinding object will automatically bind this role to all steps executed by the runner. 

Important

If you don't select this option, the steps will not be able to perform any actions on the Kubernetes cluster they are running on; it does not, however, prevent them from accessing external services. If the operations listed above are not sufficient and workflow steps are expected to perform additional operations on the cluster, the ClusterRole can be modified and re-applied to the cluster. Alternatively, you can create a dedicated Kubernetes integration to manage specific permissions for specific steps.

Below is a default configuration for the ClusterRole and ClusterRoleBinding objects that are automatically added to the step runner deployment when the Allow Kubernetes cluster management actions option is selected:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: torq-step
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - pods/log
  - events
  - nodes
  - configmaps
  verbs:
  - get
  - list
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
  - list
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  - horizontalpodautoscalers/status
  verbs:
  - '*'
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: torq-step
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: torq-step
subjects:
- kind: ServiceAccount
  name: default
  namespace: torq
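
If you modify the ClusterRole above (as described in the note), it can be re-applied and verified with standard kubectl commands. This is a minimal sketch; the file name is arbitrary, and the service account subject is taken from the default binding shown above:

# Re-apply the edited ClusterRole (file name is a placeholder)
kubectl apply -f torq-step-clusterrole.yaml

# Verify what the runner's service account is now allowed to do
kubectl auth can-i list deployments --as=system:serviceaccount:torq:default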

Use an existing Kubernetes cluster

The configuration YAML file contains resource definitions for all the resources required to run step runner pods and execute steps. To apply the configuration to the cluster, run kubectl apply -f <file path>.

All Torq resources are created inside dedicated Kubernetes namespaces (torq for the Torq agent itself and for the jobs of the various steps), allowing for simple monitoring and removal by issuing a kubectl delete namespace <namespace-name> command.
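
After running kubectl apply, a quick way to confirm the runner started (assuming the default torq namespace) is:

# Confirm the runner pod and its resources came up in the torq namespace
kubectl get pods --namespace=torq
kubectl get all --namespace=torq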

Use AWS Elastic Kubernetes Service (EKS)

For Amazon Web Services users who do not already run a Kubernetes cluster, the easiest way to deploy a self-hosted step runner is to create a managed cluster using AWS Elastic Kubernetes Service (EKS). The cluster can be created and managed with the simple steps defined below; hosting a Torq step runner on it is then straightforward.

1. Install AWS CLI, eksctl, kubectl 

To simplify creating an EKS cluster, it is recommended to use the AWS CLI and eksctl command-line tools provided by Amazon Web Services. To control the Kubernetes cluster, install and configure the kubectl command-line tool.

This step-by-step guide from Amazon Web Services provides all the information necessary to download and configure the tools.

In order to configure the AWS command-line utility, a set of AWS Credentials is required. This guide from AWS explains how to configure the command-line utility.

Generally, running aws configure and providing the authentication credentials will prepare the system for operation:

aws configure
AWS Access Key ID [********************]:
AWS Secret Access Key [********************]:
Default region name [us-east-2]:
Default output format [None]:
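
As an optional sanity check (not part of the Torq instructions), you can confirm the CLI is authenticated as the expected identity before creating the cluster:

# Print the AWS account and identity the CLI is currently using
aws sts get-caller-identity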

2. Create an EKS cluster

The eksctl utility is the simplest way to create an EKS cluster. It can be used in two modes:

The imperative mode receives all the information about the cluster as command-line arguments, as in the following example.

eksctl create cluster \
  --name torq-steps \
  --version 1.17 \
  --region us-east-2 \
  --nodegroup-name linux-nodes \
  --node-type t3.small \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 2 \
  --ssh-access \
  --ssh-public-key my-public-key.pub \
  --managed

The infrastructure-as-code mode takes all cluster definitions from a YAML file that is passed to the utility. Below is a sample YAML configuration file for eksctl:

apiVersion: eksctl.io/v1alpha5 
kind: ClusterConfig

metadata: 
  name: torq-steps 
  region: us-east-2

nodeGroups: 
  - name: torq-steps-ng-1 
    instanceType: t3.small 
    desiredCapacity: 2

To apply this configuration, call: eksctl create cluster -f <path_to_configuration_yaml>

Note

Torq requires a minimum of 2 vCPUs and 2GB RAM for the nodes in a Kubernetes cluster to ensure the ability to execute steps during a workflow run.

Important

One of the benefits of the eksctl utility is that it simplifies later use of the Kubernetes cluster by creating a kubeconfig file (used later by the kubectl utility). EKS clusters can be created by other means, such as (but not limited to) the AWS Console or a dedicated AWS EKS Terraform module, but then creating the kubeconfig becomes the responsibility of the user.

Running eksctl updates the kubeconfig file (by default a file named config located under $HOME/.kube) with the details of the newly created cluster and automatically changes the context to it.
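
A quick way to confirm that the context switch happened and the cluster is reachable (standard kubectl commands, not specific to Torq):

# Show the context kubectl is currently pointing at
kubectl config current-context

# List the worker nodes of the newly created cluster
kubectl get nodes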

3. Deploy the step runner

After the EKS cluster is created, deploy the step runner using exactly the same procedure as for any other Kubernetes cluster.


Use K3S on any server

The fastest and easiest way to deploy a production-grade Kubernetes cluster (for small tasks) on any virtual (or physical) machine is to use the K3S Kubernetes distribution. K3S, originally developed by Rancher, is a CNCF Sandbox project that can be deployed on any x86_64, ARMv7, or ARM64 server within ~30 seconds (according to the developer website).

Step-by-step deployment instructions are available on the K3S site. After deploying the K3S cluster, you can deploy the step runner using the exact same procedure as for any other Kubernetes cluster.
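
For reference, the single-command installation published on the K3S site looks like the following at the time of writing; verify it against the site before running, since the script and defaults may change:

# Install K3S (command as published on https://k3s.io)
curl -sfL https://get.k3s.io | sh -

# Confirm the single-node cluster is ready
sudo k3s kubectl get nodes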

Audit and troubleshoot self-hosted runners

While a step runner is a simple component that doesn't require any special configuration or treatment, being able to audit its operations and, when required, troubleshoot its activity can help resolve challenging situations. The commands below show how to gain insight into the activity performed by a step runner.

Step execution events

All steps are executed in the Kubernetes namespace called "torq" (Kubernetes is assumed as the default step runner adapter).

Use kubectl to get the list of events that took place in the namespace: kubectl get events --namespace=torq

The output should consist of the following types of events:

  • Pulled: Container image for a specific step was pulled from the container registry

  • Created: Container, based on the pulled image, was created in preparation to execute the step

  • Started: Step container execution started

Additional events can indicate longer processes, such as Pulling, Scheduled, and others.
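
When many events accumulate, sorting them chronologically or watching them live can make the sequence easier to follow; these are standard kubectl options:

# Sort events by time to follow the pull/create/start sequence
kubectl get events --namespace=torq --sort-by=.lastTimestamp

# Stream new events while a workflow runs
kubectl get events --namespace=torq --watch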

Find step execution jobs

When workflows are running, steps are initiated by the step runner as jobs in the torq Kubernetes namespace. To view the currently-running jobs, use kubectl get jobs --namespace=torq
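
To dig into a specific job from that list, standard kubectl commands can show its status and the pod(s) it created (the job name is a placeholder taken from the previous command's output):

# Show the status and recent events of a specific step job
kubectl describe job <JOB_NAME> --namespace=torq

# List the pod(s) created for that job
kubectl get pods --namespace=torq -l job-name=<JOB_NAME>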

Pull step runner logs

The step runner is a Kubernetes Pod running in the "torq" namespace. To retrieve its logs, first find the Pod name by issuing kubectl get pods --namespace=torq

Then, use the retrieved Pod name to see the detailed logs: kubectl logs <POD_NAME> --namespace=torq
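
Two common variations (standard kubectl flags) that can help while troubleshooting:

# Follow the runner logs live
kubectl logs -f <POD_NAME> --namespace=torq

# Retrieve logs from a previous container instance if the runner restarted
kubectl logs --previous <POD_NAME> --namespace=torq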

URLs the runner uses to communicate with Torq

Step runners use the following URLs to communicate with Torq.

  • https://us-docker.pkg.dev/ - Used to pull the runner image

  • https://link.torq.io/ or https://link.eu.torq.io/ - Used to pull configurations (one time)

  • https://pubsub.googleapis.com/ - Used to communicate with the Torq service

  • https://storage.googleapis.com/ - Used to upload logs

  • https://auth.googleapis.com/ - Used for authentication

For a full list of the IP addresses that Google uses, refer to Google's published IP address ranges.

Use an external secrets manager

Torq provides the Custom Secrets integration, which is a secure way of managing secrets, such as credentials, keys, etc., that you can use in workflow steps. This is a convenient way to store sensitive data (without it being exposed in the UI) and to be able to reuse it in workflows running across different environments.

In some cases, when executing specific steps inside "closed" environments, you might need to store secrets used by specific steps outside of the Torq environment. Torq steps can implement fetching secrets locally from external secret management solutions, such as (but not limited to) AWS Secrets Manager, Google Cloud Secret Manager, or Azure Key Vault. Integration steps support these solutions where applicable.

The mechanism for using an external secret manager is described below. We will use an ssh-command step as an example to demonstrate the differences between storing the SSH certificate in Torq's Custom Secrets integration, AWS Secrets Manager, and Google Cloud Secret Manager.

In order for the step to retrieve a secret from an external system, the following tasks should be done ahead of time:

  • A secret (in our example, an SSH certificate) should be stored in the external system.

  • If AWS Secrets Manager is used, the following aws-cli command can be used:

aws secretsmanager create-secret --name <SECRET_NAME> \
      --description "<SECRET_DESCRIPTION>" \
      --secret-string file://<FILE_CONTAINING_SECRET_DATA>

If Google Cloud Secret Manager is used, the following gcloud cli command can be used:

gcloud secrets create <SECRET_NAME> \
  --data-file="<FILE_CONTAINING_SECRET_DATA>" \
  --replication-policy=automatic
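
If Google Cloud Secret Manager is used, the service account that will later read the secret (see step 1 below) can be granted access with a command like the following; the secret name and service account e-mail are placeholders:

# Grant the service account read access to the secret's payload
gcloud secrets add-iam-policy-binding <SECRET_NAME> \
  --member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" \
  --role="roles/secretmanager.secretAccessor"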


  1. Define a Service Account (GCP) or a User with Programmatic Access (AWS) and provide these with the relevant permissions to access secrets. AWS provides a built-in arn:aws:iam::aws:policy/SecretsManagerReadWrite policy; however, it grants excessive permissions. It is recommended to construct a dedicated policy containing only the secretsmanager:DescribeSecret and secretsmanager:GetSecretValue permissions. Similarly, in GCP the service account would need the secretmanager.secrets.get and secretmanager.secrets.access permissions.

  2. Store the Service Account credentials in the Custom Secrets integration and define an access policy for the secret objects. By doing so, the service account credentials can be provided (in the following steps) to the workflow steps that need to retrieve the actual secrets from the external secret management system, and the IAM policy can restrict access for specific environments (ensuring, for example, that only jobs running in specific locations can gain access).

  3. In the workflow, make sure that the service account data is passed to the relevant steps. (This allows the steps to assume the role and pull the actual secret, provided it is allowed by the IAM policy.)

  4. Use (or implement) steps that support retrieving secrets from the relevant secret managers. The examples below demonstrate the ssh-command step in different flavors, using AWS Secrets Manager or Google Cloud Secret Manager to retrieve the actual SSH certificate for the connection.

AWS Secrets Manager

- id: run_ssh_command_on_remote_host
  name: us-docker.pkg.dev/torq/public/ssh-command-aws-secret:rc0.7
  runner: aws-env
  env:
    SSH_CLIENT: "{{ .UserName }}@{{ .ServerAddress }}"
    SSH_COMMAND: |
      uname -a;
      df -k;
      ls -l ~/
    AWS_ACCESS_KEY_ID: '{{ secret "AWS_SECRET_MANAGER_ROLE_KEY" }}'
    AWS_SECRET_ACCESS_KEY: '{{ secret "AWS_SECRET_MANAGER_ROLE_SECRET" }}'
    SSH_KEY_SECRET_NAME: 'MySSHKey'
  output_parser:
    name: us-docker.pkg.dev/torq/public/raw-output-parser

As shown in the example, the step expects to receive credentials for the role that would allow it to retrieve the actual SSH certificate, stored in a secret named MySSHKey in AWS Secrets Manager. Naturally, this is just an example, and the secret can be anything, not just an SSH certificate.

The credentials stored in AWS_SECRET_MANAGER_ROLE_KEY and AWS_SECRET_MANAGER_ROLE_SECRET are the ones for the user account with programmatic access, as defined in step #2 above.

Google Cloud Secret Manager

- id: run_ssh_command_on_remote_host
  name: us-docker.pkg.dev/torq/public/ssh-command-gcloud-secret:rc0.7
  runner: gcp-env
  env:
    SSH_CLIENT: "{{ .UserName }}@{{ .ServerAddress }}"
    SSH_COMMAND: |
      uname -a;
      df -k;
      ls -l ~/
    AUTH_CODE: '{{ secret "GCP_SECRET_MANAGER_ACCOUNT_TOKEN" }}'
    SSH_KEY_SECRET_NAME: 'MySSHKey'
  output_parser:
    name: us-docker.pkg.dev/torq/public/raw-output-parser

In this example, the step expects to receive a base64-encoded version of the authentication token for a service account in the AUTH_CODE environment variable. Then, assuming the identity of this service account, the step retrieves the actual SSH certificate from a secret named MySSHKey.
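
One way the AUTH_CODE value might be produced, assuming a service account key file is acceptable in your environment (the account e-mail and file name are placeholders):

# Create a key for the service account and base64-encode it for AUTH_CODE
gcloud iam service-accounts keys create sa-key.json --iam-account=<SERVICE_ACCOUNT_EMAIL>
base64 -w0 sa-key.json   # on macOS, use: base64 -i sa-key.json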

