
Joseph Sibony
This post is the second of a 2-part series about running Kubernetes build jobs. In part 1 we reviewed the basic building blocks for deploying workloads to Kubernetes – the Docker image/container and the Kubernetes pod. In this post, we utilize the Kubernetes job object which provides better fault-tolerance and scalability. It’s strongly recommended that you read the first part to understand the low-level building blocks on which this post depends.
It’s recommended to run the commands yourself as you follow along with the post. The following prerequisites are required to run the code:
All the commands shown in the post should be run from the root of the repository. After forking the code repository, clone it and open a terminal at its root directory.
To make the code samples easier to run, set the following environment variables in your shell (replace YourGitHub* values with your relevant details):
export GITHUB_TOKEN=YourGitHubPersonalAccessToken
export GITHUB_USER=YourGitHubUserName
Also, set the following environment variable, which makes the following code samples shorter and easier to follow:
QUEUE_IMAGE=ghcr.io/orihoch/k8s-ci-processing-jobs-builder-queue

In the previous post, we saw how to run pods on our Kubernetes cluster. While this can work fine for many use cases, it has some subtle shortcomings. A Kubernetes cluster can be very dynamic: nodes can be stopped for upgrades, or a pod could be scheduled on a node without enough RAM, which will cause the pod to die unexpectedly. Using pods directly is not recommended; the best practice is to use higher-level abstractions that let Kubernetes handle these and other unexpected failures.
The recommended object for running CI build jobs or other one-off processing tasks is the Kubernetes job object. The job object schedules pods and manages them to ensure the job runs to completion.
Jobs are defined in Kubernetes yaml files. All the example yaml files are available in the code repository under the manifests/ directory. We use shell templates based on envsubst to simplify the creation of multiple objects from the same template.
Let’s start with a simple example replicating the pods we used in part 1 to a Kubernetes job object:
# manifests/single-pod-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "builder-$TAG-$OS-$ARCH"
spec:
  template:
    spec:
      containers:
      - name: builder
        image: $BUILDER_IMAGE
        args: ["$OS/$ARCH", "$GITHUB_USER", "$TAG"]
        env:
        - name: TOKEN
          value: "$GITHUB_TOKEN"
      restartPolicy: Never
Publish a new release named v0.0.3 and deploy a few jobs to your cluster using the following commands:
cat manifests/single-pod-job.yaml | OS=linux ARCH=amd64 TAG=v0.0.3 envsubst | kubectl apply -f -
cat manifests/single-pod-job.yaml | OS=linux ARCH=386 TAG=v0.0.3 envsubst | kubectl apply -f -
cat manifests/single-pod-job.yaml | OS=windows ARCH=arm TAG=v0.0.3 envsubst | kubectl apply -f -
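Let’s break down these commands to see the meaning of each part: cat prints the template file; the OS, ARCH and TAG assignments set variables for the envsubst process only; envsubst replaces each $VARIABLE reference in the template with its value from the environment (exported variables such as GITHUB_USER and GITHUB_TOKEN are picked up the same way); and kubectl apply -f - creates the job from the manifest it reads on standard input.
After running these commands you can list the pods, just as in the previous post: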
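As a rough illustration of the substitution step, the following Python sketch mimics what envsubst does to the template, using the standard library's string.Template. The render_manifest helper is hypothetical and not part of the repository, and the template is only a two-line fragment of the real file. One deliberate difference: envsubst replaces unset variables with an empty string, while this sketch raises an error, which can catch a forgotten export before anything reaches the cluster.

```python
from string import Template

# Two lines taken from manifests/single-pod-job.yaml, kept short for the example.
JOB_TEMPLATE = '''name: "builder-$TAG-$OS-$ARCH"
args: ["$OS/$ARCH", "$GITHUB_USER", "$TAG"]
'''


def render_manifest(template_text, variables):
    """Replace $NAME references with values, failing on missing names.

    Hypothetical helper: Template.substitute raises KeyError for a name
    that is not in `variables`, whereas envsubst would silently
    substitute an empty string.
    """
    return Template(template_text).substitute(variables)


if __name__ == "__main__":
    print(render_manifest(JOB_TEMPLATE, {
        "TAG": "v0.0.3",
        "OS": "linux",
        "ARCH": "amd64",
        "GITHUB_USER": "YourGitHubUserName",
    }))
```

Failing loudly is a design choice for the sketch only; the commands in this post rely on envsubst itself.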
kubectl get pods
But, you will now also see job objects:
kubectl get jobs
The job objects keep track of the pods and make sure each pod runs to completion. If a pod fails, the job schedules a replacement and retries up to 6 times by default (configurable via the backoffLimit attribute). This means that unexpected failures, such as a node failure or a pod running out of RAM, will not prevent the job from completing as long as one of the retries succeeds.
When the jobs are complete, you should clean up all the pods to prevent clutter on your cluster by deleting the job objects:
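To make the retry setting concrete, here is a hedged Python sketch that builds the same job as a dictionary with an explicit backoffLimit and prints it as JSON, which kubectl apply -f - accepts as well as yaml. The build_job_manifest function, its parameters, and the placeholder image name are hypothetical; only the manifest fields mirror manifests/single-pod-job.yaml.

```python
import json


def build_job_manifest(tag, os_name, arch, image, github_user, token,
                       backoff_limit=6):
    """Return a Job manifest as a dict (hypothetical helper).

    backoff_limit maps to spec.backoffLimit; 6 is the Kubernetes default.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"builder-{tag}-{os_name}-{arch}"},
        "spec": {
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "containers": [{
                        "name": "builder",
                        "image": image,
                        "args": [f"{os_name}/{arch}", github_user, tag],
                        "env": [{"name": "TOKEN", "value": token}],
                    }],
                    "restartPolicy": "Never",
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest(
        "v0.0.3", "linux", "amd64",
        image="example.invalid/builder",  # placeholder, not a real image
        github_user="YourGitHubUserName",
        token="YourGitHubPersonalAccessToken",
        backoff_limit=2,  # give up after 2 retries instead of the default 6
    )
    # Pipe this output to `kubectl apply -f -` to create the job.
    print(json.dumps(manifest, indent=2))
```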
kubectl delete job builder-v0.0.3-linux-amd64 builder-v0.0.3-linux-386 builder-v0.0.3-windows-arm
When you delete the job objects, the created pods will also be deleted.
The build script supports 44 OS architectures and if your cluster has the capacity it would be best to run all of them in parallel. However, all the examples so far require scheduling each job separately. One of the strengths of the Kubernetes job object is the ability to schedule many parallel pods and wait for them to complete processing.
To use that functionality we need a queue to store the items which need to be processed and handle the queue logic – getting items from the queue, handling timeouts/errors, etc. There are many different ways to implement that and it’s worth checking if your company has existing solutions. For this example, I will show a simple queue implementation based on Redis and minimal Python “glue” code.
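The queue logic described above can be sketched without Redis at all. The following Python example uses an in-memory collections.deque as a stand-in for the Redis queue; process_queue, max_attempts, and the item format are assumptions made for illustration, not the repository's implementation. It covers retries on error only; timeouts are left out.

```python
from collections import deque


def process_queue(items, handler, max_attempts=3):
    """Process every item, re-queueing failures up to max_attempts.

    Returns (done, failed): items that succeeded, and items that were
    given up on after max_attempts tries.
    """
    pending = deque((item, 1) for item in items)
    done, failed = [], []
    while pending:
        item, attempt = pending.popleft()        # get the next item
        try:
            handler(item)                        # e.g. build one OS/arch target
        except Exception:
            if attempt < max_attempts:
                pending.append((item, attempt + 1))  # retry later
            else:
                failed.append(item)              # give up on this item
        else:
            done.append(item)
    return done, failed


if __name__ == "__main__":
    calls = {}

    def flaky_build(target):
        # Simulated build: "windows/arm" fails on its first attempt only.
        calls[target] = calls.get(target, 0) + 1
        if target == "windows/arm" and calls[target] == 1:
            raise RuntimeError("simulated transient failure")

    print(process_queue(["linux/amd64", "linux/386", "windows/arm"], flaky_build))
```

A real queue also has to survive a worker dying mid-item, which is one reason to rely on an existing queue library rather than hand-rolled logic.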
You can see all the code under the builder-queue/ directory. I will highlight parts of the code below:
To deploy this job queue, we first need our queue server, in this case, we will use Redis. The code repository contains a simple yaml containing a Redis deployment and service. You can review this yaml here. The following commands will deploy it and wait for it to be available:
kubectl apply -f manifests/redis.yaml &&\
kubectl wait deployment/redis --for condition=available
To add jobs to the queue and query the queue status, we need access to this Redis server. We can use the kubectl port-forward command to enable this:
kubectl port-forward deployment/redis 6379 &
Now local port 6379 is forwarded to the Redis deployment on your Kubernetes cluster.
Deploy a new release named v0.0.4 and run the following command to add all the OS architectures to the queue:
docker run --network host $QUEUE_IMAGE --rq-add all $GITHUB_USER v0.0.4
You can review what this command does here, but it basically just adds items to the Redis queue, one item per OS architecture.
Now everything is ready to start running the actual workloads. We will use the following yaml:
# manifests/multi-pod-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "builder-queue"
spec:
  parallelism: 4
  template:
    spec:
      containers:
      - name: builder-queue
        image: ghcr.io/orihoch/k8s-ci-processing-jobs-builder-queue
        args: ["--rq-worker"]
        env:
        - name: TOKEN
          value: "$GITHUB_TOKEN"
        - name: RQ_REDIS_HOST
          value: "redis"
      restartPolicy: OnFailure
The main difference from the simpler job we used earlier is the parallelism attribute. In this case, we set it to 4, meaning that 4 parallel pods will be started. We use the builder-queue image with the --rq-worker argument, which processes jobs from the queue until no items are left. We set the restartPolicy to OnFailure so that a pod is restarted if there is an error; when there are no items left in the queue, the process exits with a successful return code and the pod is not restarted.
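The interplay between parallelism and the worker's exit code can be simulated locally. In this Python sketch, threads stand in for the 4 pods and queue.Queue stands in for Redis; worker and run_workers are hypothetical names, not the repository's code. Each worker returns 0 once the queue is empty, which corresponds to the successful exit that keeps restartPolicy: OnFailure from restarting the pod.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def worker(work_queue, results):
    """Drain the shared queue, then return 0 like a successful process exit."""
    while True:
        try:
            target = work_queue.get_nowait()
        except queue.Empty:
            return 0                        # nothing left: exit successfully
        results.append(f"built {target}")   # placeholder for the real build


def run_workers(targets, parallelism=4):
    """Start `parallelism` workers and return their exit codes and results."""
    work_queue = queue.Queue()
    for target in targets:
        work_queue.put(target)
    results = []  # list.append is thread-safe in CPython
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(worker, work_queue, results)
                   for _ in range(parallelism)]
        exit_codes = [future.result() for future in futures]
    return exit_codes, results


if __name__ == "__main__":
    codes, built = run_workers(["linux/amd64", "linux/386", "windows/arm"])
    print(codes, sorted(built))
```

Each item is taken by exactly one worker, so raising parallelism shortens the wall-clock time without changing the set of results.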
Deploy this job using the following command:
cat manifests/multi-pod-job.yaml | envsubst | kubectl apply -f -
Check the pods as they are being created and wait for them to be Running:
kubectl get pods
You should see 4 pods as we specified 4 in the parallelism attribute. When pods are running, you can check the queue status using the following command:
docker run --network host $QUEUE_IMAGE --rq-info
You should see all the workers in a busy state. The number of items in the queue should decrease.
When all items in the queue have been processed, you should see all the pods with status Completed.
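If you would rather check completion from a script than by watching kubectl get pods, the job's status can be inspected. The sketch below interprets the JSON printed by kubectl get job builder-queue -o json; job_state is a hypothetical helper, and the sample document is hand-written for the example rather than captured from a cluster.

```python
import json


def job_state(job):
    """Classify a Job object (parsed JSON) as complete, failed, running, or pending."""
    status = job.get("status", {})
    for condition in status.get("conditions") or []:
        if condition.get("status") != "True":
            continue  # only conditions that currently hold are relevant
        if condition.get("type") == "Complete":
            return "complete"
        if condition.get("type") == "Failed":
            return "failed"
    # No terminal condition yet: report whether any pods are active.
    return "running" if status.get("active", 0) > 0 else "pending"


if __name__ == "__main__":
    # Hand-written sample; a real object contains many more fields.
    sample = json.loads("""
    {"status": {"succeeded": 4,
                "conditions": [{"type": "Complete", "status": "True"}]}}
    """)
    print(job_state(sample))
```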
You can now stop the port-forward to the Redis deployment by running the following:
kill %1
You can now clean up the created pods by deleting the job object, to prevent clutter on your cluster:
kubectl delete job builder-queue
In this post, we expanded on the previous post and used the Kubernetes job object for fault-tolerance and scalability. I recommend reading more about the Kubernetes job object to fully understand all the available features and configuration options.
The simple example we saw of a hello world program in Go can easily be expanded to C++ compilation or any other build job/data processing/time-consuming task. The scale of pods can easily be increased to launch hundreds of parallel pods instead of the 4 we used in this example by changing the parallelism attribute. While your CI system is the most likely place to handle CI jobs, sometimes you will reach some limitations and it’s very useful to have the power of Kubernetes via the job object in your tool belt to use when needed.