Parallel CI – Comparing CI Systems Parallelization

Joseph Sibony
reading time: 13 minutes
June 24, 2021

In my last blog post on the subject, I wrote about parallelization solutions that are independent of your CI/CD platform. In this follow-up post about parallel CI, I will review the parallelization features of common CI/CD platforms, some of which we covered in previous blog posts: Jenkins, Bamboo, GitLab, CircleCI, and Bitbucket. These CI systems let you parallelize beyond the limits of a single machine, with a simple declarative syntax and a web UI for status reporting and build logs.

Why Do We Want to Parallelize Beyond a Single Machine?

As your project grows and becomes more complex (more on code inflation here), so do the demands on the build process. Running integration tests may require setting up supporting infrastructure and performing other time-consuming tasks. While a single machine can be quite powerful nowadays, you will eventually hit its limits. Additionally, you may want to run some processes on different operating systems or on machines with different hardware (for example, GPUs for machine learning). For these reasons, the need to parallelize beyond a single machine is quite common.

Most CI/CD platforms provide a solution for these requirements, whether it is a self-hosted platform like Jenkins, which can integrate with other tools to provision additional machines (for example, using Kubernetes or AWS), or a CI/CD SaaS offering that provides the infrastructure you need.

Jenkins Pipeline – Running Concurrent Jobs


Jenkins supports many different ways of defining jobs, and with plugins, even more are possible. I will focus on the latest recommended method: the declarative pipeline, with no additional plugins.

Let’s start with the simpler parallel directive, which can appear under any stage in the pipeline (unless that stage is already under a parallel directive – nested parallel directives are not supported). Under the parallel directive, you may define stages and steps using all the standard directives, including nested stages (as long as they are not parallel themselves).

Within the parallel stages, you can specify an agent label, forcing the tests to run on different servers as required by each stage.

The following example assumes a Makefile whose default task compiles a binary, with additional tasks for publishing to / downloading from artifact storage and for running tests on the binary:

pipeline {
    agent any
    stages {
        stage('Compile & Publish Binary') {
            steps {
                sh 'make && make publish'
            }
        }
        stage('Run Tests') {
            when {
                branch 'master'
            }
            failFast true
            parallel {
                stage('Test A') {
                    agent {
                        label "for-test-a"
                    }
                    steps {
                        sh 'make download && make test_a'
                    }
                }
                stage('Test B') {
                    agent {
                        label "for-test-b"
                    }
                    steps {
                        sh 'make download && make test_b'
                    }
                }
            }
        }
    }
}

In the previous example, the parallel test stages start only after the compile & publish stage completes successfully. We also included the failFast true directive, so the build fails immediately when one of the tests fails and any further stages do not run. Had we not included failFast true, Jenkins would wait for both test stages to complete (whether successful or not), and only then would the build end or the next stage start.

The matrix directive is a bit more complex but is very useful in many cases, especially for testing on different platforms. Under the matrix directive, you define a list of axes, each with a name and a list of values. All combinations of the axis values are generated and run in parallel. Many directives, such as agent, can be set under the matrix directive and can use the expanded axis values.

The following example extends the previous one to run the tests on different operating systems:

pipeline {
    agent any
    stages {
        stage('Compile & Publish Binary') {
            steps {
                sh 'make && make publish'
            }
        }
        stage('Run Tests') {
            when {
                branch 'master'
            }
            failFast true
            matrix {
                agent {
                    label "${PLATFORM}-for-test-${TEST}"
                }
                axes {
                    axis {
                        name 'PLATFORM'
                        values 'linux', 'mac', 'windows'
                    }
                    axis {
                        name 'TEST'
                        values 'a', 'b'
                    }
                }
                stages {
                    stage('Test') {
                        steps {
                            sh 'make download && make test_${TEST}'
                        }
                    }
                }
            }
        }
    }
}

In this example, the tests run 6 times, once for each combination of PLATFORM and TEST. We use the agent directive to select a different server for each combination, and we use the TEST variable within the executed stage.

While the matrix feature is useful for naturally parallelizable tasks like multi-platform testing, many other tasks are much harder to split. The compilation of a C++ application, which produces a single binary, cannot easily be split this way. This is where solutions like Incredibuild, with its Jenkins integration, shine, allowing you to optimize beyond the inherent limits of your build process.

Related: Advanced Jenkins Parallel Builds (And Jenkins Distributed Builds)

CircleCI – Running Parallel Steps

CircleCI’s workflows feature provides a relatively simple way to configure a collection of jobs and their run order. This can be combined with the job-level parallelism attribute, which launches a number of executors for a job, each handling part of that job’s work.

Using the workflows feature, you first define the jobs you want to run and the steps within each job; you then configure workflows, which are nested lists of jobs that all run in parallel. You can optionally declare dependencies between jobs, so that certain jobs wait for others to complete before starting.

First, we define the jobs; these are the basic building blocks we use in the workflows:

jobs:
  compile_and_publish:
    docker:
      - image: circleci/buildpack-deps:focal
    steps:
      - checkout
      - run:
          command: make && make publish
  make:
    parameters:
      task:
        type: string
        default: ""
    docker:
      - image: circleci/buildpack-deps:focal
    steps:
      - checkout
      - run:
          command: make << parameters.task >>

Then we can define the workflows, combining these jobs in a certain execution order:

workflows:
  version: 2
  compile_publish_and_test:
    jobs:
      - compile_and_publish
      - make:
          name: test_a
          task: test_a
          requires:
            - compile_and_publish
      - make:
          name: test_b
          task: test_b
          requires:
            - compile_and_publish

In the above example, the compile_and_publish job runs first, and only if it succeeds, the test jobs then run in parallel. CircleCI waits for both test jobs to complete (whether successfully or not) before continuing to the next job. However, job status is reported in real time, and in the CircleCI UI you can cancel the run or retry a specific job without waiting for the entire workflow to complete. CircleCI has many more options for configuring job dependencies; check out the documentation for details: the when attribute, background commands, ending a job from within a step, the when step, requires, and using when in workflows.
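For illustration, here is a minimal sketch of the workflow-level when attribute, assuming a version 2.1 config and a scheduled pipeline trigger (the nightly_tests workflow name and the condition are hypothetical, not part of the example above):

workflows:
  nightly_tests:
    # Run this workflow only for scheduled pipelines (illustrative condition)
    when:
      equal: [ scheduled_pipeline, << pipeline.trigger_source >> ]
    jobs:
      - compile_and_publish
      - make:
          task: test_a
          requires:
            - compile_and_publish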

We can further extend workflows using matrix jobs, which allow us to run a job with every combination of the given parameters:

workflows:
  version: 2
  compile_publish_and_test:
    jobs:
      - compile_and_publish
      - make:
          requires:
            - compile_and_publish
          matrix:
            parameters:
              task: [test_a, test_b]

If our make job had an additional platform parameter, we could use the following to execute all combinations of the given parameter values:

          matrix:
            parameters:
              task: [test_a, test_b]
              platform: [windows, mac, linux]

CircleCI has another very useful feature that supports parallelization within a single job. Let’s say you have a single job that runs make tasks defined in a text file, one task per line:

jobs:
  tests:
    docker:
      - image: circleci/buildpack-deps:focal
    steps:
      - checkout
      - run:
          command: while read TASK; do make $TASK; done < tests.txt

Using the CircleCI parallelism attribute, you specify how many executors you want to start, and then you use the CircleCI CLI tool to split the work between them, so that each executor runs only some of the tasks. Note that this option does not split an existing make task – you are still limited by your build system – but it does automatically split a group of tasks and run each independent task in parallel:

jobs:
  tests:
    docker:
      - image: circleci/buildpack-deps:focal
    parallelism: 4
    steps:
      - checkout
      - run:
          command: for TASK in $(circleci tests split tests.txt); do make $TASK; done

The above example launches 4 executors, and each executor runs some of the tasks defined in the tests.txt file. The circleci tests split command has a few different options, including a very useful option to split based on timing data; see the CircleCI parallelism documentation for more details.
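For example, a rough sketch of splitting by timing data, assuming test results are stored on previous runs so CircleCI can collect timings (otherwise the CLI falls back to splitting by name):

      - run:
          command: |
            # Split tasks across executors using historical timing data
            for TASK in $(circleci tests split --split-by=timings tests.txt); do
              make $TASK
            done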

CircleCI has many useful parallelization features and is the most feature-rich of the reviewed CI systems. The option to split based on timing data is especially useful but quite complex, so I encourage you to learn about it in more detail from the documentation. As mentioned above, these features still do not solve the inherent limitations of your build system by splitting up a single monolithic task, like C++ compilation, to parallelize even further (and again, for large C++ projects this extra optimization may be critical – this is where Incredibuild’s solution comes in naturally).

Atlassian Bamboo 


There are several methods to define CI/CD pipelines in Bamboo; we will focus on Bamboo Specs YAML, but the same concepts apply to the other methods as well.

Bamboo supports only basic parallelization: all jobs under a given stage run in parallel, so nothing special is needed to enable it:

version: 2 
plan: 
  project-key: MYPROJECT 
  name: My Project 
  key: BUILD 
 

stages: 
  - Compile and publish: 
      jobs:  
        - Compile and publish 
  - Test: 
      jobs: 
        - Test A 
        - Test B 
 

Compile and publish: 
  tasks: 
    - script: 
        - make && make publish 
 

Test A: 
  tasks: 
    - script:  
        - make test_a 
 

Test B: 
  tasks: 
    - script:  
        - make test_b

In this example, Test A and Test B run in parallel only after the Compile and publish stage completes successfully. Bamboo doesn’t have any directives to control dependencies between jobs, so both parallel jobs will run to completion. There is the concept of final tasks, which are always executed regardless of whether any build tasks or other final tasks fail – even if you stop the build manually. Read more about this concept in the documentation.
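As a rough sketch, a final task in Bamboo Specs YAML might look like the following (the make cleanup target is hypothetical, and the exact key names should be checked against the Bamboo YAML Specs reference):

Test A:
  tasks:
    - script:
        - make test_a
  final-tasks:
    - script:
        # Runs even if the test task above fails or the build is stopped
        - make cleanup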

Unfortunately, Bamboo has quite limited parallelization options compared to the other CI systems, so you will have to rely on your build system or DevOps engineers to further optimize your build process (and again, for C++ builds, Incredibuild’s solution can help achieve the required optimization, letting your developers focus on development instead of waiting for the build to complete).

Bitbucket Pipelines – Parallel Steps

Bitbucket Pipelines supports very simple and intuitive parallel steps using the parallel directive: 

pipelines: 
  default: 
    - step: 
        script: 
          - make && make publish 
    - parallel: 
        - step: 
            script: 
              - make test_a 
        - step: 
            script: 
              - make test_b 

In this example, as in the previous ones, the tests run in parallel only after the build step completes successfully. Both test steps run in parallel to completion, regardless of whether they succeed or fail. Bitbucket doesn’t have options to configure dependencies between steps. See the full documentation for parallel steps for more details.

The Bitbucket parallelization features are very limited and quite similar to Bamboo’s (in fact, both solutions are owned by Atlassian). This means that most of the parallelization and build optimization work will have to be done by your DevOps engineers (and again, for C++ builds we of course recommend getting the help of Incredibuild’s solutions).

Parallelization With GitLab CI 


GitLab supports parallelization without any specific directive: all jobs in the same stage run in parallel. This works the same way for any pipeline architecture:

stages: 
  - build 
  - test 
 

image: ubuntu 
 

build_and_publish: 
  stage: build 
  script: 
    - make && make publish 
 

test_a: 
  stage: test 
  script: 
    - make test_a 
 

test_b: 
  stage: test 
  script: 
    - make test_b 

In the above example, test_a and test_b belong to the same stage, so they run in parallel only after the build stage completes successfully. GitLab has highly advanced configuration options for dependencies between jobs and stages; read the relevant documentation for more details: directed acyclic graph pipelines and parent/child pipelines.
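For example, the needs keyword, part of GitLab’s directed acyclic graph support, lets a test job start as soon as build_and_publish finishes instead of waiting for its whole stage – a minimal sketch based on the pipeline above:

test_a:
  stage: test
  # Start as soon as build_and_publish is done, without waiting for the whole build stage
  needs: [build_and_publish]
  script:
    - make test_a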

GitLab also supports parallelization within a single job. The parallel attribute configures how many instances of the job to start. Each instance receives environment variables containing the total number of instances and its own index:

test: 
  stage: test 
  parallel: 10 
  script: 
    - echo Running job instance $CI_NODE_INDEX out of a total of $CI_NODE_TOTAL jobs 

The above example starts 10 job instances in parallel; each instance sees a different index in the CI_NODE_INDEX environment variable, with the CI_NODE_TOTAL environment variable set to 10 – the total number of instances.
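These variables can be used to divide the work between the instances yourself. As a rough sketch, assuming a tests.txt file with one make task per line (similar to the CircleCI example above), each instance could pick every Nth line:

test:
  stage: test
  parallel: 4
  script:
    # awk selects the lines assigned to this instance:
    # (NR - 1) % CI_NODE_TOTAL == CI_NODE_INDEX - 1
    - for TASK in $(awk -v n=$CI_NODE_TOTAL -v i=$CI_NODE_INDEX '(NR - 1) % n == i - 1' tests.txt); do make $TASK; done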

Another option available with the parallel attribute is to define a matrix of values; parallel job instances are then started to handle every combination of those values:

test: 
  stage: test 
  parallel: 
    matrix: 
      - PLATFORM: windows 
        OS: ["8", "8.1", "10"] 
      - PLATFORM: linux 
        OS: [ubuntu, debian, fedora] 
      - PLATFORM: mac 
        OS: [mojave, catalina, bigsur] 
  script:
    - echo test $CI_NODE_INDEX out of $CI_NODE_TOTAL
    - echo platform $PLATFORM, OS $OS

The above example will run a total of 9 jobs in parallel, each with different environment variable values, as demonstrated by the echo statements in the script.

You can also use the trigger attribute, with include, instead of script to trigger a downstream pipeline:

deploy: 
  stage: deploy 
  parallel: 
    matrix: 
      - PROVIDER: aws 
        STACK: [monitoring, app1] 
      - PROVIDER: ovh 
        STACK: [monitoring, backup] 
      - PROVIDER: [gcp, vultr] 
        STACK: [data] 
  trigger: 
    include: path/to/child-pipeline.yml 

The above example will trigger child-pipeline.yml 6 times, once for each combination of values.

See the trigger documentation for details on how to use downstream pipelines.  
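For illustration, the child pipeline at that path could look like this rough sketch; the matrix variables (PROVIDER and STACK) are forwarded to the downstream pipeline as variables (the make deploy target is hypothetical):

# path/to/child-pipeline.yml
deploy_stack:
  script:
    - echo deploying $STACK to $PROVIDER
    - make deploy PROVIDER=$PROVIDER STACK=$STACK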

GitLab’s solution is quite feature-rich; the ability to use downstream pipelines allows for code reuse, which can greatly help in achieving build-time performance boosts. While highly useful, this still does not help to split up existing tasks to parallelize even further and achieve greater performance boosts – this is where Incredibuild’s solution, which integrates well with any CI system, including GitLab, helps you split a single make task into parallel execution.

Comparison – Parallel CI

While all major CI/CD systems support some form of parallelization, there are clear differences, with some systems providing distinguishing features and others offering only basic parallelization options. The following table summarizes the differences:

| Feature | Jenkins | CircleCI | Bamboo | Bitbucket | GitLab |
|---|---|---|---|---|---|
| Parallel steps / stages | Yes | Yes | Yes | Yes | Yes |
| Parallel based on a values matrix | Yes | Yes | No | No | Yes |
| Parallel on different servers / operating systems | Yes | Yes | Yes | No | Yes |
| Start multiple instances of the same job | No | Yes | No | No | Yes |
| Automatically split a job based on different data | No | Yes | No | No | No |
| Control over task dependencies | Yes | Yes | No | No | Yes |

Summary 

After reading this post, you hopefully have a better understanding of how to use your CI system’s parallelization capabilities. They let you scale beyond the limitations of a single machine, with the added value of the CI system’s UI showing build job statuses and a simple declarative syntax for defining your pipeline.

One important point to note is the inherent limitation of your build process. At the end of the day, if your build compiles a single binary in a single Makefile task, your CI/CD platform won’t help you parallelize it. This is not something a CI system, which has no knowledge of the internal build execution, can easily do. To optimize such use cases, you’ll need something that tightly integrates with C++ build tools, such as Incredibuild’s solution.

Another important point that was left out is artifact handling. In the examples, after you build the binary, you want it to be available to all the parallel instances that run tests on it. This is a subject for a future blog post, as each CI system has different features and usage patterns for artifacts, and there are also options available regardless of your CI system.
