
Joseph Sibony
In his book, Effective C++, Scott Meyers has something to say about the way he uses lhs and rhs for parameter names: “… two of my favorite parameter names and their meanings may not be immediately apparent, especially if you’ve never done time on a compiler-writing chain gang.”
When Scott wrote this, circa 1992, he must have had GCC in mind, since Clang/LLVM didn’t exist yet. Clang/LLVM fundamentally changed the way people think about compilers, and it demystified many of the black arts that went into hand-crafted compilers. You can read more in my earlier posts on what Clang is and on GCC vs. Clang.
In this blog post, I want to show that you don’t need to be a compiler-writing chain gang member to understand the optimization possibilities of Clang. My aim is to demystify the Clang optimization flags so that you can choose among them and make the best use of each.
This post uses Clang in a Windows environment (yes, Clang supports Windows compilation, as covered in the earlier posts mentioned above). However, nothing here is specific to Windows: understanding Clang optimization, and reading some assembly along the way, applies in exactly the same way on Linux. So, if you are a Linux C++ programmer, keep reading, as this post is also for you.
A note of caution before I attempt to decipher Clang optimization flags: Clang/LLVM is a very active project. I am working with the latest released version of Clang/LLVM, from April 15, 2021. There have been 12228 commits on the master branch since this release, and I fear what I write might become outdated very soon.

C:\>clang --version
clang version 12.0.0
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
Let us take the first example from my “how to avoid a C++ compilation failure” blog post to further understand the optimization flags.
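#include <iostream> // needed by the driver below for std::cout

void ConvertStringToPasswordForm(char password[])
{
    // Overwrite every character up to the terminating '\0' with '*'.
    while (*password != '\0') *password++ = '*';
}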
And the driver:
int main()
{
    char password[] = "MyTopSecurePasswordPublishedInABlog:-)";
    ConvertStringToPasswordForm(password);
    std::cout << "Password :: " << password << std::endl;
}
If we compile it with Clang using the command:
C:\Work\Temp>clang Example1.cpp
Clang silently compiles it and, by default, creates an executable named a.exe. Let us do a quick side-by-side comparison of the behavior of Clang against that of the Microsoft C++ compiler, cl.
| Clang | Microsoft C++ compiler (cl) |
|---|---|
| Output: a.exe | Output: Example.exe |
| Size: 244,224 bytes | Size: 186,368 bytes |
The Microsoft compiler is vocal about the compiler and linker versions used, and it produces a smaller executable. The question is: what flags should be passed to Clang so that its space optimization is comparable to, or even surpasses, that of cl?
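Before we answer this question, let us go through the documentation where the code generation options for Clang are discussed: https://clang.llvm.org/docs/CommandGuide/clang.html. For easy reference, here is a summary of the optimization levels listed there:

* -O0 – No optimization: compiles the fastest and generates the most debuggable code.
* -O1 – Somewhere between -O0 and -O2.
* -O2 – A moderate level of optimization that enables most optimizations.
* -O3 – Like -O2, plus optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
* -Ofast – All the optimizations from -O3, along with other aggressive optimizations that may violate strict compliance with language standards.
* -Os – Like -O2, with extra optimizations to reduce code size.
* -Oz – Like -Os (and thus -O2), but reduces code size further.
* -Og – Like -O1.
* -O4 and higher – Currently equivalent to -O3, see: https://clang.llvm.org/docs/CommandGuide/clang.html#code-generation-options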
Armed with this information, let us now try space optimization starting with -O1.
Running Clang with the -O1 flag:
clang -O1 Example1.cpp
gives 236,032 bytes for a.exe, a definite reduction in the size of the executable. The default Clang optimization level is -O0, which generates unoptimized code.

If you compare the binaries produced with -O0 and -O1, you will see some differences, but you cannot tell what caused them. Which optimizations were turned on? To find out, let us go through the assembly listings of the code generated with -O0 and -O1. We generate the listings with the following commands, the first for -O1 and the second for the default -O0:
clang -S -O1 -mllvm --x86-asm-syntax=intel Example1.cpp
clang -S -mllvm --x86-asm-syntax=intel Example1.cpp
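Note: The -S flag runs the preprocessing and compilation steps only and stops before assembling, leaving a textual assembly listing. Both commands write to Example1.s by default, so rename the file or pass -o between runs to keep both listings.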

The assembly code listing between # -- Begin function and # -- End function gives a clear picture of the optimizations that were applied to ConvertStringToPasswordForm.
Let us list the differences we see in the assembly:

This should be seen in the context of what is generated with the -O0 flag:

Highlights here are:
Note:
I experimented further with the Clang optimization flags -O2, -Os and -Oz on Example1.cpp. Here is the space reduction as a table for quick reference:
| Flag | Size of a.exe |
|---|---|
| -O0 | 244,224 bytes |
| -O1 | 236,032 bytes |
| -O2 | 233,984 bytes |
| -Os | 231,424 bytes |
| -Oz | 229,376 bytes |
As can be seen, the size of the executable shrinks progressively going from -O0 (no optimization) to -Oz (aggressive space optimization). I suspect, although I have not measured it, that compilation time also increased progressively across these stages.
It may be difficult to analyze assembly code without practice. Reading (as opposed to writing) assembly is a skill I wholeheartedly recommend every developer learn. Take heart, though: Clang/LLVM has a switch that records which optimizations were applied during a compilation run:
clang -O3 -foptimization-record-file=Opt.txt Example1.cpp
The Opt.txt file will contain the details of all optimizations that were attempted in multiple passes. You will get entries like:
--- !Analysis
Pass: prologepilog
Name: StackSize
DebugLoc: { File: Example1.cpp, Line: 3, Column: 0 }
Function: '?ConvertStringToPasswordForm@@YAXQEAD@Z'
Args:
  - NumStackBytes: '0'
  - String: ' stack bytes in function'
...
In LLVM, optimizations are implemented as passes that traverse some portion of a program either to collect information or to transform the program. In the above entry, the pass name is “prologepilog”. You will find full information about the different compiler switches in the online reference: https://clang.llvm.org/docs/ClangCommandLineReference.html.
You can also get help on the command line by running clang --help or clang --help-hidden. Yes, there is a hidden help that describes the advanced switches available!
The heart of Clang is LLVM, and a blog post on Clang optimization flags is incomplete without getting to know how to work with the LLVM intermediate representation (IR). Here is how to get the IR in its binary bitcode form:
clang -c -O1 -emit-llvm Example1.cpp -o Example.bc
Traditionally, an LLVM bitcode file has the .bc extension. To work further with the bitcode file, you will need tools that are not directly available from the Clang/LLVM installer.
First, download the LLVM source code from https://github.com/llvm/llvm-project/releases/tag/llvmorg-12.0.0. Extract the source to a folder named llvm-project-llvmorg-12.0.0. Create a folder named build under llvm-project-llvmorg-12.0.0\llvm. Install Python as a prerequisite.
Now you are ready to use CMake. Oh, you have not been introduced to CMake? Go over my blogs, and you will find everything you need.
Let me just show you one tool called opt from the LLVM tools folder. To compile this from the source use the commands:
cd build
cmake .. -DLLVM_TARGETS_TO_BUILD=X86
cmake --build . -t opt
Remember, this is not a quick build: there are 92 dependent libraries to be built before you get the final artifact, opt.exe. You can then ask opt.exe to print its help to see all the optimizations that LLVM supports. Here is a subset of what you get:

As we reach the end of this blog post, I want to introspect. Did I achieve what I set out to do, namely understanding the usage of Clang optimization flags? I believe I did. Clang/LLVM is not a tool you approach just for fun. Understanding the fundamental tool in development, the compiler, and its behavior is central to progressing from a novice to a journeyman programmer. As a programmer, you need to understand the Clang compiler flags that alter the compilation output. Of course, if you have written an LLVM optimization pass, have done static code analysis using LLVM, or have a deep understanding of Global Value Numbering, then you have progressed from journeyman to master. Allow me to take a bow for a job well done!
