
Joseph Sibony
In his book, Effective C++, Scott Meyers has something to say about the way he uses lhs and rhs for parameter names: “… two of my favorite parameter names and their meanings may not be immediately apparent, especially if you’ve never done time on a compiler-writing chain gang.”
When Scott wrote this, circa 1992, he must have had GCC in mind, since Clang/LLVM didn't exist yet. Clang/LLVM fundamentally changed the way compilers are thought about, and it demystified a lot of the black arts that went into hand-crafted compilers. You can read more about what Clang is, and about GCC vs Clang, in my earlier posts.
In this blog post, I want to show that you don't need to be a member of a compiler-writing chain gang to understand the optimization possibilities of Clang. My goal is to demystify Clang optimization flags so that you can make the best use of them.
This post uses Clang in a Windows environment (yes, Clang supports Windows compilation, as mentioned in my previous posts). However, there is nothing Windows-specific here: understanding Clang optimization, and reading some assembly to do so, applies in exactly the same way on Linux. So if you are a Linux C++ programmer, keep reading; this post is for you too.
A note of caution before I attempt to decipher Clang optimization flags: Clang/LLVM is a very active project. I am working with the latest released version of Clang/LLVM, 12.0.0, from April 15, 2021. There have been 12,228 commits on master since that release, and I fear what I write might become outdated very soon.

C:\>clang --version
clang version 12.0.0
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
Let us take the first example from my “how to avoid a C++ compilation failure” blog post to further understand the optimization flags.
#include <iostream>

void ConvertStringToPasswordForm(char password[])
{
    while (*password != '\0') *password++ = '*';
}
And the driver:
int main()
{
    char password[] = "MyTopSecurePasswordPublishedInABlog:-)";
    ConvertStringToPasswordForm(password);
    std::cout << "Password :: " << password << std::endl;
}
If we compile it with Clang using the command:
C:\Work\Temp>clang Example1.cpp
Clang silently compiles it and, by default, creates an executable named a.exe. Let us do a quick side-by-side comparison of Clang's behavior with that of the Microsoft C++ compiler, cl.
Clang | Microsoft C++ compiler (cl) |
Output: a.exe | Output: Example.exe |
Size: 244,224 bytes | Size: 186,368 bytes |
The Microsoft compiler is vocal about the compiler and linker versions it uses, and it produces a smaller executable. The question is: what flags should be passed to Clang so that its output is comparable in size to cl's, or even smaller?
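Before we answer this question, let us go through the documentation that discusses Clang's code generation options: https://clang.llvm.org/docs/CommandGuide/clang.html#code-generation-options. For easy reference, I am replicating the optimization levels here:
* -O0 – Means “no optimization”: this level compiles the fastest and generates the most debuggable code.
* -O1 – Somewhere between -O0 and -O2.
* -O2 – Moderate level of optimization which enables most optimizations.
* -O3 – Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
* -Ofast – Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards.
* -Os – Like -O2 with extra optimizations to reduce code size.
* -Oz – Like -Os (and thus -O2), but reduces code size further.
* -Og – Like -O1. In future versions, this option might disable different optimizations in order to improve debuggability.
* -O – Equivalent to -O1.
* -O4 and higher – Currently equivalent to -O3.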
Armed with this information, let us now try space optimization starting with -O1.
Space Optimizations
Running Clang with the -O1 flag:
clang -O1 Example1.cpp
produces an a.exe of 236,032 bytes, a definite reduction from the 244,224 bytes of the default build. The default optimization level in Clang is -O0, which generates unoptimized code.

If you compare the binaries produced with the -O0 and -O1 flags, you will see some differences, but you cannot tell what caused them. Which optimizations were turned on? To find out, let us go through the assembly listings generated with -O0 and -O1, using these commands:
clang -S -O0 -mllvm --x86-asm-syntax=intel Example1.cpp -o Example1_O0.s
clang -S -O1 -mllvm --x86-asm-syntax=intel Example1.cpp -o Example1_O1.s
Note: The -S flag runs only the preprocessing and compilation steps, producing an assembly (.s) file instead of an object file.

The assembly between the # -- Begin function and # -- End function markers gives a clear picture of the optimizations that were applied to ConvertStringToPasswordForm.
Let us list the differences we see in the assembly:
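- The .seh_proc, .seh_stackalloc, .seh_endprologue and .seh_endproc directives are not generated with the -O1 flag. These are assembler directives, not functions: they describe the function's prologue so that Windows x64 stack unwinding (which structured exception handling relies on) can walk through it. The -O1 version of ConvertStringToPasswordForm allocates no stack space and saves no registers, so it needs no unwind information at all.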
- The -O1 flag generates a tight loop, which reduces code size.
This should be seen in the context of what the -O0 flag generates.

Highlights here are:
- Fewer labels.
- More efficient instructions, such as lea (load effective address) and jne (jump if not equal), are used.
- The comparison and jump out of the loop to label LBB0_3 when eax is 0 are avoided entirely.
Note:
I experimented further with the Clang optimization flags -O2, -Os and -Oz on Example1.cpp. Here is the resulting size of a.exe for each flag, for quick reference:
Flag | Size of a.exe |
-O0 | 244,224 bytes |
-O1 | 236,032 bytes |
-O2 | 233,984 bytes |
-Os | 231,424 bytes |
-Oz | 229,376 bytes |
As can be seen, the executable shrinks progressively from -O0 (no optimization) to -Oz (aggressive size optimization). I suspect, although I have not measured it, that compilation time also increased along the way.
Further Analysis
It may be difficult to analyze assembly code without practice. Reading (as opposed to writing) assembly is a skill I wholeheartedly recommend every developer learn. But take heart: Clang/LLVM has a switch that records the optimization decisions made during a compilation run:
clang -O3 -foptimization-record-file=Opt.txt Example1.cpp
The Opt.txt file will contain, in YAML format, details of the optimizations that were attempted across multiple passes. You will get entries like:
--- !Analysis
Pass: prologepilog
Name: StackSize
DebugLoc: { File: Example1.cpp, Line: 3, Column: 0 }
Function: '?ConvertStringToPasswordForm@@YAXQEAD@Z'
Args:
- NumStackBytes: '0'
- String: ' stack bytes in function'
...
In LLVM, optimizations are implemented as passes that traverse some portion of a program to either collect information (analysis passes) or transform the program (transformation passes). In the entry above, the pass name is “prologepilog”, the pass that inserts the function prologue and epilogue; the remark reports that ConvertStringToPasswordForm uses 0 stack bytes. You will find full information about the different compiler switches in the online reference: https://clang.llvm.org/docs/ClangCommandLineReference.html.
You can also get help from the command line by running clang --help or clang --help-hidden. Yes, there is a hidden help that describes advanced switches!
Clang Optimization Flags – Are We Done Yet? No!
The heart of Clang is LLVM, and a blog post on Clang optimization flags is incomplete without getting to know how to work with the LLVM intermediate representation (IR). Here is how to get the IR as bitcode:
clang -c -O1 -emit-llvm Example1.cpp -o Example.bc
Traditionally, an LLVM bitcode file has the .bc extension. To work further with the bitcode file, you will need tools that are not shipped with the Clang/LLVM installer.
First, download the LLVM source code from https://github.com/llvm/llvm-project/releases/tag/llvmorg-12.0.0 and extract it to a folder named llvm-project-llvmorg-12.0.0. Create a folder named build under llvm-project-llvmorg-12.0.0\llvm. Install Python as a prerequisite.
Now you are ready to use CMake. Haven't been introduced to CMake yet? Go over my earlier blog posts, and you will find everything you need.
Let me show you just one tool from the LLVM tools folder, called opt. To build it from source, use these commands:
cd build
cmake .. -DLLVM_TARGETS_TO_BUILD=X86
cmake --build . -t opt
Remember, this is not a quick build: 92 dependent libraries have to be built before you get the final opt.exe artifact. Once it is built, running opt.exe --help lists all the optimizations that LLVM supports.

Conclusion
As we reach the end of this blog post, I want to reflect. Did I achieve what I set out to do, understanding the usage of Clang optimization flags? I believe I did. Clang/LLVM is not a tool you approach just for fun. Understanding the fundamental tool of development, the compiler, and its behavior is central to progressing from a novice to a journeyman programmer. As a programmer, you need to understand the Clang compiler flags that alter the compilation output. Of course, if you have written an LLVM optimization pass, have done static code analysis using LLVM, or have a deep understanding of Global Value Numbering, then you have progressed from journeyman to master. Allow me to take a bow for a job well done!
