Compiling With Clang Optimization Flags

Dori Exterman
Reading time: 7 minutes
August 12, 2021

In his book, Effective C++, Scott Meyers has something to say about the way he uses lhs and rhs for parameter names: “… two of my favorite parameter names and their meanings may not be immediately apparent, especially if you’ve never done time on a compiler-writing chain gang.”

When Scott wrote this, circa 1992, he must have had GCC in mind, since Clang/LLVM didn’t exist yet. Clang/LLVM fundamentally changed the way people think about compilers, and it demystified many of the black arts that went into hand-crafted compilers. You can read more about what Clang is, or about GCC vs. Clang, in my earlier posts.

In this blog post, I want to show that you don’t need to be a compiler-writing chain gang member to understand the optimization possibilities of Clang. My goal is to demystify Clang optimization flags so that you can choose between them and make the best use of each.

This post uses Clang in a Windows environment (yes, Clang supports Windows compilation, as covered in the earlier posts mentioned above). However, nothing here is specific to Windows: understanding Clang optimization, and reading some assembly along the way, works in exactly the same way on Linux. So, if you are a Linux C++ programmer, keep reading, as this post is also for you.

A note of caution before I attempt to decipher Clang optimization flags. Clang/LLVM is a very active project. I am working with the latest released version of Clang/LLVM, from April 15, 2021. There have been 12,228 commits on the master branch since this release, and I fear what I write might become outdated very soon.

C:\>clang --version
clang version 12.0.0
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin

Let us take the first example from my “how to avoid a C++ compilation failure” blog post to further understand the optimization flags.

#include <iostream>

void ConvertStringToPasswordForm(char password[])
{
      while (*password != '\0') *password++ = '*';
}

And the driver:

int main()
{    
      char password[]  = "MyTopSecurePasswordPublishedInABlog:-)";
      ConvertStringToPasswordForm(password);
      std::cout << "Password :: " << password << std::endl;
}

If we run it through the Clang compiler using the command:

        C:\Work\Temp>clang Example1.cpp

Clang compiles it silently and, by default, creates an executable named a.exe. Let us do a quick comparison of the behavior of Clang against that of the Microsoft C++ compiler, cl.

Clang:

C:\Work\Temp>clang Example1.cpp

Output: a.exe
Size: 244,224 bytes

Microsoft C++ compiler (cl):

C:\Work\Temp>cl Example1.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

Example1.cpp
C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.27.29110\include\ostream(747): warning C4530:
  C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
Example1.cpp(12): note: see reference to function template instantiation 'std::basic_ostream<char,std::char_traits<char>>
  &std::operator <<<std::char_traits<char>>(std::basic_ostream<char,std::char_traits<char>> &,const char *)' being compiled
Microsoft (R) Incremental Linker Version 14.27.29111.0
Copyright (C) Microsoft Corporation.  All rights reserved.
/out:Example1.exe
Example1.obj

Output: Example1.exe
Size: 186,368 bytes

The Microsoft compiler is vocal about the compiler and linker versions used, and it produces a smaller executable. The question is: what flags should be passed to Clang so that its space optimization is comparable to, or even surpasses, that of cl?

Before we answer this question, let us go through the documentation where Clang's code generation options are discussed: https://clang.llvm.org/docs/CommandGuide/clang.html. For easy reference I am replicating the information here:

[Image: table of Clang optimization levels from the documentation linked above]

* -O4 and higher – Currently equivalent to -O3, see: https://clang.llvm.org/docs/CommandGuide/clang.html#code-generation-options

Armed with this information, let us now try space optimization starting with -O1.

Space Optimizations

Running Clang with the -O1 flag:

clang -O1 Example1.cpp

gives 236,032 bytes for a.exe, a definite reduction from the 244,224 bytes of the first build. When no -O flag is given, Clang defaults to -O0, which generates unoptimized code.

If you compare the binaries produced with the -O0 and -O1 flags, you will see some differences, but you cannot tell what caused them. Which optimizations were turned on? To find out, let us go through the assembly listings of the code generated with -O0 and -O1. We generate them with the following commands:

clang -S -O1 -mllvm --x86-asm-syntax=intel Example1.cpp
clang -S -mllvm --x86-asm-syntax=intel Example1.cpp

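Note: The -S flag runs the pre-processing and compilation steps only and writes an assembly file instead of an executable. Both commands write Example1.s by default, so keep a copy of the first listing, or pass -o with a different file name, before running the second.

[Image: assembly listings of Example1.cpp generated with -O0 and -O1]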

The assembly code listing between "# -- Begin function" and "# -- End function" gives a clear picture of the optimizations that were applied to ConvertStringToPasswordForm.

Let us list the differences we see in the assembly:

  1. The .seh_proc, .seh_stackalloc, .seh_endprologue and .seh_endproc directives are not generated with the -O1 flag. These directives describe the function for Windows structured exception handling (SEH) unwinding; with -O1, no such unwind information is emitted for ConvertStringToPasswordForm.
  2. A tight loop is generated with the -O1 flag, thereby reducing space:

     [Image: the loop generated with -O1]

This should be seen in the context of what is generated with the -O0 flag:

[Image: the loop generated with -O0]

Highlights here are:

  • The number of labels is reduced.
  • Efficient instructions like lea (load effective address) and jne (jump if not equal) are used.
  • The comparison and the jump out of the loop to label LBB0_3 when eax is 0 are completely avoided.

Note: I experimented further with the Clang optimization flags -O2, -Os and -Oz on Example1.cpp. Here is the space reduction as a table for quick reference:

Flag    Size of a.exe
-O0     244,224 bytes
-O1     236,032 bytes
-O2     233,984 bytes
-Os     231,424 bytes
-Oz     229,376 bytes

As can be seen, there is a progressive reduction in the size of the executable while going from -O0 (no optimization) to -Oz (aggressive space optimizations). I expect, although I have not measured it, that compilation time also increases progressively across these levels.

Further Analysis

It may be difficult to analyze assembly code without practice. Reading (as opposed to writing) assembly is a skill I wholeheartedly recommend every developer learn. Take heart, though: Clang/LLVM has a switch that records which optimizations were applied during a compilation run:

clang -O3 -foptimization-record-file=Opt.txt Example1.cpp

The Opt.txt file will contain the details of all optimizations that were attempted in multiple passes. You will get entries like:

--- !Analysis
Pass:            prologepilog
Name:            StackSize
DebugLoc:        { File: Example1.cpp, Line: 3, Column: 0 }
Function:        '?ConvertStringToPasswordForm@@YAXQEAD@Z'
Args:
- NumStackBytes:   '0'
- String:          ' stack bytes in function'
...

In LLVM, optimizations are implemented as passes that traverse some portion of a program to either collect information or transform the program. In the above entry, the pass name is “prologepilog”. You will find full information about the different compiler switches in the online reference: https://clang.llvm.org/docs/ClangCommandLineReference.html.

You will also get online help by running clang --help or clang --help-hidden. Yes, there is a hidden help that describes advanced switches available!

Clang Optimization Flags – Are We Done Yet? No!

The heart of Clang is LLVM, and a blog post on Clang optimization flags is incomplete without getting to know how to work with the LLVM intermediate representation (IR). Here is how to get the IR in bitcode form:

clang -c -O1 -emit-llvm Example1.cpp -o Example.bc

Traditionally, the LLVM bitcode file has the .bc extension. To work further with the bitcode file, you will need tools that are not directly available from the Clang/LLVM installer.

First, download the LLVM source code from https://github.com/llvm/llvm-project/releases/tag/llvmorg-12.0.0. Extract the source to a folder named llvm-project-llvmorg-12.0.0. Create a folder named build under llvm-project-llvmorg-12.0.0\LLVM. Install Python as a prerequisite.

Now you are ready to use CMake. Oh, you have not been introduced to CMake? Go over my blogs, and you will find everything you need.

Let me just show you one tool called opt from the LLVM tools folder. To compile this from the source use the commands:

cd build
cmake .. -DLLVM_TARGETS_TO_BUILD=X86
cmake --build . -t opt

Remember, this is not a quick build. There are 92 dependent libraries to be built before you get the final artifact, opt.exe. You can get opt.exe to print its help, which lists all the optimizations that LLVM supports. Here is a subset of what you get:

[Image: subset of the help output of opt.exe]

Conclusion

As we reach the end of this blog post, I want to introspect. Did I achieve what I set out to do, namely understanding the usage of Clang optimization flags? I believe I did. Clang/LLVM is not a tool you approach just for fun. Understanding the fundamental tool in development, the compiler, and its behavior is central to progressing from a novice to a journeyman programmer. As a programmer, you need to understand the Clang compiler flags that alter the compilation output. Of course, if you have written an LLVM optimization pass, have done static code analysis using LLVM, or have a deep understanding of Global Value Numbering, then you have progressed from journeyman to master. Allow me to take a bow for a job well done!