...making Linux just a little more fun!
In the last article we saw some of the basic GCC options, and noted that it supports several CPU architectures. One of the topics we will cover in this article is how to turn on optimizations for different architectures and what happens when they are turned on. We will also look at some other nifty tricks which we can do with GCC.
Profiling is a method of identifying sections of code that consume large portions of execution time. Profiling basically works by inserting monitoring code at specific points in the program. This code can be inserted by using the -pg option of GCC.
When debugging we need extra information added to the binaries. Programs compiled with the -g flag additional information which can be used by gdb (or other debuggers) is added to the binary. This increases the size of the binaries but is necessary for debugging. When compiling debugging binaries we should turn off all optimization flags. GCC can add debugging information in several different formats such as stabs, dwarf-2 or coff format.
$ gcc -g -o helloworld helloworld.c #for adding debugging information $ gcc -pg -o helloworld helloworld.c #for profiling
Programs compiled with profiling and/or debugging turned on are usually referred to as debug binaries, as opposed to production binaries which are compiled with optimization flags.
We previously saw that the compilation of the program to get an executable binary consists of different phases. Each of the main compile stages (compiling to assembly language, assembling and linking) is done by different executables (e.g. cc1, as and collect2). We use the -time option to GCC to get a breakdown of the time required for each stage.
$ gcc -time helloworld.c # cc1 0.02 0.00 # as 0.00 0.00 # collect2 0.04 0.01
We can also gather more fine-grained statistics about the various stages of the compiler cc1 using the -Q option. This shows the real time spent as well as time spent in userspace and kernel modes.
$ gcc -Q helloworld.c main Execution times (seconds) preprocessing : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.24 (38%) wall parser : 0.01 (50%) usr 0.00 ( 0%) sys 0.02 ( 3%) wall expand : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 5%) wall global alloc : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 5%) wall shorten branches : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 6%) wall symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 2%) wall rest of compilation : 0.01 (50%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall TOTAL : 0.02 0.00 0.64
Before we move on to optimizations, we need to look at how a compiler is able to generate code for different platforms and different languages. The process of compilation has three components:
There are two kinds of optimizations possible - optimizations for speed and optimizations for space. In an ideal world, both would be possible at the same time, (Actually some optimizations do both - such as common sub-expression elimination). More often than not optimizing for speed increases the memory footprint (the size of the program loaded in memory) and vice versa. Expanding functions inline is a good example of this case. Inlining functions reduces the overhead of a function call but ends up replicating code wherever the inline function has been called, thus increasing the size of the executable. Turning on optimizations will increase the compilation time as the compiler has to analyze the code more.
GCC offers four optimization levels. These are specified by the -O<Optimization Level> flag. The default is no optimization or -O0 (notice the capital O). Various optimizations are turned on by the each of the different levels (-O1, -O2 and -O3). Even if we give higher optimization levels such as -O25, they have the net effect of enabling the highest level of optimizations (-O3). In addition to these four optimization level there is another optimization level -Os which enables all the optimizations for space as well as those optimizations which do not increase the size of the code but give speed improvements. In -O1 optimization level, only those optimizations are done which reduce code size and execution time without increasing compilation times significantly. In -O2 optimization level, those optimizations which do have a space-execution time tradeoff are done. Almost all optimizations are turned on by -O3 optimization level but compilation time might increase significantly by turning it on.
$ gcc -O3 -o hello3 helloworld.c $ gcc -O0 -o hello0 helloworld.c $ ls -l -rwxr-xr-x 1 vinayak users 8722 2005-03-24 17:59 hello3 -rwxr-xr-x 1 vinayak users 8738 2005-03-24 17:59 hello0 $ time ./hello3 > /dev/null real 0m0.002s user 0m0.001s sys 0m0.000s $ time ./hello0 > /dev/null real 0m0.002s user 0m0.000s sys 0m0.003s
As seen above, compiling the program with -O3 optimization level reduces the size of the executable compared to the compiling with -O0 optimization level (no optimization)
It is also possible to have CPU or architecture specific optimizations. For example a particular architecture may have numerous registers. These can be utilized by the register allocation algorithm intelligently so as to store temporary variable between calculation to minimize cache and memory accesses thus ensuring considerable speed ups in CPU-intensive operations. Some of the platform specific optimization flags can be done using -march=<architecture type> or -mcpu=<CPU name>. For the x86 and x86-64 family of processors, -march implicitly implies -mcpu. Some of the architecture types options in this family are ix86 (i386, i486, i586, i686), Pentiumx (pentium, pentium-mmx, pentiumpro, pentium2, pentium3 ,pentium4) and athlon (athlon, athlon-tbird, athlon-xp, opteron). But executables built with platform specific flags may not run on other CPUs. For example, executables generated with the -march=i386 will run on i686 platform because of backward compliance of the platforms. However, executables generated with the -march=i686 may not run on the older platforms as some of the instructions (or extended instruction sets) do not exist on older CPUs. If you use the Gentoo Linux distribution, you might be already familiar with some of these flags.
$ gcc -o matrixMult -O3 -march=pentium4 MatrixMultiplication.c #optimise for Pentium 4 $ gcc -o matrixMult -O3 -march=athlon-xp MatrixMultiplication.c #optimise for Athlon-XP
Also you can give specific flags for optimizations at the command line such as -finline-functions (for integrating simple functions into their callers) and -floop-optimize (to optimize loop control structures). The important thing to remember is that the order of the flags on the command line matters. The option on the right will override ones on the left. This is a good way to choose particular options without cluttering the compilation command line. It is also possible to do it for platform-specific optimizations.
$ gcc -o inlineDemo -O3 -fno-inline-functions InlineDemo.c $ gcc -o matrixMult -march=pentium4 -nosse2 MatrixMultiplication.c
In the example, the optimization level -O3 will enable inlining of functions - the effect is same as -O2 -finline-functions -frename-registers. So if you have the option -fno-inline-functions on the right on the command line, it will disable inlining of functions. This command will turn on all the Pentium4 specific optimizations but code generated will not contain any MMX, SSE and SSE2 instructions.
All the options supported by GCC on your machine can be seen by giving the following command:
$ gcc -v --help | less
This will list out all the different options that are supported by GCC and the processes invoked by GCC on your machine. This is a pretty huge list and should contains most of the options discussed by us in this article and the earlier one in this series and more.
In this article, we looked mainly at the various GCC optimizations options and how they work. In the next part of this series we will look at another development tool called make used for building big projects.
Vinayak Hegde is currently working for Akamai Technologies Inc. He
first stumbled upon Linux in 1997 and has never looked back since. He
is interested in large-scale computer networks, distributed computing
systems and programming languages. In his non-existent free time he
likes trekking, listening to music and reading books. He also
maintains an intermittently updated blog.