...making Linux just a little more fun!
The GNU C compiler is an integral part of the GNU system and was initially written by Richard Stallman. At first it only compiled C code. Later a group of volunteers started maintaining it and GCC gained the ability to support different languages such a C++, Fortran, Ada and Java. It was then renamed to GNU Compiler Collection to signify this change. In this article we shall look at mainly the C language compiler.
GCC is not only available on Linux but also on other Unix-like systems such as FreeBSD, NetBSD,OpenBSD as well as on Windows via Cygwin, MingW32 and Microsoft Services for Unix. It supports a wide variety of platforms such as the Intel x86 Architecture, AMD x86-64 ,Alpha and SPARC architectures. Due to this versatility of GCC, it is often used for cross compiling code for different architectures. Since the GCC source code is available and modular, it can easily be modified to emit binaries for obscure or new platforms, such as those used in embedded systems.
If GCC is available on your system, you can give the following command to see with what options it has been compiled with.
Command 1 - GCC specification and supported functionality$ gcc -v Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.3/specs Configured with: ../configure --enable-threads=posix --prefix=/usr \ --with-local-prefix=/usr/local --infodir=/usr/share/info \ --mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada \ --disable-checking --libdir=/usr/lib --enable-libgcj \ --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib \ --with-system-zlib --enable-shared --enable-__cxa_atexit i586-suse-linux Thread model: posix gcc version 3.3.3 (SuSE Linux)
This gives a lot of information about GCC. You can see that the POSIX threading model is supported by this version so you can compile multi-threaded applications with it. It can also compile code written in C,C++,Fortran-77, Objective C, Java and Ada. Notice that the C++ include path is also specified, and that Java code can be compiled to native binaries with libgcj.
Let us write a small C program with a header file to see the various compilation options GCC supports.
// helloworld.h #define COUNT 2 static char hello[] = "hello world";
// helloworld.c #include <stdio.h> #include "helloworld.h" int main() { int i; for(i = 0;i <= COUNT; i++) { printf("%s - %d\n",hello,i); } return 0; }
To compile the helloworld program to an object file we can give the command
Command 2 - Creating an Object File$ gcc -v -c helloworld.c ...[output snipped] /usr/lib/gcc-lib/i586-suse-linux/3.3.3/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3 \ -D__GNUC_PATCHLEVEL__=3 helloworld.c -quiet -dumpbase helloworld.c -auxbase helloworld \ -version -o /tmp/ccHmbDAJ.s GNU C version 3.3.3 (SuSE Linux) (i586-suse-linux) compiled by GNU C version 3.3.3 (SuSE Linux). GGC heuristics: --param ggc-min-expand=42 --param ggc-min-heapsize=23825 #include "..." search starts here: #include <...> search starts here: /usr/local/include /usr/lib/gcc-lib/i586-suse-linux/3.3.3/include /usr/i586-suse-linux/include /usr/include End of search list. /usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../../i586-suse-linux/bin/as -V -Qy \ -o helloworld.o /tmp/ccHmbDAJ.s GNU assembler version 2.15.90.0.1.1 (i586-suse-linux) using BFD version 2.15.90.0.1.1 20040303 (SuSE Linux)Command 3 - Creating an Executable File
$ gcc -v -o helloworld helloworld.c ...[output snipped] /usr/lib/gcc-lib/i586-suse-linux/3.3.3/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker \ /lib/ld-linux.so.2 -o helloworld /usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../crt1.o \ /usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../crti.o \ /usr/lib/gcc-lib/i586-suse-linux/3.3.3/crtbegin.o -L/usr/lib/gcc-lib/i586-suse-linux/3.3.3 \ -L/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../../i586-suse-linux/lib \ -L/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../.. /tmp/ccUyu9EA.o -lgcc \ -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i586-suse-linux/3.3.3/crtend.o \ /usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../crtn.o
From the above output, we can see that gcc
calls
cc1
which is the actual C compiler to generate a
assembler file called ccHmbDAJ.s
. This is a randomly
chosen name and this file is deleted after compilation is over. You
can also see in what order the different include file paths are
searched. We can modify the search paths for the include files by
using the -I
option and the library search paths by
using the -L
option. See the info ($ info
gcc
) pages for more information on these options. The
temporary assembler file created is then passed onto the GNU
assembler (as
) which processes the file and generates
binary code for that particular platform. The process stops here
for the object file (command 1).
When creating a executable file an extra step is involved -
Linking. From the output of command 2 we can see that file is
dynamically linked with the libraries (note the usage of
-L
option here as well). collect2
is a
utility which sets up the initialization routines and eventually
calls ld to perform the linking to create the executable.
The preprocessor is an important part of the C compiler. All the
preprocessor directives start with a '#' (hash) sign. It processes
the different preprocessor directives such as #define
,
#include,
#ifdef
, #pragma
and #undef
. As the name suggests, the preprocessor
runs before the compilation of the program begins and processes the
various directives to produce code ready to be compiled by the C
compiler. It is possible to define fairly complex macros using the
directives which can make the code more readable and reduce
complexity. But sometimes it is possible that the complex macros
are not getting expanded as we think they are. Also if some of the
include files have the same name, it is possible that the wrong
include file is getting picked up and causing compilation errors or
causing odd behavior in the executable. The -E
option
can be used in such cases so that we can get the preprocessed
output as the compiler sees it. we can reuse the example above to
see the preprocessed output that the compiler produces.
$ gcc -E helloworld.c > helloworld.c.preprocessCommand 5 - Expanded Macros (#define)
$ gcc -E helloworld.c -dM | sort | less
Command 4 will produce a large preprocessed file with all the
included files and all the expanded macros. You can open the file
in your favorite editor and take a look at it. This the C source
the C compiler looks at. When I ran the above command on my desktop
machine it produced 455 lines of output excluding whitespace.
Command 5 shows all the #define
'd macros after sorting
them. It is also possible to define macros on the compilation
command line. For example see the output of Command 2 where
__GNUC__
, __GNUC_MINOR__
and
__GNUC_PATCHLEVEL__
are all defined as 3 as the GCC
version used for compilation is GCC 3.3.3
GCC converts the C code into assembly language before converting into binary code. In some instances you might want to look at the code generated or tweak it for performance reasons before finally converting into binary code. You can do it using the following command.
$ gcc -S helloworld.c
The output generated is as follows:
.file "helloworld.c" .data .type hello, @object .size hello, 12 hello: .string "hello world" .section .rodata .LC0: .string "%s - %d\n" .text .globl main .type main, @function main: pushl %ebp movl %esp, %ebp subl $8, %esp andl $-16, %esp movl $0, %eax subl %eax, %esp movl $0, -4(%ebp) .L2: cmpl $2, -4(%ebp) jle .L5 jmp .L3 .L5: subl $4, %esp pushl -4(%ebp) pushl $hello pushl $.LC0 call printf addl $16, %esp leal -4(%ebp), %eax incl (%eax) jmp .L2 .L3: movl $0, %eax leave ret .size main, .-main .section .note.GNU-stack,"",@progbits .ident "GCC: (GNU) 3.3.3 (SuSE Linux)"
From the output above, we can see that hello
is
defined as the string "hello world". It is read-only data as we
have defined it as static. main
is the only global
function. The .LC0
section show the parameters for
printf
. These are then pushed onto the stack before
printf is called in the main loop (.L5
). The
.L2
section has the code for checking the conditions
for the for
loop. The .L3
section
contains the cleanup routines after the function has exited. The
generated humanly-readable assembly language output can be changed
before being compiled into binary code by using as
(the GNU Assembler) and then linking it with the libraries.
GCC has it's own extensions to the C standard. These extensions
are used by many GNU programs as well as other software including
the Linux kernel. These extensions may not be available with other
compilers on other platforms. So if you want to write portable
code, you might want to use the -ansi
option. Using
this option along with -pedantic option will ensure that any code
that in no conforming to the ISO C standard will be flagged with a
warning. Also it is possible for us to specify what standard we
want to adhere to using the -std=
option. Standards
supported by this option include the ISO C89 standard
(-std=c89
), the more recent ISO C99 standard
(-std=c99
) and the ISO C++98 standard
(-std=c++98
).
Also it is always a good idea to turn some common warnings on,
using -Wall
option. But -Wall
does not
turn all warnings on. So it is a misnomer. Some of the other useful
and common warnings which you might want to enable are
-Wstrict-prototypes
and
-Wmissing-prototypes
(warning if prototypes are not
defined or defined improperly), -Werror
(which turns
all warning into errors) and -Wunreachable-code
(if
the compiler finds that a block of code will never execute).
make
is automated building tool which is used for
building large number of files in a C project. It will be covered
in a later article in this series. If you have (say) 1000 files in
a project, and you change just 1 or 2 files to fix a bug, you need
not build the whole project again. You can specify what files are
effected by the change by specifying dependencies and only those
files will be recompiled. You can use GCC to generate these
dependency lines. Take a look at the example below.
$ gcc -M helloworld.c helloworld.o: helloworld.c /usr/include/stdio.h /usr/include/features.h \ /usr/include/sys/cdefs.h /usr/include/gnu/stubs.h \ /usr/lib/gcc-lib/i586-suse-linux/3.3.3/include/stddef.h \ /usr/include/bits/types.h /usr/include/bits/wordsize.h \ /usr/include/bits/typesizes.h /usr/include/libio.h \ /usr/include/_G_config.h /usr/include/wchar.h /usr/include/bits/wchar.h \ /usr/include/gconv.h \ /usr/lib/gcc-lib/i586-suse-linux/3.3.3/include/stdarg.h \ /usr/include/bits/stdio_lim.h /usr/include/bits/sys_errlist.h \ helloworld.h
This nifty little trick can save a lot of time when you are working against a stiff deadline.
If you are developing a library for use by other programmers you
need to use the -fpic
option to generate Position
Independent Code (PIC). When an executable is created certain
offsets of functions and data are hardcoded into it. For a library,
this is clearly not an option since library code has to be
independent of hardcoded location offsets - the library code will
eventually be linked into the executable (dynamically or
statically). Also if you have a component which needs to be linked
with multiple executables, you need to use the -shared
option of gcc
. This option is used mostly along with
-fpic
option to create shared libraries.
On most systems, the default behavior of gcc is to link
dynamically. This can create problems if you do not want to
distribute the shared library along with the executable. Also you
might be in a situation where the shared library you have used on
your system is not readily available. In such situation we can
statically link the executable so that the library code need not be
separately provided. But use this option with care as it will
increase the size of the executable by quite a bit. The command
option to statically link the output in gcc is (predictably)
-static
.
In this article we have seen a small overview of how gcc can be used to generate binaries and the various stages the C code goes through before being converted into binary code. In the next part in this series we will look at how the code generated can be optimized for a particular platform as well as options to generate debug binaries for use with gdb.
Vinayak Hegde is currently working for Akamai Technologies Inc. He
first stumbled upon Linux in 1997 and has never looked back since. He
is interested in large-scale computer networks, distributed computing
systems and programming languages. In his non-existent free time he
likes trekking, listening to music and reading books. He also
maintains an intermittently updated blog.