Using the GNU Compiler Collection

By Vinayak Hegde

Introduction to GCC

The GNU C compiler is an integral part of the GNU system and was initially written by Richard Stallman. At first it only compiled C code. Later a group of volunteers started maintaining it, and GCC gained the ability to support other languages such as C++, Fortran, Ada and Java. It was then renamed the GNU Compiler Collection to signify this change. In this article we shall look mainly at the C language compiler.

GCC is available not only on Linux but also on other Unix-like systems such as FreeBSD, NetBSD and OpenBSD, as well as on Windows via Cygwin, MinGW32 and Microsoft Services for Unix. It supports a wide variety of platforms such as the Intel x86, AMD x86-64, Alpha and SPARC architectures. Due to this versatility, GCC is often used for cross compiling code for different architectures. Since the GCC source code is available and modular, it can be modified to emit binaries for obscure or new platforms, such as those used in embedded systems.

Basic compilation options

If GCC is available on your system, you can give the following command to see the options it was built with.

Command 1 - GCC specification and supported functionality
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.3/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr \
--with-local-prefix=/usr/local --infodir=/usr/share/info \
--mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada \
--disable-checking --libdir=/usr/lib --enable-libgcj \
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib \
--with-system-zlib --enable-shared --enable-__cxa_atexit i586-suse-linux
Thread model: posix
gcc version 3.3.3 (SuSE Linux)

This gives a lot of information about GCC. You can see that the POSIX threading model is supported by this version, so you can compile multi-threaded applications with it. It can also compile code written in C, C++, Fortran 77, Objective-C, Java and Ada. Notice that the C++ include path is also specified, and that Java code can be compiled to native binaries with libgcj.

Let us write a small C program with a header file to see the various compilation options GCC supports.

// helloworld.h
#define COUNT 2

static char hello[] = "hello world";

// helloworld.c
#include <stdio.h>
#include "helloworld.h"
 
int main()
{
    int i;
    for(i = 0;i <= COUNT; i++)
    { 
        printf("%s - %d\n",hello,i);
    }
    return 0;
}

To compile the helloworld program to an object file we can give the command

Command 2 - Creating an Object File
$ gcc -v -c helloworld.c
...[output snipped]
/usr/lib/gcc-lib/i586-suse-linux/3.3.3/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3 \
-D__GNUC_PATCHLEVEL__=3 helloworld.c -quiet -dumpbase helloworld.c -auxbase helloworld \
-version -o /tmp/ccHmbDAJ.s
GNU C version 3.3.3 (SuSE Linux) (i586-suse-linux)
        compiled by GNU C version 3.3.3 (SuSE Linux).
GGC heuristics: --param ggc-min-expand=42 --param ggc-min-heapsize=23825
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc-lib/i586-suse-linux/3.3.3/include
 /usr/i586-suse-linux/include
 /usr/include
End of search list.
/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../../i586-suse-linux/bin/as -V -Qy \
-o helloworld.o /tmp/ccHmbDAJ.s
GNU assembler version 2.15.90.0.1.1 (i586-suse-linux) using BFD version 2.15.90.0.1.1
20040303 (SuSE Linux)

Command 3 - Creating an Executable File
$ gcc -v -o helloworld helloworld.c
...[output snipped]
/usr/lib/gcc-lib/i586-suse-linux/3.3.3/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker \
/lib/ld-linux.so.2 -o helloworld /usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../crt1.o \
/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../crti.o \
/usr/lib/gcc-lib/i586-suse-linux/3.3.3/crtbegin.o -L/usr/lib/gcc-lib/i586-suse-linux/3.3.3 \
-L/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../../i586-suse-linux/lib \
-L/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../.. /tmp/ccUyu9EA.o -lgcc \
-lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i586-suse-linux/3.3.3/crtend.o \
/usr/lib/gcc-lib/i586-suse-linux/3.3.3/../../../crtn.o

From the above output, we can see that gcc calls cc1, which is the actual C compiler, to generate an assembler file called ccHmbDAJ.s. This is a randomly chosen name, and the file is deleted once compilation is over. You can also see the order in which the different include file paths are searched. We can modify the search paths for include files with the -I option and the library search paths with the -L option. See the info pages ($ info gcc) for more information on these options. The temporary assembler file is then passed on to the GNU assembler (as), which processes the file and generates binary code for that particular platform. The process stops here for the object file (Command 2).

When creating an executable file an extra step is involved: linking. From the output of Command 3 we can see that the file is dynamically linked with the libraries (note the usage of the -L option here as well). collect2 is a utility which sets up the initialization routines and eventually calls ld to perform the linking that creates the executable.

The role of the Preprocessor

The preprocessor is an important part of the C compiler. All preprocessor directives start with a '#' (hash) sign, and the preprocessor handles directives such as #define, #include, #ifdef, #pragma and #undef. As the name suggests, the preprocessor runs before the compilation proper begins and processes the various directives to produce code ready to be compiled by the C compiler. It is possible to define fairly complex macros using these directives, which can make the code more readable and reduce complexity. But sometimes complex macros do not expand the way we think they do. Also, if some of the include files have the same name, the wrong include file may get picked up, causing compilation errors or odd behavior in the executable. In such cases the -E option can be used to get the preprocessed output as the compiler sees it. We can reuse the example above to see the preprocessed output.

Command 4 - Preprocessed output
$ gcc -E helloworld.c > helloworld.c.preprocess

Command 5 - Expanded Macros (#define)
$ gcc -E helloworld.c -dM | sort | less

Command 4 will produce a large preprocessed file with all the included files and all the expanded macros. You can open the file in your favorite editor and take a look at it. This is the C source the C compiler looks at. When I ran the above command on my desktop machine it produced 455 lines of output excluding whitespace. Command 5 shows all the #define'd macros after sorting them. It is also possible to define macros on the compilation command line with the -D option. For example, see the output of Command 2, where __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are all defined as 3 because the GCC version used for compilation is GCC 3.3.3.

Generating output in Assembly language

GCC converts the C code into assembly language before converting into binary code. In some instances you might want to look at the code generated or tweak it for performance reasons before finally converting into binary code. You can do it using the following command.

$ gcc -S helloworld.c

The output generated is as follows:

        .file   "helloworld.c"
        .data
        .type   hello, @object
        .size   hello, 12
hello:
        .string "hello world"
        .section        .rodata
.LC0:
        .string "%s - %d\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        movl    $0, -4(%ebp)
.L2:
        cmpl    $2, -4(%ebp)
        jle     .L5
        jmp     .L3
.L5:
        subl    $4, %esp
        pushl   -4(%ebp)
        pushl   $hello
        pushl   $.LC0
        call    printf
        addl    $16, %esp
        leal    -4(%ebp), %eax
        incl    (%eax)
        jmp     .L2
.L3:
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.3 (SuSE Linux)"

From the output above, we can see that hello is defined as the string "hello world". It is placed in the .data section, so it is writable; because we declared it static, it has no .globl directive and is visible only within this file. main is the only global symbol. The .LC0 label marks the format string for printf, which is stored in the read-only .rodata section. The arguments are pushed onto the stack before printf is called in the loop body (.L5). The code at .L2 checks the condition of the for loop. The code at .L3 sets the return value and returns from main once the loop has finished. The generated human-readable assembly output can be changed before it is turned into binary code with as (the GNU Assembler) and linked with the libraries.

Conformance and warning options

GCC has its own extensions to the C standard. These extensions are used by many GNU programs as well as other software, including the Linux kernel. They may not be available with other compilers on other platforms. So if you want to write portable code, you might want to use the -ansi option. Using this option along with the -pedantic option makes GCC warn about code that does not conform to the ISO C standard. It is also possible to specify which standard we want to adhere to using the -std= option. Standards supported by this option include the ISO C89 standard (-std=c89), the more recent ISO C99 standard (-std=c99) and, for C++ code, the ISO C++98 standard (-std=c++98).

It is also always a good idea to turn some common warnings on using the -Wall option. Despite its name, -Wall does not turn all warnings on. Some other useful and common options which you might want to enable are -Wstrict-prototypes and -Wmissing-prototypes (warn if prototypes are missing or declared improperly), -Werror (which turns all warnings into errors) and -Wunreachable-code (warn if the compiler finds that a block of code will never execute).

Generating Makefile dependencies

make is an automated build tool used for building the large number of files in a C project. It will be covered in a later article in this series. If you have (say) 1000 files in a project and you change just 1 or 2 files to fix a bug, you need not build the whole project again. By specifying dependencies you tell make which files are affected by a change, and only those files will be recompiled. You can use GCC to generate these dependency lines. Take a look at the example below.

$ gcc -M helloworld.c
helloworld.o: helloworld.c /usr/include/stdio.h /usr/include/features.h \
  /usr/include/sys/cdefs.h /usr/include/gnu/stubs.h \
  /usr/lib/gcc-lib/i586-suse-linux/3.3.3/include/stddef.h \
  /usr/include/bits/types.h /usr/include/bits/wordsize.h \
  /usr/include/bits/typesizes.h /usr/include/libio.h \
  /usr/include/_G_config.h /usr/include/wchar.h /usr/include/bits/wchar.h \
  /usr/include/gconv.h \
  /usr/lib/gcc-lib/i586-suse-linux/3.3.3/include/stdarg.h \
  /usr/include/bits/stdio_lim.h /usr/include/bits/sys_errlist.h \
  helloworld.h

This nifty little trick can save a lot of time when you are working against a stiff deadline.

Using library code

If you are developing a shared library for use by other programmers, you need to use the -fpic option to generate Position Independent Code (PIC). When an executable is created, certain offsets of functions and data are hardcoded into it. For a shared library this is clearly not an option, since the library may be loaded at a different address in each program that uses it, so its code has to be independent of hardcoded locations. To build a component that is linked with multiple executables at run time, you need the -shared option of gcc. This option is used mostly along with the -fpic option to create shared libraries.

On most systems, the default behavior of gcc is to link dynamically. This can create problems if you do not want to distribute the shared library along with the executable. You might also be in a situation where the shared library you have used on your system is not readily available elsewhere. In such situations we can statically link the executable so that the library code need not be provided separately. But use this option with care, as it will increase the size of the executable by quite a bit. The option to statically link the output in gcc is (predictably) -static.

Conclusion

In this article we have seen a brief overview of how gcc can be used to generate binaries and the various stages C code goes through before being converted into binary code. In the next part of this series we will look at how the generated code can be optimized for a particular platform, as well as options to generate debug binaries for use with gdb.

 


[BIO] Vinayak Hegde is currently working for Akamai Technologies Inc. He first stumbled upon Linux in 1997 and has never looked back since. He is interested in large-scale computer networks, distributed computing systems and programming languages. In his non-existent free time he likes trekking, listening to music and reading books. He also maintains an intermittently updated blog.

Copyright © 2005, Vinayak Hegde. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 120 of Linux Gazette, November 2005
