The Four Stages of Compiling a C Program

Agustin Espinoza
Feb 10, 2022


Compiling C source code is a multi-step process that involves preprocessing, compilation, assembly, and linking.

The process of converting source code written in any programming language — usually a mid- or high-level language — into a machine-level language that is understandable by the computer is known as Compilation. The software used for this conversion is known as a compiler.

During a build, the compiler checks the source code for syntactic or structural errors and, if it finds none, generates object code.

In this article we will follow the steps of the compilation process and see how C source code is converted into object code, that is, machine code in binary form.

C Program Compilation Pipeline

Typically, building a C program takes only a few seconds, but in that short time the source code passes through a pipeline in which several different components each perform their task.

Before continuing, there are two rules that we must know:

C program compilation rules

  • Only source files (.c) are compiled; header files are not compiled on their own but are pulled into source files through #include.
  • Each file is compiled separately.

The stages of the C program compilation pipeline are:

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

Each component in the build pipeline accepts a certain input from the previous component and produces a certain output for the next component in the C program’s build pipeline.

This process continues until the last component in the build pipeline generates the required output file, that is, a binary executable. The pipeline produces this output only if the source file passes successfully through every component. Even a small failure in any of them causes a compile or link failure, and an error message is generated instead.

1. Preprocessing

Preprocessing is the first step in the C program compilation pipeline. When writing a C program, we include header files, define some macros, and sometimes even do some conditional compilation. All of these are expressed with preprocessor directives.

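During the preprocessing step, these directives are resolved and replaced by their results: the contents of included header files are inserted, macros are expanded to their definitions, and the code inside conditional blocks is either kept or discarded. Comments are also removed.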
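As a rough illustration of what the preprocessor works on, the sketch below uses the three kinds of directives mentioned above. The names BUFFER_SIZE, SQUARE, VERBOSE, LOG, area_of_square, and buffer_size are made up for this example and are not taken from any real project.

```c
#include <stdio.h>              /* file inclusion: the header's text is inserted here */

#define BUFFER_SIZE 16          /* object-like macro: each BUFFER_SIZE becomes 16 */
#define SQUARE(x) ((x) * (x))   /* function-like macro: expanded textually at each use */

/* Conditional compilation: only one of the two LOG definitions survives
   preprocessing, depending on whether VERBOSE is defined (for example by
   passing -DVERBOSE on the gcc command line). */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

/* After preprocessing, the return statement reads: return ((side) * (side)); */
int area_of_square(int side)
{
    LOG("computing area");
    return SQUARE(side);
}

/* After preprocessing, the return statement reads: return 16; */
int buffer_size(void)
{
    return BUFFER_SIZE;
}
```

Running gcc -E on a file like this should show the two functions with the macros already expanded, preceded by the much longer text of stdio.h.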

2. Compilation

The compilation phase produces assembly code that is specific to the target architecture.

In this step the compiler takes the preprocessed file and checks it for syntax or structural errors. If it finds any, the compilation process stops and the corresponding errors are displayed. Otherwise, it generates intermediate code in assembly language (file.s).

Generating assembly code from C code is one of the most critical steps in the C program compilation pipeline, because assembly is the low-level form that the assembler can then translate into an object file.
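To make the C-to-assembly step more concrete, here is a minimal sketch of a function that is small enough to read in assembly form. The function name add is just an example, and the assembly shown in the comment is only an approximation of what gcc tends to emit for x86-64 with optimization enabled; the real output depends on the compiler version, the flags, and the target architecture.

```c
/* A function small enough that its assembly translation is easy to follow. */
int add(int a, int b)
{
    return a + b;
}

/*
 * Illustrative only: on x86-64 Linux with optimization enabled, the
 * assembly file produced for this function usually looks close to this.
 *
 *   add:
 *       leal    (%rdi,%rsi), %eax
 *       ret
 *
 * The two arguments arrive in registers, their sum is placed in the
 * return register, and control goes back to the caller.
 */
```

Compiling the same file for a different architecture would give different instructions, which is why the assembly output is tied to the target.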

3. Assembly

Assembly is the third stage of compilation. In this stage, an assembler translates the assembly instructions into object code.

The assembler accepts the compiled source code, now in assembly language, and translates it into low-level machine code. Each source file gets its own object file. After successful assembly, the result is an object file (file.o).

The object file contains “relocatable” machine code that is not directly executable because it is not yet mapped to any specific address in memory. Here the linker plays an important role: it combines all the object files, resolves the references between modules, and fixes up the addresses.
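The following sketch shows why an object file is not yet a complete program. The function names triple and print_triple are hypothetical; the point is only that one symbol is defined in the file while another is merely referenced.

```c
#include <stdio.h>

/* Defined here: the object file for this source contains the machine
   code for triple. */
int triple(int n)
{
    return 3 * n;
}

/* printf is only declared by stdio.h, not defined in this file. The
   object file therefore records printf as an undefined symbol, and the
   call below stays a placeholder until the linker supplies an address. */
int print_triple(int n)
{
    return printf("%d\n", triple(n));
}
```

After compiling such a file to an object file, a symbol-listing tool such as nm would be expected to report triple and print_triple as defined and printf as undefined.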

4. Linking

The linker performs two important tasks: symbol resolution and relocation. The object code generated in the assembly stage is made up of machine instructions that the processor understands, but some parts of the program are out of order or missing. To produce an executable program, the existing pieces must be rearranged and the missing pieces must be filled in. This process is called linking.

The linker organizes the pieces of object code so that functions in some pieces can successfully call functions in others. It also adds the pieces that contain the instructions for the library functions used by the program. In the case of the “Hello, world!” program, the linker adds the object code for the puts function.
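The idea of one piece calling a function that lives in another can be sketched as follows. Both pieces are kept in a single file here so that the example is self-contained; the names greet and run_greeting are invented for illustration.

```c
#include <stdio.h>

/* Declaration only: it tells the compiler that greet exists somewhere.
   In a multi-file project this line would normally come from a header. */
int greet(void);

/* The caller compiles with nothing but the declaration in view. If greet
   were defined in a different source file, this call would remain an
   unresolved reference in the object file until link time. */
int run_greeting(void)
{
    return greet();
}

/* The definition. It could equally live in its own .c file and be
   compiled separately. puts itself comes from the C library, which the
   linker also brings in. */
int greet(void)
{
    return puts("Hello, world!");
}
```

If the definition of greet were removed and nothing else provided it, compilation of this file would still succeed, and the failure would only show up as an undefined reference at the linking stage.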

The result of this stage is the executable file. Its name depends on the toolchain and platform: with some toolchains the executable takes the name of the source file with a different extension, such as ‘.exe’ under DOS, while under UNIX the default name is ‘a.out’ unless another name is requested. For example, if we use the printf() function in a program, the linker connects that call to the library code that implements it.

GNU Compiler Collection (GCC)

GCC is a collection of compilers produced by the GNU Project. It converts source code written in C and in related languages, such as C++ or Objective-C, into machine code.

gcc <file.c>

This is the basic form of the gcc command. It takes the source file and runs the whole build process, from preprocessing to linking, and produces an executable file (named a.out by default).

gcc -E <file.c>

With the -E option the compiler takes the source file and only performs preprocessing, the first step of the process. It does not compile, assemble, or link, and it outputs the expanded source (to standard output by default).

gcc -S <file.c>

With the -S option the compiler takes the source file and performs the first two steps, preprocessing and compilation. It does not assemble or link, and it returns an assembly language file.

gcc -c <file.c>

With the -c option the compiler takes the source file and performs the first three steps: preprocessing, compilation, and assembly. It does not link, and it returns an object code file.

For more information on the gcc command and its options, see the gcc man page.

This was an explanation of the compilation process that is done in the C language before running the program.
