Wednesday, August 26, 2009

Stages of Compilation in Linux

A source file is compiled and linked to form an executable binary file that runs on a given architecture. Understanding the various stages of compilation helps when cross-compiling code.
For example, the steps below show the compilation process using the GCC compiler.

Source file:
It contains the source program in text format. It can be in any language: C, C++, etc.
For example, first.c is a C source file.

Preprocessing:
The preprocessor runs first and rewrites the source text before the compiler proper sees it.
It reads the header files named in #include directives and produces a preprocessed source file.
All macros and symbolic constants are expanded.
All conditional preprocessor directives (#if, #ifdef and so on) are evaluated by the preprocessor.
This is what provides conditional compilation.

gcc -E first.c -o first.i

To see the steps of preprocessing on the console, compile using
gcc -v -E first.c -o first.i
-v stands for verbose.

Assembly:
The compiler takes the preprocessed file and creates a .s file, called the assembly file.
This is the stage where most optimization (for speed and space) of the code is applied.
For example: gcc -v -S first.i -o first.s

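Relocatable Binary:

gcc -c first.s -o first.o
The assembler turns the assembly file into a relocatable object file. It contains offset addresses for the code, assigned at compile time, rather than final load addresses.
An object dump of first.o shows these offset addresses.
For example, relocatable code may contain call 19 <>. The position of the target is relative to the position of main.
This file contains the machine code generated from the source; references to library routines stay unresolved until linking.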


Executable Binary:
gcc first.o creates a.out by default. To choose the output name, use gcc first.o -o first

This loadable file contains load addresses in the form of segment and offset, called absolute addresses. Calls to shared library functions go through entries in the PLT (procedure linkage table).
The executable file also pulls in some run-time library code. This file is created mainly by the linker, which is OS dependent.

The loadable code inside the executable binary can be inspected with
objdump -D first
To view it page by page: objdump -D first | more

File Format: To find out the format of a file, use the file command.

For example:
file first.c
It is reported as a text file (C source).

file first.i
It is reported as a text file.

file first.s
It is reported as an assembly source file.

file first.o
It is reported as a binary file (a relocatable object).

This set of tools is called a toolchain. Cross compilers are mainly required for building code that will execute on a different architecture. The object dump of the binary executable contains the load addresses.

The sequence of steps from .c to .o is the same on any architecture, although the .s and .o files themselves contain architecture-specific code.
The .exe or binary executable is platform and architecture specific.

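The executable file can carry three flags:
suid: if this flag is set, the process runs with the effective user ID of the file's owner.
sgid: if this flag is set, the process runs with the effective group ID of the file's group.
sticky: if this flag is set, it requests the kernel to keep the program in memory after execution. This is the historical meaning; Linux ignores the flag on regular files.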

During execution, the executable binary is loaded into the process address space.
The process address space contains different segments of memory such as data, code and stack.

During runtime, three key functions are performed.
--init
Resource allocation is done here. It calls the various resource allocation routines, which in turn make specific kernel system calls.

--start
Makes a call to main and hands control over to the program's functionality.

--fini
Releases all resources allocated by init.

If the process crashes, for example because of a bug in a kernel module, the --fini routine is not called and the process does not release its resources itself. The kernel reclaims them, and the terminated process stays a zombie until its parent collects its exit status.

Relocatable code is executed on various platforms through an interface called the runtime.
(Relocatable code, .o file) functionality -> Unix runtime -> binary (ELF format with no extension) -> Unix OS.
(Relocatable code, .obj file) functionality -> Windows runtime -> binary (COFF format with .exe extension) -> Windows OS.

The runtime layer is also called the Application Binary Interface.
