What is a Compiler?

What Actually Is a Compiler?:

A compiler is a program that translates human-readable source code into computer-executable machine code. To do this successfully, the human-readable code must comply with the syntax rules of whichever programming language it is written in. The compiler is only a program and cannot fix your programs for you. If you make a syntax mistake, you have to correct it or the code won't compile.

What Happens When You Compile Code?:

A compiler's complexity depends on the syntax of the language and how much abstraction that programming language provides. A C compiler is much simpler than

Here is what happens when you compile code.

Lexical Analysis

This is the first stage, in which the compiler reads a stream of characters (usually from a source code file) and generates a stream of lexical tokens. For example, the C++ code

int C= (A*B)+10;

might be analysed as these tokens:
type "int"
variable "C"
equals
leftbracket
variable "A"
times
variable "B"
rightbracket
plus
literal "10"
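
The token stream above can be reproduced with a very small hand-written lexer. What follows is a minimal sketch, not how any particular compiler does it: the `Token` struct and the `tokenize` function are hypothetical names introduced for illustration, only the characters needed for this one statement are recognised, and the sketch also emits a token for the trailing semicolon, which the list above leaves out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: a kind name plus the text that was matched.
struct Token {
    std::string kind;
    std::string text;
};

// Splits a tiny subset of C++ into tokens. Only what is needed for
// "int C= (A*B)+10;" is handled; any other symbol is tagged "unknown".
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }   // whitespace separates tokens
        if (std::isalpha(c) || c == '_') {        // keyword or identifier
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            std::string word = src.substr(start, i - start);
            out.push_back({word == "int" ? "type" : "variable", word});
            continue;
        }
        if (std::isdigit(c)) {                    // integer literal
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({"literal", src.substr(start, i - start)});
            continue;
        }
        std::string kind;                         // single-character symbols
        switch (src[i]) {
            case '=': kind = "equals"; break;
            case '(': kind = "leftbracket"; break;
            case ')': kind = "rightbracket"; break;
            case '*': kind = "times"; break;
            case '+': kind = "plus"; break;
            case ';': kind = "semicolon"; break;
            default:  kind = "unknown"; break;
        }
        out.push_back({kind, std::string(1, src[i])});
        ++i;
    }
    return out;
}
```

Calling `tokenize("int C= (A*B)+10;")` yields the ten tokens listed above in the same order, followed by the semicolon token.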

Next is Syntactical Analysis:

The output from the lexical analyzer goes to the syntactical analyzer part of the compiler, which uses the rules of the language's grammar to decide whether the input is valid. If variables A and B had not been previously declared, or were not in scope, the compiler might say

  • 'A' : undeclared identifier.

Had they been declared but not initialized, the compiler would issue a warning

  • local variable 'A' used without having been initialized.

You should never ignore compiler warnings. The problems they flag can break your code in weird and unexpected ways.

  • Always fix compiler warnings!
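
To make the grammar check and the undeclared-identifier message more concrete, here is a minimal sketch of a recursive-descent checker for simple arithmetic expressions. The class name `ExprChecker`, the three-rule grammar in the comment, and the "expected" messages are assumptions made for illustration. It also works directly on characters to stay short, and it folds the name lookup into the parser, whereas real compilers often do that in a separate semantic step.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical checker for the grammar
//   expr   := term   { ('+' | '-') term }
//   term   := factor { ('*' | '/') factor }
//   factor := identifier | number | '(' expr ')'
class ExprChecker {
public:
    ExprChecker(std::string text, std::set<std::string> declared)
        : text_(std::move(text)), declared_(std::move(declared)) {}

    // Returns an empty string when the input is valid, otherwise the first error.
    std::string check() {
        error_.clear();
        pos_ = 0;
        expr();
        skipSpaces();
        if (error_.empty() && pos_ != text_.size()) fail("unexpected character");
        return error_;
    }

private:
    void fail(const std::string& msg) { if (error_.empty()) error_ = msg; }
    bool atDigit() const {
        return pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]));
    }
    bool atAlnum() const {
        return pos_ < text_.size() && std::isalnum(static_cast<unsigned char>(text_[pos_]));
    }
    void skipSpaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }
    // Consumes the character c if it is next (after spaces).
    bool accept(char c) {
        skipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void expr() {
        term();
        while (error_.empty() && (accept('+') || accept('-'))) term();
    }
    void term() {
        factor();
        while (error_.empty() && (accept('*') || accept('/'))) factor();
    }
    void factor() {
        if (!error_.empty()) return;
        if (accept('(')) {                       // parenthesised sub-expression
            expr();
            if (error_.empty() && !accept(')')) fail("expected ')'");
            return;
        }
        std::size_t start = pos_;
        bool isNumber = atDigit();
        while (isNumber ? atDigit() : atAlnum()) ++pos_;
        if (pos_ == start) { fail("expected an operand"); return; }
        std::string word = text_.substr(start, pos_ - start);
        // Names must have been declared; numbers are always acceptable.
        if (!isNumber && declared_.count(word) == 0) fail("'" + word + "' : undeclared identifier");
    }

    std::string text_;
    std::set<std::string> declared_;
    std::string error_;
    std::size_t pos_ = 0;
};
```

With A and B both in the declared set, checking `(A*B)+10` returns an empty string. Leaving A out of the set produces the undeclared-identifier message, and dropping the closing bracket produces a grammar error instead.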

One Pass Or Two?:

Some languages have been written so that a compiler can get away with reading the source code once and generating the machine code. Pascal is one such language. Many compilers require at least two passes. Why is this?

Sometimes it is because of

  • Forward Declarations of functions or classes.
  • How much optimization you require of the compiler.

In C++, a class can be declared but not defined until later. The compiler cannot work out how much memory the class needs until it has compiled the body of the class. It must then reread the code before generating correct code.
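
The difference between declaring and defining a class can be seen in a few lines. This is only an illustrative sketch: `Widget`, `Holder`, `kWidgetSize` and `area` are hypothetical names. Between the declaration and the definition, the compiler knows that `Widget` exists but not how large it is.

```cpp
#include <cstddef>

class Widget;   // declaration only: Widget is an incomplete type from here on

// A pointer to an incomplete type is allowed, because the size of the
// pointer does not depend on what Widget contains.
struct Holder {
    Widget* item = nullptr;
};

// Writing sizeof(Widget) at this point would be a compile error:
// the compiler has not yet seen the body of the class.

class Widget {  // definition: the memory layout is now known
public:
    int width = 0;
    int height = 0;
};

// After the definition the size can be computed at compile time.
constexpr std::size_t kWidgetSize = sizeof(Widget);

// Code that touches the members also needs the full definition.
int area(const Widget& w) { return w.width * w.height; }
```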

Generating Machine Code:

Assuming that the compiler has successfully completed these stages

  • Lexical Analysis.
  • Syntactical Analysis.

the final stage is generating machine code. This can be an extremely complicated process, especially with modern CPUs.

The compiled executable should run as fast as possible. Its speed can vary enormously according to

  • The quality of the generated code.
  • How much optimization has been requested.

Most compilers let you specify the amount of optimization. Typically you request none for debugging (quicker compiles!) and full optimization for the released code.
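
As a rough illustration of what the optimization setting can change, consider the small function below. The function name `sumTo` and the file name `sum.cpp` in the comments are hypothetical, and the commands assume GCC, where `-O0` turns optimization off and `-O2` requests a high level of it. What the optimizer actually does with the loop varies between compilers and versions.

```cpp
// Possible builds, assuming GCC and a file called sum.cpp:
//   g++ -O0 -g sum.cpp    (no optimization, debug information included)
//   g++ -O2 sum.cpp       (optimized build for release)

// Adds up the integers from 1 to n with a plain loop.
// Without optimization the compiler typically emits the loop as written.
// With optimization it may unroll the loop or, in some compilers,
// replace it with arithmetic equivalent to n * (n + 1) / 2.
unsigned long long sumTo(unsigned long long n) {
    unsigned long long total = 0;
    for (unsigned long long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Both builds must return the same result. Only the generated machine code, and therefore the running time, should differ.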

Code Generation Is Challenging!:

The compiler writer faces challenges when writing a code generator. Many processors speed up processing by using

  • Instruction Pipelining.
  • Internal caches.

If all of the instructions within a loop can be held in the CPU cache, then that loop will run much faster than if the CPU has to fetch instructions from main RAM. The CPU cache is a block of memory built into the CPU chip that can be accessed much faster than main RAM.

Caches And Queues:

Most CPUs have a prefetch queue into which the CPU reads instructions before executing them. If a conditional branch is taken, the CPU has to reload the queue, so code should be generated to minimize this.
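
The same idea can be shown at source level. The sketch below counts the values above a limit in two ways, one with a conditional inside the loop and one without. The function names are hypothetical, and this is only an illustration of the principle: an optimizing compiler may well generate the same branch-free machine code for both versions.

```cpp
#include <cstddef>
#include <vector>

// Branching version: written with a condition that may become one
// conditional jump per element in the generated code.
std::size_t countAboveBranchy(const std::vector<int>& values, int limit) {
    std::size_t count = 0;
    for (int v : values) {
        if (v > limit) ++count;
    }
    return count;
}

// Branch-free version: the comparison yields false or true, which converts
// to 0 or 1 and is added directly, so the loop body needs no jump.
std::size_t countAboveBranchless(const std::vector<int>& values, int limit) {
    std::size_t count = 0;
    for (int v : values) {
        count += static_cast<std::size_t>(v > limit);
    }
    return count;
}
```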

Many CPUs have separate parts for

  • Integer Arithmetic
  • Floating Point Arithmetic

These operations can therefore often run in parallel to increase speed.

Compilers typically generate code into object files, which are then linked together by a linker program.

©2014 About.com. All rights reserved.