
5. Code Generation

Now that the compiler has completed its lexical and syntactic analysis, the next step is 'Code generation'.

Code generation is the final stage of compilation. It takes the output of lexical and syntactic analysis and converts it into machine code. The result is stored as an object file.

Note that a single high-level statement, e.g. print file, can produce a lot of machine code. After all, one of the reasons for writing code in a high-level language in the first place is to be able to give more general commands without worrying about the minutiae of how the CPU will actually carry them out.

Stage 1: Create machine code

For example, a statement such as

 x = y + 3;

would become a number of separate machine code instructions.
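
As a rough illustration, the sketch below translates an assignment of the form target = left + right into three instructions for an imaginary CPU. The mnemonics LOAD, ADD and STORE and the function name generate_assignment are assumptions made for this example only; real machine code depends on the processor being targeted.

```python
def generate_assignment(target, left, right):
    """Return pseudo machine instructions for: target = left + right.

    LOAD, ADD and STORE are hypothetical mnemonics, not a real instruction set.
    """
    def operand(value):
        # Numeric constants are written with a '#' prefix (an immediate value);
        # anything else is treated as a variable name.
        return f"#{value}" if str(value).isdigit() else str(value)

    return [
        f"LOAD {operand(left)}",   # copy the first operand into the accumulator
        f"ADD {operand(right)}",   # add the second operand to the accumulator
        f"STORE {target}",         # write the accumulator back to memory
    ]


instructions = generate_assignment("x", "y", 3)
for line in instructions:
    print(line)
```

Even this tiny statement needs three instructions here: one to fetch y, one to add the constant, and one to store the result in x.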

Another task of the code generation stage is to allocate memory locations to the variables and constants used by the source code.

(Note: the lexical analysis stage identified these variables and constants in the first place.)
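
One simple way to picture this allocation is a table that maps each name to an address. The sketch below is only an illustration: the function name allocate_memory, the starting address 0x1000 and the 4-byte size per item are all assumptions, and a real compiler may place constants directly inside instructions rather than in their own memory locations.

```python
def allocate_memory(names, base_address=0x1000, size=4):
    """Assign each name a consecutive address, starting at base_address.

    base_address and size are illustrative values, not taken from any real system.
    """
    table = {}
    next_address = base_address
    for name in names:
        if name not in table:          # each name gets only one location
            table[name] = next_address
            next_address += size
    return table


# Names found earlier in the statement x = y + 3
symbol_table = allocate_memory(["x", "y", "3"])
for name, address in symbol_table.items():
    print(f"{name} -> {hex(address)}")
```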

The code generator also works out relative addresses for jumping around within the program.

For example, consider a statement such as

if (x = 1) then goto nextstep
... (more statements)
nextstep:

The code generator works out the relative address of the label 'nextstep' and adjusts the machine code for the if (x = 1) statement so that it jumps to that address.
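
The sketch below shows one possible way of replacing a label with a relative address, using two passes over a small list of pseudo instructions. The instruction names (COMPARE, JUMP_IF_EQ, HALT) and the function resolve_jumps are hypothetical, and the offset is counted from the jump instruction itself, which is an assumption; real processors often count from the instruction that follows the jump.

```python
def resolve_jumps(program):
    """Replace label names in jump instructions with relative offsets.

    program is a list of strings; a line ending in ':' defines a label.
    JUMP_IF_EQ is a hypothetical instruction written as "JUMP_IF_EQ <label>".
    """
    # First pass: record which instruction index each label refers to.
    label_positions = {}
    instructions = []
    for line in program:
        if line.endswith(":"):
            label_positions[line[:-1]] = len(instructions)
        else:
            instructions.append(line)

    # Second pass: swap each label for the distance from the jump to the label.
    resolved = []
    for index, instruction in enumerate(instructions):
        opcode, _, argument = instruction.partition(" ")
        if opcode.startswith("JUMP") and argument in label_positions:
            offset = label_positions[argument] - index
            instruction = f"{opcode} {offset:+d}"
        resolved.append(instruction)
    return resolved


program = [
    "COMPARE x, #1",
    "JUMP_IF_EQ nextstep",
    "LOAD y",
    "STORE z",
    "nextstep:",
    "HALT",
]
for line in resolve_jumps(program):
    print(line)
```

The label itself produces no machine code; it only marks a position, so the jump ends up as "three instructions forward" in this sketch.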

Stage 2: Optimise the code

Finally, the code generator tries to optimise the code.

Code optimisation means producing code that is as fast and efficient as possible.

Example

Consider these three lines of crude source code:

x = y + 3

b = x

c = b

Can you see a more concise way of coding this?

After creating a draft set of machine code, the code optimisation phase will take a second pass through the generated code looking for ways to reduce the use of memory or to make the code faster.

In the example above, the variables 'x' and 'b' are only used to hold a value before finally passing it on to the variable 'c'.

So a faster bit of code would be to shorten the block to the equivalent of

c = y + 3

In this case, provided 'b' and 'x' are not used anywhere else in the program, the optimiser avoids allocating memory to them and shortens the machine code to just represent c = y + 3.

It has removed the redundant code.
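
A minimal sketch of this kind of clean-up is shown below. It is not how any particular compiler does it: the function name optimise and the live_outputs parameter are invented for the example, and it assumes straight-line code in which each variable is assigned only once and only the listed outputs are needed afterwards.

```python
def optimise(statements, live_outputs):
    """Simplify a list of (target, expression) assignments.

    Assumes straight-line code where each variable is assigned once and
    only the names in live_outputs are needed afterwards.
    """
    values = {}
    for target, expression in statements:
        # If the expression is just another variable whose value is already
        # known, substitute that value directly.
        values[target] = values.get(expression, expression)

    # Keep only the assignments whose results are still needed.
    return [(name, values[name]) for name in live_outputs]


source = [("x", "y + 3"), ("b", "x"), ("c", "b")]
for target, expression in optimise(source, live_outputs=["c"]):
    print(f"{target} = {expression}")
```

If 'x' were also needed later in the program, it would have to be listed in live_outputs and its assignment would be kept.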

You can often tell the compiler to favour either speed or memory use, because in the real world you often cannot optimise for both at once.

The code optimisation tricks a compiler has up its sleeve are one of the things that make one compiler better than another.

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.
