9. Superscalar

Handling (executing) an integer instruction is different from handling a floating point instruction, so chip designers build separate, independent hardware blocks for each. Superscalar design takes advantage of this fact.

If these hardware blocks are largely independent, why not allow them to run in parallel?

This idea is shown in the diagram below:

[Diagram: superscalar CPU architecture]

In this diagram, there is a single fast fetch unit, followed by three independent decode units and four parallel execute units. Each execute unit is dedicated to a specific task.

These are:

Handle integers
Handle floating point
Handle a compare operation
Handle an address operation

This is called a 'superscalar' arrangement. The idea is that three instructions in a row will often need different types of processing. For example, the first instruction might need an integer operation, the next a compare and the third some kind of memory (address) operation. If so, all three instructions can be handled in parallel, up to tripling performance for those instructions.

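But the instructions must also be independent of one another. If the second instruction needs the result of the first, it has to wait for that result, so the two cannot run in parallel.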
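To make 'independent' concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified instruction format (the `Instr` class, the `independent` function and the register names are invented for illustration, not taken from any real CPU) and only checks the two most obvious clashes: the second instruction reading a result the first has not produced yet, or both instructions writing to the same register.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """A hypothetical, simplified instruction: which execute unit it needs,
    which register it writes and which registers it reads."""
    kind: str         # "int", "float", "compare" or "address"
    dest: str         # register written by this instruction
    srcs: tuple = ()  # registers read by this instruction

def independent(first: Instr, second: Instr) -> bool:
    """Return True if 'second' can safely run at the same time as 'first'."""
    needs_result = first.dest in second.srcs  # second would read a value not yet produced
    same_target = first.dest == second.dest   # both would write the same register
    return not (needs_result or same_target)

a = Instr("int", "r1", ("r2", "r3"))         # r1 = r2 + r3
b = Instr("float", "f1", ("f2", "f3"))       # f1 = f2 * f3
c = Instr("compare", "flags", ("r1", "r4"))  # compare r1 with r4

print(independent(a, b))  # True: no shared registers
print(independent(a, c))  # False: c needs the r1 that a is still calculating
```

Real processors track more cases than this, but the principle is the same: any instruction that needs another's result must wait.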

This is how it works: the fetch unit is designed to get three instructions at a time. These three instructions are then allocated to decode units 1, 2 and 3 respectively.

Each decode unit works out what kind of operation is required and dispatches the instruction to the appropriate execute unit: integer, floating point, compare or memory (address).
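The decode-and-dispatch step can be sketched in Python as follows. The `EXECUTE_UNITS` table, the `dispatch` function and the assembly-style instruction text are all hypothetical; the sketch simply mirrors the diagram's three decode units and four execute units.

```python
# Hypothetical mapping from instruction kind to the four execute units in the diagram.
EXECUTE_UNITS = {
    "int": "integer unit",
    "float": "floating point unit",
    "compare": "compare unit",
    "address": "address unit",  # memory (address) operations
}

def dispatch(fetched):
    """Give each fetched instruction to its own decode unit (1, 2, 3), which
    looks at the instruction's kind and picks the matching execute unit.

    'fetched' is a list of up to three (kind, instruction) pairs.
    Returns a list of (decode unit number, execute unit, instruction).
    """
    if len(fetched) > 3:
        raise ValueError("the fetch unit supplies at most three instructions at a time")
    routed = []
    for decoder, (kind, instruction) in enumerate(fetched, start=1):
        unit = EXECUTE_UNITS[kind]  # raises KeyError for an unknown kind
        routed.append((decoder, unit, instruction))
    return routed

fetched = [("int", "add r1, r2, r3"), ("compare", "cmp r4, r5"), ("address", "load r6, [r7]")]
for decoder, unit, instruction in dispatch(fetched):
    print(f"decode unit {decoder} -> {unit}: {instruction}")
```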

The power of this technique lies in being able to execute multiple instructions in parallel. Of course, if two integer operations in a row are required, for example, then this technique does not improve things, because there is only one integer execute unit.
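The cost of same-type instructions can be shown with a small counting model. This Python sketch assumes a simplified CPU (an assumption, not a real design): three instructions are fetched and decoded per cycle, taken strictly in order, there is exactly one execute unit per kind, every instruction is independent and each takes one cycle. `cycles_needed` is a hypothetical name.

```python
def cycles_needed(kinds, width=3):
    """Count cycles to issue a sequence of instruction kinds, strictly in order.

    Simplified model: at most 'width' instructions per cycle, and only one
    execute unit per kind, so two instructions of the same kind cannot be
    handled in the same cycle. All instructions are assumed independent.
    """
    cycles = 0
    i = 0
    while i < len(kinds):
        used = set()  # execute units already taken this cycle
        while i < len(kinds) and len(used) < width and kinds[i] not in used:
            used.add(kinds[i])
            i += 1
        cycles += 1
    return cycles

mixed = ["int", "compare", "address", "int", "float", "compare"]
same = ["int"] * 6
print(cycles_needed(mixed))  # 2
print(cycles_needed(same))   # 6
```

A mix of kinds finishes six instructions in two cycles, while six integer instructions still need six cycles, the same as a CPU that handles one instruction at a time.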

However, chip makers have analysed thousands of software programs, and the results show that, overall, this approach significantly improves performance.

Prediction

In the section above, words such as 'likely' and 'maybe' have been used. This is deliberate, because the benefit of superscalar depends on probability.

No one can predict exactly which instructions a software program will use. However, all programs need to be converted into machine code in order to run on a CPU, and this always involves a compiler or an interpreter.

A compiler translates high-level commands such as 'print' or 'read' into machine code. This gives it the opportunity to arrange the machine code so that it makes the best use of superscalar on that particular CPU. This 'code optimisation' is partly why some compilers are better than others.
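One simple form of this optimisation is reordering instructions so that neighbours need different execute units. The Python sketch below is a toy version of that idea (often called instruction scheduling), not how any particular compiler works: it assumes no instruction depends on any other, and the `interleave_by_kind` function and the instruction text are hypothetical.

```python
from collections import deque

def interleave_by_kind(program):
    """Toy instruction scheduler: reorder so neighbouring instructions need
    different execute units.

    Assumes every instruction is independent of every other one; a real
    compiler must check dependencies before moving anything.
    Takes one instruction of each kind in turn, keeping the original
    order within each kind.
    """
    queues = {}
    for kind, text in program:
        queues.setdefault(kind, deque()).append((kind, text))
    scheduled = []
    while any(queues.values()):  # an empty deque counts as False
        for queue in queues.values():
            if queue:
                scheduled.append(queue.popleft())
    return scheduled

program = [
    ("int", "add r1, r2, r3"),
    ("int", "sub r4, r5, r6"),
    ("int", "mul r7, r8, r9"),
    ("float", "fadd f1, f2, f3"),
    ("compare", "cmp r10, r11"),
    ("address", "load r12, [r13]"),
]
for kind, text in interleave_by_kind(program):
    print(f"{kind:8} {text}")
```

In a model with three instructions decoded per cycle, taken in order, and one execute unit per kind, the original order needs four cycles (the three integer instructions cannot share a cycle), whereas the reordered version needs three.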

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.

Further reading: Superscalar processor