

1. Floating point limits

Real numbers (those with fractional parts) are represented inside a computer as binary floating point numbers. We cover binary floating point in a separate section.

A floating point number is always stored in a fixed number of bits. For example, a 32 bit computer can conveniently use 4 bytes (i.e. 32 bits) to represent a floating point number.
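To see those 4 bytes for yourself, here is a small Python sketch. It uses the standard `struct` module to store a number in the 32 bit IEEE format. The value 0.15625 is just an illustrative choice (it happens to fit exactly in binary), and the variable names are our own.

```python
import struct

# Pack a Python float into the 4-byte IEEE 754 single-precision format.
# ">f" means: big-endian byte order, 32-bit float.
value = 0.15625  # illustrative value, exactly representable in binary
raw = struct.pack(">f", value)

# Turn each byte into 8 binary digits and join them together.
bit_string = "".join(f"{byte:08b}" for byte in raw)

print(len(raw))    # number of bytes used
print(bit_string)  # the 32 bits: 1 sign bit, 8 exponent bits, 23 mantissa bits
```

Whatever value you choose, the result is always exactly 4 bytes long. That fixed size is what sets the limits described next.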

This means that there is a limit to the range of numbers that can be represented, and to how precisely each one can be stored.

Floating point was developed because it can cover a wide range of values and yet keep reasonable precision. It can do so because the point is allowed to 'float', unlike in a fixed point scheme where its position never changes.

A 32 bit representation is called 'single precision'. Using the computer industry standard format for floating point (the IEEE 754 standard), the largest number it can hold is

±3.4 x 10^38

with a precision of about 7 decimal digits.
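The short Python sketch below checks both of these figures. Python's own numbers are 64 bit, so the helper `to_single` (a name we made up for this example) squeezes a value into 32 bits and back again using the standard `struct` module.

```python
import struct

def to_single(x):
    """Round a Python float to the nearest 32-bit single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Largest finite single-precision value: (2 - 2**-23) * 2**127
largest = (2 - 2**-23) * 2.0**127
print(largest)  # about 3.4e38

# A number beyond the limit cannot be stored in 4 bytes.
try:
    struct.pack(">f", 1e39)
    too_big = False
except OverflowError:
    too_big = True
print(too_big)

# Only about 7 significant decimal digits survive the trip through 32 bits.
squeezed = to_single(0.123456789)
print(squeezed)
```

Compare the last printed value with the 0.123456789 that went in: the first 7 or so digits match, and the rest do not.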

On the other hand, the smallest non-zero number (the one closest to zero) that it can represent is

±1.4 x 10^-45
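As a rough check, this Python sketch builds that smallest value directly from its bit pattern (every bit zero except the last one) and then shows what happens to a number that is smaller still. The variable names are our own, and 1e-46 is just an example of a value below the limit.

```python
import struct

# Smallest positive single-precision value: only the lowest bit is set
# (bit pattern 0x00000001), which is equal to 2**-149.
smallest = struct.unpack(">f", bytes([0, 0, 0, 1]))[0]
print(smallest)  # about 1.4e-45

# A value well below that limit is rounded to zero when stored in 32 bits.
# This is called underflow.
tiny = struct.unpack(">f", struct.pack(">f", 1e-46))[0]
print(tiny)
```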

This is quite all right for most applications. But for scientific or engineering applications running on the computer, such as CAD or a simulation, there is the option to go to 'double precision', which uses 64 bits or 8 bytes. This vastly increases the range (to about ±1.8 x 10^308) and roughly doubles the precision (to about 15 decimal digits), but each number takes twice the memory and calculations can run slower on some hardware.
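Python's built-in `float` is a 64 bit double on practically every platform, so it makes a handy way to compare the two sizes. The sketch below is only an illustration: it uses the standard `struct` and `sys` modules, and the value 0.1 is our own choice.

```python
import struct
import sys

# Size and limits of a double-precision number.
double_bytes = len(struct.pack(">d", 1.0))
print(double_bytes)        # bytes per double
print(sys.float_info.max)  # largest double, about 1.8e308
print(sys.float_info.dig)  # decimal digits that are always preserved

# Store the same value in 32 bits and in 64 bits, then read it back.
x = 0.1
as_single = struct.unpack(">f", struct.pack(">f", x))[0]
as_double = struct.unpack(">d", struct.pack(">d", x))[0]
print(as_single)  # differs from 0.1 after about 7 digits
print(as_double)  # unchanged
```

The single precision copy drifts away from 0.1 after roughly 7 digits, while the double precision copy comes back unchanged.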

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.

Click on this link: Floating point IEEE 754