IEEE 754 Floating Point

The IEEE 754 standard is the most common representation for floating point numbers on computers today. Because most processors implement it directly in hardware, it is usually also the fastest option.

Representation

To represent a floating point number in IEEE 754 we break it into three parts: the sign, the exponent, and the mantissa. The result is much like scientific notation: the number has a sign, an exponent, and a value that the exponent is applied to. The exponent "shifts" the point around, which sets the magnitude of the number (hence the name floating point).
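As a rough illustration of the three-part split, the sketch below pulls the raw fields out of a 32-bit float using Python's standard `struct` module. The helper name `float32_fields` is made up for this note, and the field widths (1, 8 and 23 bits) are those of the 32-bit format.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to 32-bit precision) into its raw sign, exponent and mantissa fields."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, still biased
    mantissa = bits & 0x7FFFFF       # 23 bits, leading 1 not stored
    return sign, exponent, mantissa

print(float32_fields(-6.5))  # (1, 129, 5242880)
```

Here -6.5 is -1.101 × 2^2 in binary, so the stored exponent is 2 + 127 = 129 and the mantissa field holds the bits 101 followed by zeros.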


The sign of the mantissa

The sign is represented by the single Most Significant Bit. A 0 represents a positive number, and a 1 a negative number.
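A small sketch of reading just the sign bit, this time for a 64-bit number (Python's own `float`). The helper name `sign_bit` is hypothetical. One consequence of a dedicated sign bit is that zero can carry a sign too, which the example shows.

```python
import struct

def sign_bit(x: float) -> int:
    """Return the most significant bit of the 64-bit representation of x."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63

print(sign_bit(3.5), sign_bit(-3.5))  # 0 1
print(sign_bit(0.0), sign_bit(-0.0))  # 0 1
```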

The biased exponent

| | 32-bit number | 64-bit number |
| --- | --- | --- |
| Bits used | 8 bits | 11 bits |
| Bias | 127 | 1023 |
| Bit range | bits 30-23 | bits 62-52 |

This section stores the exponent of the number. The field needs to represent both positive and negative exponents (so the point can be moved either right or left). Rather than encoding it in two's complement or something similar, we simply add a bias to the exponent before storing it, and subtract the bias again to get the actual value. For example, to represent an exponent of -2 in a 32-bit number (bias 127) we would store -2 + 127 = 125.
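The bias arithmetic can be sketched in a few lines. The names `encode_exponent` and `decode_exponent` are invented for illustration; the bias values are the ones from the table for 32-bit and 64-bit numbers.

```python
BIAS_32 = 127    # bias for a 32-bit number
BIAS_64 = 1023   # bias for a 64-bit number

def encode_exponent(actual: int, bias: int = BIAS_32) -> int:
    """Turn the actual exponent into the value stored in the exponent field."""
    return actual + bias

def decode_exponent(stored: int, bias: int = BIAS_32) -> int:
    """Recover the actual exponent from the stored field."""
    return stored - bias

stored = encode_exponent(-2)
print(stored, format(stored, "08b"))  # 125 01111101
print(decode_exponent(stored))        # -2
```

Because the stored field is always a non-negative integer, larger stored values always mean larger exponents, which is part of the appeal of a bias over two's complement here.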

Note

The exponent is of course a power of 2, rather than a power of 10 like you see in scientific notation, because we are working in binary! However, the 2 itself is not stored, just the exponent.

The normalized mantissa

The mantissa is the part of the number consisting of only its significant digits. For example, given 6.02 × 10^23 in scientific notation, the mantissa is 6.02.

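The normalized mantissa is the mantissa after the point has been shifted so that exactly one non-zero digit sits to its left. In binary that digit is always 1, so it does not need to be stored: we always assume there is a leading 1 before the point (except in a certain case covered in Special Values). A 32-bit number stores 23 mantissa bits (bits 22-0) and a 64-bit number stores 52 (bits 51-0).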
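To see the assumed leading 1 at work, here is a minimal sketch that rebuilds the value of a normalized 32-bit number from its three raw fields. The function name `float32_value` is hypothetical, and the sketch deliberately ignores the special exponent values 0 and 255.

```python
def float32_value(sign: int, exponent: int, mantissa: int) -> float:
    """Value of a normalized 32-bit float from its raw fields (stored exponent 1..254 only)."""
    # The 23 stored bits are the fraction after the point; add back the assumed leading 1.
    significand = 1 + mantissa / 2**23
    # Subtract the bias of 127 to recover the actual exponent.
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

print(float32_value(1, 129, 0x500000))  # -6.5
print(float32_value(0, 125, 0))         # 0.25
```

In the first call the mantissa field 0x500000 is the fraction .101 in binary (0.625), so the significand is 1.625 and the result is -1.625 × 2^2.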

Special Values

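The exponent values below are the stored (biased) fields of a 32-bit number, where the all-ones exponent is 255; for a 64-bit number it is 2047.

| Exponent | Mantissa | Value |
| --- | --- | --- |
| 0 | 0 | Exact 0 |
| 255 | 0 | Infinity |
| 0 | not 0 | Denormalised (no assumed leading 1) |
| 255 | not 0 | Not a number (NaN) |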
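The table translates fairly directly into a classification routine. The sketch below works on a raw 32-bit pattern given as an integer; `classify_float32` is a made-up name, and the labels it returns are just the table's wording.

```python
def classify_float32(bits: int) -> str:
    """Classify a raw 32-bit pattern using the exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF   # stored (biased) exponent
    mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalised"
    if exponent == 255:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"

for pattern in (0x00000000, 0x7F800000, 0x00000001, 0x7FC00000, 0x3F800000):
    print(f"{pattern:#010x}", classify_float32(pattern))
```

The sign bit is ignored on purpose, so 0x80000000 and 0xFF800000 classify the same way as their positive counterparts.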