The IEEE 754 standard is the most common representation for floating point numbers on computers today. Because almost all modern processors implement it directly in hardware, it is also usually the most efficient choice.
To represent a floating point number using the IEEE standard, we break it into three parts: the Sign, the Exponent, and the Mantissa. The result is stored roughly as if it were written in scientific notation. That is, it has a sign, a value (the mantissa), and an exponent that is applied to that value. The exponent "shifts" the point left or right, which sets the magnitude of the number (hence the name floating point).
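The three-part split can be seen by reinterpreting a float's bytes as an integer and masking out each field. The sketch below is written under a few assumptions. It uses Python's standard `struct` module with a big-endian 32-bit layout, it requires Python 3.9+ for the type hints, and the function name `float32_fields` is hypothetical.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into its raw sign, exponent and mantissa fields."""
    # Pack as a big-endian IEEE 754 single, then reread the same 4 bytes as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, still biased
    mantissa = bits & 0x7FFFFF        # 23 bits, leading 1 not stored
    return sign, exponent, mantissa

print(float32_fields(-6.25))  # (1, 129, 4718592)
```

Here -6.25 is -110.01 in binary, or -1.1001 × 2^2. That gives a sign of 1, a stored exponent of 2 + 127 = 129, and mantissa bits `1001` followed by zeros.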
The sign is stored in a single bit, the most significant bit (bit 31 of a 32-bit number, bit 63 of a 64-bit number). A 0 represents a positive number and a 1 represents a negative number.
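Because the sign is a single separate bit, negating a number only means flipping that bit. The exponent and mantissa do not change. Below is a minimal sketch that assumes 64-bit doubles accessed through Python's `struct` module. The helper names `sign_bit` and `negate_by_bit_flip` are hypothetical.

```python
import struct

def sign_bit(x: float) -> int:
    """Return the most significant bit (bit 63) of x's 64-bit IEEE 754 encoding."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63

def negate_by_bit_flip(x: float) -> float:
    """Flip only the sign bit; exponent and mantissa bits are left untouched."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return struct.unpack(">d", struct.pack(">Q", bits ^ (1 << 63)))[0]

print(sign_bit(3.5), sign_bit(-3.5))   # 0 1
print(negate_by_bit_flip(3.5))         # -3.5
print(sign_bit(-0.0))                  # 1: zero carries a sign bit too
```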
Exponent | 32-bit number | 64-bit number |
---|---|---|
Bits used | 8 bits | 11 bits |
Bias | 127 | 1023 |
Bit positions | bits 30 - 23 | bits 62 - 52 |
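The widths and biases in the exponent table can be used directly to pull the stored exponent out of either format. This sketch assumes Python's `struct` module and only handles normal numbers, because special values reuse the same field differently. The `FORMATS` dictionary and the `exponent_field` function are hypothetical names chosen for illustration.

```python
import struct

# width -> (float format, matching unsigned int format, exponent bits, bias)
FORMATS = {
    32: (">f", ">I", 8, 127),
    64: (">d", ">Q", 11, 1023),
}

def exponent_field(x: float, width: int) -> tuple[int, int]:
    """Return (stored biased exponent, actual exponent) of a normal number x."""
    float_fmt, int_fmt, exp_bits, bias = FORMATS[width]
    bits = struct.unpack(int_fmt, struct.pack(float_fmt, x))[0]
    mantissa_bits = width - 1 - exp_bits      # 23 or 52 bits sit below the exponent
    # Shift the exponent down and mask it: bits 30..23 (32-bit) or 62..52 (64-bit).
    stored = (bits >> mantissa_bits) & ((1 << exp_bits) - 1)
    return stored, stored - bias

print(exponent_field(0.25, 32))   # (125, -2)
print(exponent_field(0.25, 64))   # (1021, -2)
```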
This section stores the exponent of the number. The field must be able to represent both positive and negative exponents, so that the point can be moved either right or left. Instead of encoding it as two's complement or some other signed format, we simply add a fixed bias to the exponent before storing it and subtract the bias again to recover the actual value. For example, to represent an exponent of $-2$ in a 32-bit number we would store $-2 + 127 = 125$.
This biased exponent is then stored as an ordinary unsigned binary number, $125 = 01111101_2$, in the 8 exponent bits.
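The bias arithmetic can be written out as a pair of small functions. This sketch assumes the 32-bit format (8 bits, bias 127). It treats the stored values 0 and 255 as unavailable, because they are reserved for the special values. The names `encode_exponent` and `decode_exponent` are hypothetical.

```python
BIAS_32 = 127      # single-precision bias
EXP_BITS_32 = 8    # width of the exponent field

def encode_exponent(e: int) -> str:
    """Add the bias and return the 8-bit pattern a normalized float32 would store."""
    stored = e + BIAS_32
    # Stored values 1..254 are usable; 0 and 255 are reserved for special values.
    if not 1 <= stored <= 254:
        raise ValueError(f"exponent {e} is out of range for a normalized float32")
    return format(stored, f"0{EXP_BITS_32}b")

def decode_exponent(pattern: str) -> int:
    """Read the pattern as an unsigned number and subtract the bias."""
    return int(pattern, 2) - BIAS_32

print(encode_exponent(-2))           # 01111101
print(decode_exponent("01111101"))   # -2
```

One side effect of the bias is that stored exponents compare correctly as plain unsigned integers, so a larger stored pattern always means a larger exponent.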
The mantissa is the part of the number that holds only its significant digits. For example, given
A normalized mantissa is one where the point has been shifted so that exactly one nonzero digit sits to its left. In binary that digit is always 1. Because it is always 1, it is not stored at all; we simply assume a leading 1 before the point (except in the denormalized case covered in Special Values).
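Normalization and the hidden leading 1 can be checked in both directions. The sketch below uses Python's `math.frexp`, which returns a significand in [0.5, 1), and shifts the result one place so the leading digit becomes 1. It then rebuilds a 32-bit value from its fields by adding the assumed 1 back. The function names are hypothetical, and `value_from_fields` only covers normalized numbers.

```python
import math
import struct

def normalize(x: float) -> tuple[float, int]:
    """Write a positive, finite x as significand * 2**exponent with 1 <= significand < 2."""
    m, e = math.frexp(x)          # frexp gives x == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1           # move the point one place so the leading digit is 1

def value_from_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a normalized float32: the leading 1 is added back, not read from the bits."""
    significand = 1 + mantissa / 2**23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

print(normalize(6.25))   # (1.5625, 2), since 6.25 = 110.01b = 1.1001b * 2**2
bits = struct.unpack(">I", struct.pack(">f", 6.25))[0]
fields = (bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF)
print(value_from_fields(*fields))   # 6.25
```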
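Exponent | Mantissa | Value |
---|---|---|
0 | 0 | Exact zero (+0 or -0, depending on the sign bit) |
255 (2047 in 64-bit) | 0 | Infinity (+∞ or -∞, depending on the sign bit) |
0 | not 0 | Denormalized (no assumed leading 1) |
255 (2047 in 64-bit) | not 0 | Not a number (NaN) |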
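The special-value table maps directly onto a few field comparisons. This is a minimal sketch for the 32-bit format that uses Python's `struct` module. The function name `classify_float32` and its returned labels are hypothetical.

```python
import struct

def classify_float32(x: float) -> str:
    """Classify x, rounded to single precision, from its exponent and mantissa fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        # All-zero exponent: exact zero, or a denormalized number with no hidden 1.
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 255:
        # All-ones exponent: infinity if the mantissa is empty, otherwise NaN.
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"

for v in (0.0, -0.0, 1e-40, 1.5, float("inf"), float("nan")):
    print(v, classify_float32(v))
```

The value 1e-40 falls below the smallest normalized single-precision number (about 1.18e-38), so it can only be stored in denormalized form.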