HOME

This page is dedicated to interesting concepts that were introduced in previous courses without much elaboration, and that I have since looked into further.

How to calculate the value of a floating-point number using the IEEE-754 standard:

Floating point number = (-1)^Sign * (1.Fraction) * Radix^(Exponent - Bias)
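Bias = 2^(k-1) - 1, where k is the number of exponent bits.
1.Fraction is an implicit leading 1 followed by the Fraction bits.
The formula applies to normalized numbers, that is, when the Exponent field is neither all zeros nor all ones.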
Example of Calculation of 1.Fraction:
Fraction bits = 10110
1.Fraction = 1 + [1 * 2^-1] + [0 * 2^-2] + [1 * 2^-3] + [1 * 2^-4] + [0 * 2^-5]
           = 1 + 0.5 + 0.125 + 0.0625
           = 1.6875
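
The same sum can be sketched in Python. This is only an illustration of the calculation above; the function name significand_from_fraction_bits is made up for this example and is not part of any library.

```python
def significand_from_fraction_bits(fraction_bits: str) -> float:
    """Return 1.Fraction for a string of fraction bits such as '10110'."""
    value = 1.0  # the implicit leading 1
    # The first fraction bit has weight 2^-1, the second 2^-2, and so on.
    for position, bit in enumerate(fraction_bits, start=1):
        if bit == "1":
            value += 2.0 ** -position
    return value


print(significand_from_fraction_bits("10110"))  # 1.6875
```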

Floating point numbers table:

Name       Common Name          Radix  Sign bits  Exponent bits  Fraction bits  Bias
binary16   Half Precision       2      1          5              10             15
binary32   Single Precision     2      1          8              23             127
binary64   Double Precision     2      1          11             52             1023
binary128  Quadruple Precision  2      1          15             112            16383
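
As a rough check of the formula, the sketch below decodes a raw bit pattern by hand and compares the result with Python's own struct module. The helper names bias and decode are hypothetical, chosen for this example. The sketch assumes a normalized number and returns an ordinary Python float, so it is only practical for formats up to binary64. The bias is computed from Bias = 2^(k-1) - 1 rather than read from the table.

```python
import struct


def bias(exponent_bits: int) -> int:
    """Bias = 2^(k-1) - 1, where k is the number of exponent bits."""
    return 2 ** (exponent_bits - 1) - 1


def decode(bits: int, exponent_bits: int, fraction_bits: int) -> float:
    """Decode a normalized number from its raw bit pattern (radix 2)."""
    sign = bits >> (exponent_bits + fraction_bits)
    exponent = (bits >> fraction_bits) & ((1 << exponent_bits) - 1)
    fraction = bits & ((1 << fraction_bits) - 1)
    # All-zero and all-one exponent fields encode zero, subnormals,
    # infinities and NaN, which the formula does not cover.
    if exponent == 0 or exponent == (1 << exponent_bits) - 1:
        raise ValueError("not a normalized number")
    significand = 1 + fraction / (1 << fraction_bits)  # 1.Fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - bias(exponent_bits))


# binary32 pattern: sign = 1, exponent = 10000001, fraction = 10110 then zeros
pattern = 0xC0D80000
print(decode(pattern, exponent_bits=8, fraction_bits=23))   # -6.75
print(struct.unpack(">f", pattern.to_bytes(4, "big"))[0])   # -6.75
```

Here the exponent field is 129, so the value is (-1)^1 * 1.6875 * 2^(129 - 127) = -6.75, which agrees with what struct reports for the same four bytes.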