HOME

This page is dedicated to interesting concepts I have come across that were introduced in previous courses without much elaboration.

How to calculate the value of a floating point number using the IEEE-754 standard:

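Floating point number = (-1)^Sign * (1.Fraction) * Radix^(Exponent - Bias)
Bias = 2^(k-1) - 1, where k is the number of exponent bits.
1.Fraction is the significand: an implicit leading 1 followed by the fraction bits.
This form applies to normal numbers; zeros, subnormals, infinities and NaNs use reserved exponent values.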
Example of Calculation of 1.Fraction:
Fraction bits = 10110
1.Fraction = 1 + [1 * 2^-1] + [0 * 2^-2] + [1 * 2^-3] + [1 * 2^-4] + [0 * 2^-5]
1.Fraction = 1 + 0.5 + 0.125 + 0.0625 = 1.6875
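
The same sum can be written as a short Python sketch. The function name significand is only an illustrative choice, and the fraction bits are assumed to arrive as a string of 0s and 1s:

```python
def significand(fraction_bits: str) -> float:
    """Return 1.Fraction for a string of fraction bits such as "10110"."""
    value = 1.0  # implicit leading 1 of a normal number
    # Bit i (counting from 1, left to right) has weight 2^-i.
    for i, bit in enumerate(fraction_bits, start=1):
        value += int(bit) * 2.0 ** -i
    return value


print(significand("10110"))  # 1.6875
```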

Floating point numbers table:

Name	Common Name	Radix	Sign bits	Exponent bits	Fraction bits	Bias
binary16	Half Precision	2	1	5	10	15
binary32	Single Precision	2	1	8	23	127
binary64	Double Precision	2	1	11	52	1023
binary128	Quadruple Precision	2	1	15	112	16383
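
As a rough check of the formula, the sketch below evaluates Bias = 2^(k-1) - 1 for each exponent width (giving 15, 127, 1023 and 16383) and decodes a raw bit pattern into a value. The names bias and decode_normal are hypothetical helpers, not part of any library, and the decoder only handles normal numbers (no zeros, subnormals, infinities or NaNs). Large binary128 values can exceed the range of a Python float, so the example sticks to binary16 and binary32.

```python
def bias(exponent_bits: int) -> int:
    """Bias = 2^(k-1) - 1 for k exponent bits."""
    return 2 ** (exponent_bits - 1) - 1


def decode_normal(bits: int, exponent_bits: int, fraction_bits: int) -> float:
    """Decode a normal number from its raw bits (radix 2 assumed)."""
    # Split the pattern into sign | exponent | fraction fields.
    sign = (bits >> (exponent_bits + fraction_bits)) & 1
    exponent = (bits >> fraction_bits) & ((1 << exponent_bits) - 1)
    fraction = bits & ((1 << fraction_bits) - 1)
    # 1.Fraction: implicit leading 1 plus the fraction scaled into [0, 1).
    one_point_fraction = 1 + fraction / (1 << fraction_bits)
    return (-1) ** sign * one_point_fraction * 2.0 ** (exponent - bias(exponent_bits))


for k in (5, 8, 11, 15):
    print(k, bias(k))

# binary32 pattern 0x40B00000: sign 0, exponent 129, fraction bits 011000...
print(decode_normal(0x40B00000, 8, 23))  # 5.5
```

Here the exponent field is 129, so the scale factor is 2^(129 - 127) = 4, and 1.375 * 4 = 5.5.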