# Computer Science

Real Numbers

## Standard Form

Very large or very small denary numbers are often written in standard form, which saves writing out a lot of digits. A number in standard form is a number between 1 and 10 (at least 1 but less than 10) multiplied by a power of 10.

For example,

5.67 x 10^{3} = 5.67 x 1000 = 5670

6.23 x 10^{-2} = 6.23 x 0.01 = 0.0623

To convert a decimal number to standard form, move the decimal point so that the number lies between 1 and 10. The power of 10 is the number of places the decimal point was moved, positive if moved to the left, negative if moved to the right.
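The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `to_standard_form` is made up for this example, and it relies on the standard `decimal` module to avoid rounding problems.

```python
from decimal import Decimal

def to_standard_form(text):
    """Return (coefficient, power) so that coefficient x 10^power equals the input.

    The coefficient has exactly one non-zero digit before the decimal point.
    """
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0  # zero has no standard form; returned unchanged
    # adjusted() gives the power of 10 of the most significant digit,
    # which is the number of places the decimal point has to move.
    power = value.adjusted()
    coefficient = value.scaleb(-power)  # shift the point by that many places
    return coefficient, power

coefficient, power = to_standard_form("0.0623")
print(f"{coefficient} x 10^{power}")
```

Moving the point to the left gives a positive power (5670 becomes 5.67 with power 3) and moving it to the right gives a negative one (0.0623 becomes 6.23 with power -2).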

Real numbers are stored in the computer using a similar principle to standard form. Instead of a power of 10, however, they are stored using a power of 2. The part of the number that holds the significant digits is known as the **mantissa**, and the power of 2 by which it is multiplied is known as the **exponent**. For simplicity, the examples given here use 16 bits. In practice, real numbers are stored using a minimum of 32 bits. The greater the number of bits for the mantissa, the greater the precision with which the number can be stored. The greater the number of bits for the exponent, the greater the range of numbers that can be stored.

Our 16-bit numbers will use 10 bits for the mantissa and 6 bits for the exponent.

## Converting From Denary To Two's Complement Format

### 6.5

| Step | Result |
| --- | --- |
| Convert the absolute value of the decimal number to fixed point binary. | 110.1 |
| Move the binary point so that the first digit is non-zero. | .1101 (moved 3 places to the left) |
| Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits. | 0110100000 |
| If the original number was negative, convert it to two's complement form. This makes the mantissa. | 0110100000 |
| Convert the number of places the binary point moved into a 6 bit binary number. | 000011 |
| If the point was moved to the right, convert the number to two's complement form. This makes the exponent. | 000011 |
| The whole floating point number is the mantissa followed by the exponent. | 0110100000000011 |

### 0.125

| Step | Result |
| --- | --- |
| Convert the absolute value of the decimal number to fixed point binary. | 0.001 |
| Move the binary point so that the first digit is non-zero. | .1 (moved 2 places to the right) |
| Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits. | 0100000000 |
| If the original number was negative, convert it to two's complement form. This makes the mantissa. | 0100000000 |
| Convert the number of places the binary point moved into a 6 bit binary number. | 000010 |
| If the point was moved to the right, convert the number to two's complement form. This makes the exponent. | 111110 |
| The whole floating point number is the mantissa followed by the exponent. | 0100000000111110 |

### -42.75

| Step | Result |
| --- | --- |
| Convert the absolute value of the decimal number to fixed point binary. | 101010.11 |
| Move the binary point so that the first digit is non-zero. | .10101011 (moved 6 places to the left) |
| Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits. | 0101010110 |
| If the original number was negative, convert it to two's complement form. This makes the mantissa. | 1010101010 |
| Convert the number of places the binary point moved into a 6 bit binary number. | 000110 |
| If the point was moved to the right, convert the number to two's complement form. This makes the exponent. | 000110 |
| The whole floating point number is the mantissa followed by the exponent. | 1010101010000110 |

### -0.1875

| Step | Result |
| --- | --- |
| Convert the absolute value of the decimal number to fixed point binary. | 0.0011 |
| Move the binary point so that the first digit is non-zero. | .11 (moved 2 places to the right) |
| Replace the binary point with a zero, pad out the right hand side of the number with 0s to make the number 10 digits. | 0110000000 |
| If the original number was negative, convert it to two's complement form. This makes the mantissa. | 1010000000 |
| Convert the number of places the binary point moved into a 6 bit binary number. | 000010 |
| If the point was moved to the right, convert the number to two's complement form. This makes the exponent. | 111110 |
| The whole floating point number is the mantissa followed by the exponent. | 1010000000111110 |
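The conversion steps used in the four examples above could be automated roughly as follows. This is a minimal sketch, not a definitive implementation: the names `encode` and `twos_complement` are invented here, mantissa bits that do not fit are simply truncated, and no check is made that the exponent fits in 6 bits.

```python
from fractions import Fraction

MANTISSA_BITS = 10
EXPONENT_BITS = 6

def twos_complement(value, bits):
    """Return an integer as a two's complement bit string of the given width."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def encode(number):
    """Encode a denary number as a 10-bit mantissa followed by a 6-bit exponent."""
    value = Fraction(number)
    if value == 0:
        return "0" * (MANTISSA_BITS + EXPONENT_BITS)
    magnitude = abs(value)
    exponent = 0
    # Moving the binary point one place left halves the value (exponent + 1).
    while magnitude >= 1:
        magnitude /= 2
        exponent += 1
    # Moving it one place right doubles the value (exponent - 1).
    while magnitude < Fraction(1, 2):
        magnitude *= 2
        exponent -= 1
    # The magnitude is now .1xxx in binary; keep 9 bits after the point.
    mantissa = int(magnitude * 2 ** (MANTISSA_BITS - 1))
    if value < 0:
        mantissa = -mantissa  # stored in two's complement below
    return (twos_complement(mantissa, MANTISSA_BITS)
            + twos_complement(exponent, EXPONENT_BITS))

for number in (6.5, 0.125, -42.75, -0.1875):
    print(number, encode(number))
```

`Fraction` is used so that the repeated halving and doubling is exact; ordinary floats would also work for these particular values.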

## Converting From Two's Complement Format To Denary

### 0100010000000011

| Step | Result |
| --- | --- |
| Convert the exponent of the number to denary. Perform two's complement if the exponent starts with a 1. | 000011 = 3 |
| If the mantissa was negative, perform two's complement to convert to a positive number. | 0100010000 |
| Replace the first zero with a binary point. | .100010000 |
| Move the binary point the number of places indicated by the exponent (to the right if the exponent is positive, to the left if negative). | 100.010000 |
| Convert the fixed point binary number to denary. Remember to add the negative sign if the mantissa was negative. | 4.25 |

### 0111000000111110

| Step | Result |
| --- | --- |
| Convert the exponent of the number to denary. Perform two's complement if the exponent starts with a 1. | 111110 = -2 |
| If the mantissa was negative, perform two's complement to convert to a positive number. | 0111000000 |
| Replace the first zero with a binary point. | .111000000 |
| Move the binary point the number of places indicated by the exponent (to the right if the exponent is positive, to the left if negative). | .00111000000 |
| Convert the fixed point binary number to denary. Remember to add the negative sign if the mantissa was negative. | 0.21875 |

### 1001111110000111

| Step | Result |
| --- | --- |
| Convert the exponent of the number to denary. Perform two's complement if the exponent starts with a 1. | 000111 = 7 |
| If the mantissa was negative, perform two's complement to convert to a positive number. | 0110000010 |
| Replace the first zero with a binary point. | .110000010 |
| Move the binary point the number of places indicated by the exponent (to the right if the exponent is positive, to the left if negative). | 1100000.1 |
| Convert the fixed point binary number to denary. Remember to add the negative sign if the mantissa was negative. | -96.5 |
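The reverse conversion can be sketched in the same way. The function names below are invented for this example. Instead of moving the binary point by hand, the sketch treats the mantissa as a signed integer divided by 2^9 and multiplies it by 2 raised to the exponent, which gives the same value.

```python
def from_twos_complement(bits):
    """Interpret a bit string as a signed two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":           # a leading 1 means the number is negative
        value -= 1 << len(bits)
    return value

def decode(pattern):
    """Decode a 16-bit pattern: 10-bit mantissa followed by a 6-bit exponent."""
    mantissa = from_twos_complement(pattern[:10])
    exponent = from_twos_complement(pattern[10:])
    # The binary point sits after the first mantissa bit, so the mantissa
    # is worth (signed integer) / 2^9.
    return mantissa / 2 ** 9 * 2.0 ** exponent

for pattern in ("0100010000000011", "0111000000111110", "1001111110000111"):
    print(pattern, decode(pattern))
```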

## IEEE Standard For Floating Point

This system (IEEE 754 single precision) uses 32 bits to represent a number. The bit pattern is slightly different from the two's complement format above. From the left, the bit pattern represents:

- 1 bit: sign bit
- 8 bits: exponent, stored in excess-127 mode (127 is added to the exponent before it is stored)
- 23 bits: mantissa (a leading 1 bit is implied, with a binary point after it)
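To see these three fields for a real value, the standard `struct` module can pack a Python float into 32-bit IEEE format and the result can be split up as a bit string. The helper name `ieee754_fields` is an assumption made for this sketch.

```python
import struct

def ieee754_fields(number):
    """Return (sign, exponent, mantissa) bit strings for a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned integer so the raw bits can be inspected.
    (raw,) = struct.unpack(">I", struct.pack(">f", number))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, mantissa = ieee754_fields(6.5)
print(sign, exponent, mantissa)
# 6.5 is 1.101 x 2^2 in binary, so the stored exponent is 2 + 127 = 129
print(int(exponent, 2) - 127)
```

Note that the leading 1 of 1.101 does not appear in the mantissa field; only the bits after the binary point are stored.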

## Minifloat Format

Minifloat format is a 16-bit representation of real numbers. It uses a sign bit, a 5-bit exponent stored in excess-15 mode, and 10 mantissa bits with an implied leading 1 bit and binary point.
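The field widths described here match the 16-bit half precision format that Python's `struct` module supports through its `e` format code, so the same inspection trick can be used. This sketch assumes the two layouts are the same; the name `minifloat_fields` is hypothetical.

```python
import struct

def minifloat_fields(number):
    """Return (sign, exponent, mantissa) bit strings for a 16-bit float."""
    # "e" packs a half precision float; "H" rereads it as a 16-bit integer.
    (raw,) = struct.unpack(">H", struct.pack(">e", number))
    bits = format(raw, "016b")
    return bits[0], bits[1:6], bits[6:]

sign, exponent, mantissa = minifloat_fields(6.5)
print(sign, exponent, mantissa)
# 6.5 is 1.101 x 2^2, so the stored exponent is 2 + 15 = 17
print(int(exponent, 2) - 15)
```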

## Normalisation Of Floating Point Numbers

### Precision

The **precision** of a floating point number depends on the number of bits used to represent the mantissa. To illustrate this point, consider the following denary number,

42 012 000

Writing the point in front of the first digit, as in the floating point format above, we can express this as .42012 x 10^{8} using 5 digits for the mantissa. If we only use 4 digits for the mantissa, we get .4201 x 10^{8} and lose some accuracy.

If we put the decimal point in another place, say .042012 x 10^{9}, we need more digits for the mantissa. Systems for representing numbers need to allow the maximum precision for a given number of digits stored.

With binary floating point, numbers are normalised to allow this to happen.

### Example 1

Place 0000100000000110 in normalised form.

0000100000000110 = .000100000 x 2^{6}

To normalise the number, the binary point should be moved in front of the first non-zero bit. If the binary point is moved n places to the right, then the power of 2 is reduced by n.

.000100000 x 2^{6} = .100000 x 2^{3} = **0100000000000011**

### Example 2

Place 1110111000000011 in normalised form.

1110111000000011 = -.00100100 x 2^{3}

-.00100100 x 2^{3} = -.100100 x 2^{1}

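Writing the positive mantissa 0100100000 followed by the exponent 000001 gives 0100100000000001.

Taking the two's complement of the mantissa only, to restore the negative sign, gives **1011100000000001**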
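Both examples can be reproduced with a short routine. The sketch below takes a shortcut that is equivalent to the steps shown: while the first two mantissa bits are the same, it shifts the mantissa one place left and reduces the exponent by 1. The function name `normalise` is made up, and no check is made that the exponent stays within 6 bits.

```python
def normalise(pattern):
    """Normalise a 16-bit pattern: 10-bit mantissa followed by a 6-bit exponent."""
    mantissa, exponent_bits = pattern[:10], pattern[10:]
    exponent = int(exponent_bits, 2)
    if exponent_bits[0] == "1":      # negative exponent in two's complement
        exponent -= 64
    if "1" not in mantissa:
        return pattern               # zero cannot be normalised
    # Equal leading bits (00 or 11) mean the point can move right:
    # drop the first bit, append a 0, and subtract 1 from the exponent.
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"
        exponent -= 1
    return mantissa + format(exponent & 0b111111, "06b")

print(normalise("0000100000000110"))
print(normalise("1110111000000011"))
```

Because the shift works directly on the two's complement bits, negative mantissas do not need to be converted to positive and back again.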

## Key Facts

Normalised numbers always start with 2 different bits (01 for positive, 10 for negative). The mantissa of a positive number is always at least 0.5 and less than 1, and the mantissa of a negative number is always at least -1 and less than -0.5.

Normalisation is used to:

- ensure the maximum precision for a given number of bits
- ensure that there is only one representation of a number