# Double Precision (Not)

From this list, the gist is that most languages can't process `9999999999999999.0 - 9999999999999998.0`

Why do they output 2 when it should be 1? I bet most people who've never done any formal CS (a.k.a maths and information theory) are super surprised.

Before you read the rest, ask yourself this: if all you have are zeroes and ones, how do you handle *infinity*?

If we fire up an interpreter that outputs the value when it's typed (like the Swift REPL), we have the beginning of an explanation:

```
Welcome to Apple Swift version 4.2.1 (swiftlang-1000.11.42 clang-1000.11.45.1). Type :help for assistance.
1> 9999999999999999.0 - 9999999999999998.0
$R0: Double = 2
2> let a = 9999999999999999.0
a: Double = 10000000000000000
3> let b = 9999999999999998.0
b: Double = 9999999999999998
4> a-b
$R1: Double = 2
```

Whew, it's not that the languages can't handle a simple subtraction, it's just that `a` is *typed* as `9999999999999999` but *stored* as `10000000000000000`.

If we used integers, we'd have:

```
5> 9999999999999999 - 9999999999999998
$R2: Int = 1
```

Are the decimal numbers broken? 😱

##### A detour through number representations

Let's look at a byte. This is the fundamental unit of data in a computer and is made of 8 bits, all of which can be 0 or 1. It ranges from `00000000` to `11111111` (`0x00` to `0xff` in hexadecimal, `0` to `255` in decimal, homework as to why and how it works like that due by Monday).

Put like that, I hope it's obvious that the question "yes, but how do I represent the integer `999` on a byte?" is meaningless. You *can* decide that `00000000` means `990` and count up from there, or you can associate arbitrary values to the 256 possible combinations and *make* `999` be one of them, but you can't have both the `0`-`255` range **and** `999`. You have a finite number of possible values and that's it.

Of course, that's on 8 bits (hence the 256 color palette on old games). On 16, 32, 64 or bigger width memory blocks, you can store up to `2ⁿ` different values, and that's it.
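If you want to poke at this yourself, here is a minimal Swift sketch. The names `wanted`, `asByte` and `bits` are mine, picked for illustration; the rest is plain standard library.

```swift
// A byte (UInt8) covers exactly 256 values, nothing more.
print(UInt8.min, UInt8.max)            // 0 255
print(String(UInt8.max, radix: 2))     // 11111111
print(String(UInt8.max, radix: 16))    // ff

// 999 does not fit: the failable initializer returns nil instead of a byte.
let wanted = 999
let asByte = UInt8(exactly: wanted)
print(asByte == nil)                   // true

// n bits give 2^n distinct patterns.
for bits in [8, 16, 32] {
    print(bits, "bits:", 1 << bits, "values")
}
```

The loop stops at 32 on purpose: `1 << 64` no longer fits in a 64-bit `Int`, which is the same finite-storage problem again.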

##### The problem with decimals

While it's relatively easy to grasp the concept of infinity by looking at "how high can I count?", it's less intuitive to notice that there are *at least as many numbers between 0 and 1* as there are integers.

So, if we have a finite number of possible values, how do we decide which ones make the cut when talking decimal parts? The smallest? The most common? Again, as a stupid example, on 8 bits:

- maybe we need `0.01`...`0.99` because we're doing accounting stuff
- maybe we need `0.015`, `0.025`, ..., `0.995` for rounding reasons
- We'll just encode the numeric part on 8 bits (`0`-`255`), and the decimal part as above

But that's already 99+99 values taken up. That leaves us 58 possible values for *the rest of infinity.* And that's not even mentioning the totally arbitrary nature of the selection. This way of representing numbers is historically the first one and is called "fixed-point" representation. There are many ways of choosing how the decimal part behaves and a lot of headache when coding how the simple operations work, not to mention the complex ones like square roots and powers and logs.
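To give a feel for that headache, here is a toy fixed-point type in Swift. `Hundredths` and its `raw` field are hypothetical names I made up for this sketch, and the truncating multiplication is just one of several possible choices, not a recommendation.

```swift
// Hypothetical fixed-point type: a value is stored as a whole number of hundredths.
struct Hundredths: Equatable {
    var raw: Int   // 1234 means 12.34

    static func + (lhs: Hundredths, rhs: Hundredths) -> Hundredths {
        return Hundredths(raw: lhs.raw + rhs.raw)
    }

    // Multiplication needs a rescale, and that is where rounding decisions creep in.
    static func * (lhs: Hundredths, rhs: Hundredths) -> Hundredths {
        return Hundredths(raw: (lhs.raw * rhs.raw) / 100)   // truncates toward zero
    }
}

let price = Hundredths(raw: 10)   // 0.10
let tax = Hundredths(raw: 20)     // 0.20
print(price + tax == Hundredths(raw: 30))   // true: addition is exact
print((price * tax).raw)                    // 2, i.e. 0.02

let small = Hundredths(raw: 5)    // 0.05
print((small * small).raw)        // 0: the real answer, 0.0025, has no slot
```

Addition is painless, but the very first multiplication already forces a decision about what to do with digits that don't fit.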

##### Floats (IEEE 754)

To make it simple for the chips that perform the actual calculations, floating point numbers (that's their name) have been defined using two parameters:

- an integer `n`
- a power `p` (of base `b`)

Such that we can have `n x bᵖ`, for instance `15.3865` is 153865 x 10^(-4). The question is, how many bits can we use for the `n` and how many for the `p`.
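The base 10 version of that idea fits in a few lines of Swift. `decimalString(n:p:)` is a hypothetical helper written for this post, and it only handles non-negative `n` and `p` at or below zero, which is enough for the example.

```swift
// Hypothetical helper: rebuild the decimal text of n × 10^p, for n >= 0 and p <= 0.
func decimalString(n: Int, p: Int) -> String {
    precondition(n >= 0 && p <= 0, "sketch only handles n >= 0 and p <= 0")
    if p == 0 { return String(n) }

    // scale = 10^(-p), built with integer multiplications only.
    var scale = 1
    for _ in 0..<(-p) { scale *= 10 }

    let whole = n / scale
    let fraction = n % scale

    // Left-pad the fractional digits with zeros so that 5 × 10^-2 reads "0.05".
    let digits = String(fraction)
    let padding = String(repeating: "0", count: -p - digits.count)
    return "\(whole).\(padding)\(digits)"
}

print(decimalString(n: 153865, p: -4))   // 15.3865
print(decimalString(n: 5, p: -2))        // 0.05
```

Only two integers are stored, and the position of the point "floats" depending on `p`.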

The standard is to use 1 bit for the sign (+ or -), 23 bits for `n`, and 8 for `p`, which makes 32 bits total (we like powers of two). The base is 2, and `n` is actually `1.n`. That gives us ~8 million values for `n` (2²³), and powers of 2 from -126 to +127, because a couple of the 256 possible exponent patterns are reserved for special cases like infinity and NotANumber (NaN).

`(-1 or 1)(2^[-126...127])(1.[one of the 8 million values])`

In theory, we have numbers from about -3.4 × 10³⁸ to 3.4 × 10³⁸, with magnitudes as small as roughly 10⁻³⁸ (or 10⁻⁴⁵ with the special "subnormal" values), but most numbers can't be represented in that form. For instance, if we look at the largest number smaller than 1, it's `0.9999999404`. Anything between that and 1 has to be rounded. Again, infinity can't be represented by a finite number of bits.
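Swift's standard library exposes most of these numbers directly, so here is a short sketch to check them. The constant name `belowOne` is mine, and the magnitudes in the comments are rounded.

```swift
// Field widths of a 32-bit float, as reported by the standard library.
print(Float.exponentBitCount, Float.significandBitCount)   // 8 23

// Range: largest finite value, smallest normal value, smallest subnormal value.
print(Float.greatestFiniteMagnitude)   // about 3.4e+38
print(Float.leastNormalMagnitude)      // about 1.18e-38
print(Float.leastNonzeroMagnitude)     // about 1.4e-45

// The largest float below 1 sits half a "unit in the last place" under it.
let belowOne = Float(1).nextDown
print(belowOne)                             // 0.99999994
print(belowOne == 1 - Float.ulpOfOne / 2)   // true
```

`nextDown` is a handy way to see the gaps: there is simply no `Float` between `belowOne` and 1.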

##### Doubles

The floats allow for "easy" calculation (by the computer at least) and are "good enough" with a precision of about 7.2 significant decimal digits on average. So when we needed more precision, someone said "hey, let's use 64 bits instead of 32!". The only thing that changes is that `n` now uses 52 bits and `p` 11 bits.

Incidentally, double has more a meaning of double *size* than double *precision,* even though the number of significant decimal digits does jump to 15.9 on average.
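Those 7.2 and 15.9 figures aren't magic. As a rough sketch (variable names are mine), they fall out of the bit counts once you count the implicit leading 1 and convert bits to decimal digits:

```swift
import Foundation   // for log10

// Stored fraction bits plus the implicit leading 1 give the precision in bits.
let floatBits = Float.significandBitCount + 1     // 24
let doubleBits = Double.significandBitCount + 1   // 53

// Each bit is worth log10(2), roughly 0.301 decimal digits.
let floatDigits = Double(floatBits) * log10(2.0)
let doubleDigits = Double(doubleBits) * log10(2.0)
print(floatDigits)    // roughly 7.22
print(doubleDigits)   // roughly 15.95
```

So doubling the size slightly more than doubles the number of digits, because most of the extra 32 bits go to `n` rather than `p`.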

We still have 2³² times more values to play with, and that does fill some annoying gaps in the infinity, but not all. Famously (and annoyingly), 0.1 doesn't work at any precision because of the base 2. In a 32-bit float, it's stored as 0.100000001490116119384765625, like this:

`(1)(2⁻⁴)(1.600000023841858)`
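The same decomposition can be read straight out of Swift. This is only a sketch, with `tenth` and `stored` being names I picked:

```swift
let tenth: Float = 0.1

// The three fields: sign, power of two, and the 1.n significand.
print(tenth.sign)                    // plus
print(tenth.exponent)                // -4
print(Double(tenth.significand))     // 1.600000023841858

// Widening to Double is exact, so it reveals what the float really holds.
let stored = Double(tenth)
print(stored)                        // 0.10000000149011612
print(stored == 0.1)                 // false

// The same base 2 issue exists in Double, just further out.
print(0.1 + 0.2 == 0.3)              // false
```

Multiplying the significand by 2⁻⁴ gives back exactly `stored`, which is close to 0.1 but not 0.1.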

Beyond double size (aka doubles), we have quadruple size (aka quads), with 15 bits for `p` and 112 for `n`, for a total of 128 bits.

##### Back to our problem

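Our value is `9999999999999999.0`. Above 2⁵³ (roughly 9 × 10¹⁵), doubles are spaced 2 apart, so an odd number this big has no slot of its own. The value it ends up stored as in double size floating point is actually `10000000000000000` (our number sits exactly halfway between two encodable values, and the tie goes to that one), which should now make some kind of sense. It is confirmed by Swift when separating the two sides of the calculation, too: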

```
2> let a = 9999999999999999.0
a: Double = 10000000000000000
```

Our big brain so good at maths knows that there is a difference between these two values, and so does the computer. It's just that using doubles, it can't store it. Using floats, `a` will be rounded to `10000000272564224`, which isn't exactly better. Quads aren't used regularly yet, so no luck there.
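For the curious, here is a small Swift sketch that pokes at the gap directly. `exactInt` is a name I made up; everything else is standard library.

```swift
let a = 9999999999999999.0
let b = 9999999999999998.0

// The literal for `a` is silently rounded to a representable Double.
print(a == 10000000000000000.0)   // true
print(a - b)                      // 2.0

// At this magnitude consecutive Doubles are 2 apart, so odd integers fall in the gaps.
print(a.ulp)                      // 2.0
print(a.nextDown == b)            // true

// An exact conversion from the integer refuses instead of rounding.
let exactInt = 9999999999999999
print(Double(exactly: exactInt) == nil)   // true

// A 32-bit Float is far coarser at this magnitude.
print(Int(Float(a)))              // 10000000272564224
```

`Double(exactly:)` is the honest version of the conversion: it returns `nil` rather than quietly handing back a neighbour.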

It's funny because this is an operation that we puny humans can do very easily, even those humans who say they suck at maths, and yet those touted computers with their billions of math operations per second can't work it out. Fair enough.

The kicker is, there is a literal *infinity* of examples such as this one, because trying to represent infinity in a finite number of digits is impossible.