Floating point issues
Ilya Yaroshenko edited this page Sep 28, 2016
Floating-point operations can be cruel! Here are some pitfalls that can introduce numerical errors.
- `float` < 0, `double` > 0, `real` < 0
- CT (compile-time) vs. RT (run-time) differences
- 32-bit vs. 64-bit targets behave differently
- Direct vs. intermediate comparison
- "Same" function, different results
- `std.math` vs. C
- Linux vs. Windows
- `sin` with different precision (+0 vs. -0)
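The "+0 vs. -0" in the last item refers to IEEE 754 signed zeros. The following is a minimal D sketch of the effect in general. It is not the `sin` case itself. It relies only on Phobos' `std.math.signbit` and `std.format.format`, and shows that the two zeros compare equal yet behave differently:

```d
import std.format : format;
import std.math : signbit;
import std.stdio : writeln;

void main()
{
    double pz = 0.0;   // positive zero
    double nz = -0.0;  // negative zero

    writeln(pz == nz);                       // the two zeros compare equal
    writeln(signbit(pz), " ", signbit(nz));  // but their sign bits differ
    writeln(1 / pz, " ", 1 / nz);            // dividing by them gives +inf vs. -inf
    writeln(format("%a", pz), " ", format("%a", nz)); // %a output keeps the sign
}
```

An `==` check alone therefore cannot tell whether a function returned +0 or -0.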
More issues are explained below.
Consider these two mathematically equivalent definitions of a linear function:

- `y1 = slope * (x - _y) + _a`
- `y2 = slope * x + intercept`, where `intercept = slope * (- _y) + _a`.
Nota bene: written out in full, `y2 = slope * x + slope * (- _y) + _a`. This shows that `y2` is obtained from `y1` by applying the distributive law. In other words, `y1` computes the difference `x - _y` before multiplying, whereas `y2` multiplies first and only then adds the (possibly large) products.
Now let's see why `y1` is the better representation:
```d
import std.stdio : writefln;

void main()
{
    alias S = double;
    S slope = 2.87415e+15;
    S _a = -0.139631;
    S _y = -1.5;
    S intercept = slope * (- _y) + _a; // 4.31123e+15
    S x = -1.5;
    S y1 = _a + slope * (x - _y); // -0.139631
    S y2 = slope * x + intercept; // 0
    writefln("y1 = %g, y2 = %g", y1, y2);
}
```
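You can work out by hand why `y2` ends up as exactly 0. The derivation assumes IEEE 754 binary64 (`double`) arithmetic with round-to-nearest. Doubles between $2^{51}$ and $2^{52}$ (about 2.25e15 to 4.50e15) are spaced 0.5 apart. Adding `_a` to a number of that size shifts it by less than half that spacing, so the addition rounds back to the same value. Here $\mathrm{fl}(\cdot)$ denotes rounding to the nearest double:

$$
\begin{aligned}
\mathrm{slope}\cdot(-\_y) &= 2874150000000000 \cdot 1.5 = 4311225000000000 \quad (\text{exact}) \\
\mathrm{intercept} &= \mathrm{fl}(4311225000000000 - 0.139631) = 4311225000000000 \\
y_2 &= \mathrm{fl}(-4311225000000000 + 4311225000000000) = 0 \\
y_1 &= \_a + \mathrm{slope}\cdot\bigl(-1.5 - (-1.5)\bigr) = \_a + 0 \approx -0.139631
\end{aligned}
$$

In `y1` the difference `x - _y` is exactly 0, so `_a` survives untouched. In `y2` it is absorbed into `intercept` and lost in the cancellation.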
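By the way, on x86, where `real` is the 80-bit extended format, the cancellation error in `y2` becomes much smaller. However, `y2` is still not guaranteed to equal `y1` exactly.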
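To compare the precisions side by side, the example above can be made generic over the floating-point type. This is a sketch. `evalBoth` is a hypothetical helper name, not part of any library. The size of `real` is platform dependent: it is 80-bit on x86, and may be the same as `double` elsewhere.

```d
import std.stdio : writefln;

/// Hypothetical helper: evaluates y1 and y2 from the example above in precision T.
T[2] evalBoth(T)()
{
    T slope = 2.87415e+15;
    T _a = -0.139631;
    T _y = -1.5;
    T x = -1.5;
    T intercept = slope * (-_y) + _a;
    T y1 = _a + slope * (x - _y); // subtract first: x - _y is exactly 0
    T y2 = slope * x + intercept; // multiply first: two large terms cancel
    return [y1, y2];
}

void main()
{
    auto d = evalBoth!double();
    auto r = evalBoth!real();
    writefln("double: y1 = %g, y2 = %g", d[0], d[1]);
    writefln("real (%s significand bits): y1 = %g, y2 = %g",
             real.mant_dig, r[0], r[1]);
}
```

Making the type a template parameter keeps a single definition of the formula. This way the only thing that changes between the two runs is the precision.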
```d
real x = -1.0; // -0x1p+0  (-1 * 2^0 = -1.0)
enum y = -1.0; // -0x8p-3  (-8 * 2^-3 = -1.0)
```

-> Always use `%a` (exact hexadecimal printing) to verify floating-point values.
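A minimal D sketch of why `%a` is worth using follows. It uses the classic `0.1 + 0.2` case, chosen here only for illustration. It shows that default decimal formatting can hide a difference that `%a` reveals. Only `std.format.format` and `std.stdio.writeln` are used.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    double a = 0.1, b = 0.2;
    double sum = a + b;  // rounded binary64 sum
    double lit = 0.3;    // nearest binary64 value to 0.3

    // With the default 6 significant digits, both print as "0.3"...
    writeln(format("%g", sum), " vs ", format("%g", lit));
    // ...but %a prints the exact binary values, which differ in the last hex digit.
    writeln(format("%a", sum), " vs ", format("%a", lit));
    writeln("sum == lit: ", sum == lit);
}
```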