CPP Floating-point Precision Issue

Floating-point storage formats

For the storage formats, see: Single-precision floating-point format (32 bits) and Double-precision floating-point format (64 bits).

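  • Single-precision float: usually 32 bits, at least 6 significant decimal digits, magnitude range roughly 10^-38 to 10^38
  • Double-precision double: usually 64 bits, 15-17 significant decimal digits, magnitude range roughly 10^-308 to 10^308
  • Extended-precision long double: at least as precise as double; the actual width depends on the platform
  • A signed 32-bit integer variable has a maximum value of 2^31 − 1 = 2,147,483,647, while an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2^−23) × 2^127 ≈ 3.4028235 × 10^38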

Floating-point calculation precision issues

C++

The portable way to get epsilon in C++ is:

    #include <limits>

    std::numeric_limits<double>::epsilon()

Then the comparison function becomes:

    #include <cmath>
    #include <limits>

    bool AreSame(double a, double b) {
        return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
    }

In C/C++, changing the order of the multiplications changes the truncated result:

    double a = 12.03;
    double b = 22;
    long long c = a * b * 100000000L;
    printf("c[%lld]\n", c);  // 26465999999
    c = a * 100000000L * b;
    printf("c[%lld]\n", c);  // 26466000000

Or in Python:

    Python 2.7.5 (default, Jun 17 2014, 18:11:42)
    [GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 1.1 + 0.1
    1.2000000000000002

  • The error arises because 0.1 has no finite binary representation, so the stored value is only the nearest binary floating-point number.
  • Most decimal fractions cannot be represented exactly as binary fractions. A good explanation is here: Floating Point Arithmetic: Issues and Limitations

What can I do to avoid this problem? That depends on what kind of calculations you’re doing.

  • If you really need your results to add up exactly, especially when you work with money: use a special decimal datatype.
  • If you just don’t want to see all those extra decimal places: simply format your result rounded to a fixed number of decimal places when displaying it.
  • If you have no decimal datatype available, an alternative is to work with integers, e.g. do money calculations entirely in cents. But this is more work and has some drawbacks.

Some solutions in C/C++:

If you are looking for data type supporting money / currency then try this: decimal_for_cpp

The cpp_dec_float back-end is used in conjunction with number: It acts as an entirely C++ (header only and dependency free) floating-point number type that is a drop-in replacement for the native C++ floating-point types, but with much greater precision.

    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <boost/multiprecision/cpp_dec_float.hpp>

    int main() {
        namespace mp = boost::multiprecision;
        // here I'm using a predefined type that stores 100 digits,
        // but you can create custom types very easily with any level
        // of precision you want.
        typedef mp::cpp_dec_float_100 decimal;

        decimal tiny("0.0000000000000000000000000000000000000000000001");
        decimal huge("100000000000000000000000000000000000000000000000");
        decimal a = tiny;
        while (a != huge) {
            std::cout.precision(100);
            std::cout << std::fixed << a << '\n';
            a *= 10;
        }

        // With double: (10000000000 - 5000000000) * 2.01 = 10049999999.999998
        // Renamed from a/b to x/y so they no longer clash with `a` above.
        mp::cpp_dec_float_50 x(std::to_string(2.01));
        mp::cpp_dec_float_50 y = (10000000000 - 5000000000) * x;
        long long c = y.convert_to<long long>();
        std::cout << c << '\n';
    }

PHP

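Floating-point numbers have limited precision. Although it depends on the system, PHP typically uses the IEEE 754 double-precision format, which gives a maximum relative error due to rounding of about 1.11e-16. Non-elementary arithmetic operations may give larger errors, and error propagation must be considered when several operations are compounded. In addition, rational numbers that are exactly representable in base 10, such as 0.1 or 0.7, have no exact representation in the base 2 used internally, no matter how large the mantissa is, so they cannot be converted to binary without a small loss of precision. This can lead to confusing results: for example, floor((0.1+0.7)*10) will usually return 7 instead of the expected 8, since the internal representation is something like 7.9999999999999991118...

So never trust floating-point results to the last digit, and do not compare floating-point numbers directly for equality. If higher precision is really needed, use the arbitrary precision math functions or the gmp functions.

The size of a float is platform-dependent, although a maximum of about 1.8e308 with a precision of roughly 14 decimal digits is a common value (the 64-bit IEEE format).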

Formally, floating-point literals are written as:

    LNUM          [0-9]+
    DNUM          ([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)
    EXPONENT_DNUM [+-]?(({LNUM} | {DNUM}) [eE][+-]? {LNUM})

For example:

    1.234
    1.2e3
    7E-10

Java

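In Java, float has a precision of 6-7 significant digits and double has 15-16. Monetary calculations in Java should normally use BigDecimal, because ordinary floating-point arithmetic loses precision: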

    System.out.println(0.05 + 0.01);  // 0.060000000000000005
    System.out.println(1.0 - 0.42);   // 0.5800000000000001
    System.out.println(4.015 * 100);  // 401.49999999999994
    System.out.println(123.3 / 100);  // 1.2329999999999999

How to use BigDecimal:

    // Constructors
    BigDecimal(int);       // creates an object with the given int value
    BigDecimal(double);    // creates an object with the exact value of the given double
    BigDecimal(long);      // creates an object with the given long value
    BigDecimal(String);    // creates an object with the value written in the string

    // Methods (BigDecimal is immutable, so each operation returns a new object)
    add(BigDecimal);       // returns the sum
    subtract(BigDecimal);  // returns the difference
    multiply(BigDecimal);  // returns the product
    divide(BigDecimal);    // returns the quotient
    toString();            // converts the value to a string
    doubleValue();         // returns the value as a double
    floatValue();          // returns the value as a float
    longValue();           // returns the value as a long
    intValue();            // returns the value as an int

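Note that for decimal fractions, BigDecimal only helps when the object is created with the BigDecimal(String) constructor. The BigDecimal(double) constructor, as in BigDecimal b = new BigDecimal(1.01), copies the double's binary value exactly, so the precision error is already baked in.

The source code explains: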

    /*
     * The results of this constructor can be somewhat unpredictable.
     * One might assume that writing {@code new BigDecimal(0.1)} in
     * Java creates a {@code BigDecimal} which is exactly equal to
     * 0.1 (an unscaled value of 1, with a scale of 1), but it is
     * actually equal to
     * 0.1000000000000000055511151231257827021181583404541015625.
     * This is because 0.1 cannot be represented exactly as a
     * {@code double} (or, for that matter, as a binary fraction of
     * any finite length). Thus, the value that is being passed
     * <i>in</i> to the constructor is not exactly equal to 0.1,
     * appearances notwithstanding.
     * ……
     * When a {@code double} must be used as a source for a
     * {@code BigDecimal}, note that this constructor provides an
     * exact conversion; it does not give the same result as
     * converting the {@code double} to a {@code String} using the
     * {@link Double#toString(double)} method and then using the
     * {@link #BigDecimal(String)} constructor. To get that result,
     * use the {@code static} {@link #valueOf(double)} method.
     * </ol>
     */
    public BigDecimal(double val) {
        this(val, MathContext.UNLIMITED);
    }

Example:

    import java.math.BigDecimal;

    public class Main {
        public static void main(String[] args) {
            System.out.println("Hello, World!");

            // Built from double literals: the binary error is carried along.
            BigDecimal a = new BigDecimal(1.01);
            BigDecimal b = new BigDecimal(1.02);

            // Built from strings: the decimal values are stored exactly.
            BigDecimal c = new BigDecimal("1.01");
            BigDecimal d = new BigDecimal("1.02");

            System.out.println(a.add(b));  // 2.0300000000000000266453525910037569701671600341796875
            System.out.println(c.add(d));  // 2.03
        }
    }
This post is licensed under CC BY 4.0 by the author.