CPP Floating-point Precision Issue

Posted May 31, 2020

By gerryyang

8 min read

Floating-point storage formats
Floating-point precision issues
- C++
- PHP
- Java

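Floating-point storage formats

For the storage layouts, see: Single-precision floating-point format (32 bits), Double-precision floating-point format (64 bits)

Single-precision float: usually 32 bits, at least 6 significant decimal digits, magnitudes from about 10^-38 to 10^38
Double-precision double: usually 64 bits, 15-17 significant decimal digits, magnitudes from about 10^-308 to 10^308
Extended-precision long double: at least as precise as double; the actual width depends on the platform
A signed 32-bit integer variable has a maximum value of 2^31 − 1 = 2,147,483,647; an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2^−23) × 2^127 ≈ 3.4028235 × 10^38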
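
The figures above can be checked on a given platform with std::numeric_limits. The following is a minimal sketch, not part of the original post: DescribeFloatType is a hypothetical helper name, and the bit count it prints is the storage size (sizeof times 8, assuming 8-bit bytes), which for long double may be larger than the number of bits actually used.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: summarize what <limits> reports for a floating-point type T.
// digits10     = decimal digits that always survive a round trip text -> T -> text
// max_digits10 = decimal digits needed so that T -> text -> T is lossless
template <typename T>
std::string DescribeFloatType(const char* name) {
    std::ostringstream out;
    out << name
        << ": storage_bits=" << sizeof(T) * 8
        << " digits10=" << std::numeric_limits<T>::digits10
        << " max_digits10=" << std::numeric_limits<T>::max_digits10
        << " min=" << std::numeric_limits<T>::min()
        << " max=" << std::numeric_limits<T>::max();
    return out.str();
}
```

Calling DescribeFloatType<float>("float"), DescribeFloatType<double>("double") and DescribeFloatType<long double>("long double") and printing the results shows how the three types differ on the machine at hand.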

Floating-point precision issues

C++

The portable way to get epsilon in C++ is:

        
#include <limits>
std::numeric_limits<double>::epsilon()

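Then the comparison function becomes the following. Note that epsilon is the gap between 1.0 and the next representable double, so using it as an absolute tolerance is only meaningful for values whose magnitude is close to 1: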

        
      
#include <cmath>
#include <limits>

bool AreSame(double a, double b) {
    return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
}
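
For values far from 1, a tolerance that scales with the operands is usually more useful. The sketch below is one common way to do that and is not from the original post: AlmostEqual is a hypothetical name, and the default tolerances (4 epsilon relative, 1e-12 absolute) are assumptions that should be tuned to the calculation at hand.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: compare two doubles with a combined absolute and
// relative tolerance. The default tolerances are illustrative assumptions.
bool AlmostEqual(double a, double b,
                 double rel_tol = 4 * std::numeric_limits<double>::epsilon(),
                 double abs_tol = 1e-12) {
    if (a == b) return true;                             // exact match, including equal infinities
    if (std::isinf(a) || std::isinf(b)) return false;    // an infinity never matches anything else
    const double diff = std::fabs(a - b);                // NaN operands make every test below false
    if (diff <= abs_tol) return true;                    // handles values near zero
    return diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

With these defaults, 0.1 + 0.2 and 0.3 compare as equal, and so do two adjacent doubles near 1e20, which differ by far more than epsilon in absolute terms.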

In C/C++:

        
      
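#include <cstdio>

int main() {
    double a = 12.03;
    double b = 22;
    // Converting to long long truncates toward zero, so the order of operations matters.
    long long c = a * b * 100000000L;
    printf("c[%lld]\n", c);              // 26465999999
    c = a * 100000000L * b;
    printf("c[%lld]\n", c);              // 26466000000
}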
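
The off-by-one result comes from the conversion to long long, which discards the fractional part of a product that landed just below an integer. A minimal sketch of one way around it, assuming round-to-nearest is the intended behaviour (ScaleAndRound is a hypothetical name, not from the original post):

```cpp
#include <cmath>

// Hypothetical helper: scale a double and round to the nearest integer
// instead of truncating toward zero.
long long ScaleAndRound(double value, double scale) {
    return std::llround(value * scale);
}
```

ScaleAndRound(12.03 * 22, 100000000.0) then gives 26466000000 regardless of which order the multiplications were carried out in. This only hides errors smaller than half a unit; it does not make the underlying arithmetic exact.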

Or in Python:

Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1.1 + 0.1
1.2000000000000002

The error arises because 0.1 has no finite binary floating-point representation; the value actually stored is the nearest representable double.
Most decimal fractions can't be converted to binary with exact precision. A good explanation is here: Floating Point Arithmetic: Issues and Limitations
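
The same effect can be made visible in C++ by printing more digits than the default formatting shows. This is a small sketch, not from the original post; FormatDigits is a hypothetical helper, and the exact digits printed assume IEEE 754 doubles and a C library that formats them exactly (glibc does).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a double with a fixed number of digits after
// the decimal point, so the stored (inexact) value becomes visible.
std::string FormatDigits(double value, int digits) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "%.*f", digits, value);
    return buf;
}
```

For example, FormatDigits(0.1, 20) shows that the stored value is slightly above 0.1, and FormatDigits(1.1 + 0.1, 16) reproduces the 1.2000000000000002 seen in the Python session.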

What can I do to avoid this problem? That depends on what kind of calculations you’re doing.

If you really need your results to add up exactly, especially when you work with money: use a special decimal datatype.

If you just don’t want to see all those extra decimal places: simply format your result rounded to a fixed number of decimal places when displaying it.

If you have no decimal datatype available, an alternative is to work with integers, e.g. do money calculations entirely in cents. But this is more work and has some drawbacks.
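
As a rough illustration of the integer approach, the sketch below keeps amounts as a whole number of cents and converts to text only for display. The names Cents, MakeCents and FormatCents are hypothetical, and the sketch deliberately ignores overflow, currencies with other minor units, and division or interest calculations, which are where the extra work mentioned above comes in.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical representation: money as a whole number of cents.
using Cents = std::int64_t;

// Build an amount from whole units and cents, e.g. MakeCents(12, 3) is 12.03.
Cents MakeCents(std::int64_t units, std::int64_t cents) {
    return units * 100 + cents;
}

// Format for display only; all arithmetic stays in integers.
// Assumption: amount is not INT64_MIN, whose negation would overflow.
std::string FormatCents(Cents amount) {
    const bool negative = amount < 0;
    const long long magnitude = negative ? -amount : amount;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%s%lld.%02lld",
                  negative ? "-" : "", magnitude / 100, magnitude % 100);
    return buf;
}
```

With this representation, MakeCents(12, 3) * 22 is exactly 26466 cents, and FormatCents renders it as 264.66.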


Some solutions in C/C++:

If you are looking for data type supporting money / currency then try this: decimal_for_cpp

boost - cpp_dec_float

The cpp_dec_float back-end is used in conjunction with number: It acts as an entirely C++ (header only and dependency free) floating-point number type that is a drop-in replacement for the native C++ floating-point types, but with much greater precision.

        
      
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/multiprecision/cpp_dec_float.hpp>

int main()
{
    namespace mp = boost::multiprecision;
    // here I'm using a predefined type that stores 100 digits,
    // but you can create custom types very easily with any level
    // of precision you want.
    typedef mp::cpp_dec_float_100 decimal;

    decimal tiny("0.0000000000000000000000000000000000000000000001");
    decimal huge("100000000000000000000000000000000000000000000000");
    decimal a = tiny;

    while (a != huge)
    {
        std::cout.precision(100);
        std::cout << std::fixed << a << '\n';
        a *= 10;
    }

    // With plain double, (10000000000 - 5000000000) * 2.01 = 10049999999.999998.
    // Constructing the decimal from a string keeps the factor 2.01 exact.
    mp::cpp_dec_float_50 x(std::to_string(2.01));   // std::to_string yields "2.010000"
    mp::cpp_dec_float_50 y = (10000000000LL - 5000000000LL) * x;
    long long c = y.convert_to<long long>();
    std::cout << c << '\n';
}

PHP

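Floating-point numbers have limited precision. Although it depends on the system, PHP typically uses the IEEE 754 double-precision format, which gives a maximum relative error due to rounding on the order of 1.11e-16. Non-elementary arithmetic operations may give larger errors, and error propagation must be considered when several operations are compounded. Additionally, rational numbers that are exactly representable in base 10, such as 0.1 or 0.7, have no exact representation in the base 2 used internally, no matter the size of the mantissa, so they cannot be converted to their binary counterparts without a small loss of precision. This can lead to confusing results: for example, floor((0.1+0.7)*10) will usually return 7 instead of the expected 8, because the internal representation is something like 7.9999999999999991118....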

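So never trust floating-point results to the last digit, and never compare two floating-point numbers directly for equality. If higher precision is really needed, use the arbitrary precision math functions or the gmp functions.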
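
The floor example is not specific to PHP; any language that uses IEEE 754 doubles behaves the same way. The C++ sketch below reproduces it and shows one possible guard. FloorScaled and FloorScaledTolerant are hypothetical names, and the 1e-9 tolerance is an assumption: it is only safe when genuine values are never that close below an integer.

```cpp
#include <cmath>

// Hypothetical helper: the naive computation, mirroring floor(value * scale).
long long FloorScaled(double value, double scale) {
    return static_cast<long long>(std::floor(value * scale));
}

// Hypothetical helper: nudge the product upward by a small tolerance before
// flooring, so results that are "really" an integer are not lost to rounding.
// The default tolerance is an illustrative assumption, not a universal constant.
long long FloorScaledTolerant(double value, double scale, double tol = 1e-9) {
    return static_cast<long long>(std::floor(value * scale + tol));
}
```

On a platform with IEEE 754 doubles, FloorScaled(0.1 + 0.7, 10) yields 7 while FloorScaledTolerant(0.1 + 0.7, 10) yields 8.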

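The size of a float is platform-dependent, although a maximum of about 1.8e308 with a precision of roughly 14 decimal digits is a common value (the 64-bit IEEE format).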

Formal syntax of floating-point literals:

LNUM          [0-9]+
DNUM          ([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)
EXPONENT_DNUM [+-]?(({LNUM} | {DNUM}) [eE][+-]? {LNUM})

For example:

1.234
1.2e3
7E-10

php - Floating point numbers

Java

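In Java, float carries 6-7 significant decimal digits and double carries 15-16. Monetary calculations should normally use BigDecimal, because ordinary floating-point arithmetic loses precision.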

        
      
System.out.println(0.05 + 0.01);  // 0.060000000000000005
System.out.println(1.0 - 0.42);   // 0.5800000000000001
System.out.println(4.015 * 100);  // 401.49999999999994
System.out.println(123.3 / 100);  // 1.2329999999999999

Using BigDecimal:

        
      
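// Constructors
BigDecimal(int);       // creates an object holding the given int value
BigDecimal(double);    // creates an object holding the exact value of the given double
BigDecimal(long);      // creates an object holding the given long value
BigDecimal(String);    // creates an object holding the number written in the string

// Methods (BigDecimal is immutable, so each arithmetic method returns a new object)
add(BigDecimal);       // returns the sum of this value and the argument
subtract(BigDecimal);  // returns this value minus the argument
multiply(BigDecimal);  // returns the product of this value and the argument
divide(BigDecimal);    // returns this value divided by the argument
toString();            // converts the value to a string
doubleValue();         // returns the value as a double
floatValue();          // returns the value as a float
longValue();           // returns the value as a long
intValue();            // returns the value as an int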

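Note: for a value written as a decimal fraction, only the BigDecimal(String) constructor gives the expected result. Passing a double, as in BigDecimal b = new BigDecimal(1.01), still loses precision, because the double is already inexact before the constructor sees it.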

From the source code:

        
      
    /* The results of this constructor can be somewhat unpredictable.
     * One might assume that writing {@code new BigDecimal(0.1)} in
     * Java creates a {@code BigDecimal} which is exactly equal to
     * 0.1 (an unscaled value of 1, with a scale of 1), but it is
     * actually equal to
     * 0.1000000000000000055511151231257827021181583404541015625.
     * This is because 0.1 cannot be represented exactly as a
     * {@code double} (or, for that matter, as a binary fraction of
     * any finite length).  Thus, the value that is being passed
     * <i>in</i> to the constructor is not exactly equal to 0.1,
     * appearances notwithstanding.
     ……
     * When a {@code double} must be used as a source for a
     * {@code BigDecimal}, note that this constructor provides an
     * exact conversion; it does not give the same result as
     * converting the {@code double} to a {@code String} using the
     * {@link Double#toString(double)} method and then using the
     * {@link #BigDecimal(String)} constructor.  To get that result,
     * use the {@code static} {@link #valueOf(double)} method.
     * </ol>
     */
public BigDecimal(double val) {
    this(val,MathContext.UNLIMITED);
}

Example:

        
      
import java.math.BigDecimal;

public class Main {
        public static void main(String[] args) {
                System.out.println("Hello, World!");

                BigDecimal a = new BigDecimal(1.01);
                BigDecimal b = new BigDecimal(1.02);
                BigDecimal c = new BigDecimal("1.01");
                BigDecimal d = new BigDecimal("1.02");
                System.out.println(a.add(b)); // 2.0300000000000000266453525910037569701671600341796875
                System.out.println(c.add(d)); // 2.03
        }
}

Does BigDecimal never lose precision?

C/C++

This post is licensed under CC BY 4.0 by the author.
