浮点数存储格式

浮点数的存储格式可参考:Single-precision floating-point format 32 bitsDouble-precision floating-point format 64 bits

  • 单精度浮点型float,通常32位,至少有6位有效数字,取值范围10^-38 - 10^38
  • 双精度浮点型double,通常64位,15-17位有效数字,取值范围10^-308 - 10^308
  • 多精度浮点型long double,精度更高
  • A signed 32-bit integer variable has a maximum value of 2^31 − 1 = 2,147,483,647; An IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2^−23) × 2^127 ≈ 3.4028235 × 10^38

浮点数计算精度问题

C++

The portable way to get epsilon in C++ is:

#include <limits>
std::numeric_limits<double>::epsilon()

Then the comparison function becomes:

#include <cmath>
#include <limits>

bool AreSame(double a, double b) {
    return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
}

C/C++中:

double a = 12.03;
double b = 22; 
long long c = a * b * 100000000L;
printf("c[%lld]\n", c);              // 26465999999
c = a * 100000000L * b;
printf("c[%lld]\n", c);              // 26466000000

亦或在python中:

Python 2.7.5 (default, Jun 17 2014, 18:11:42) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1.1 + 0.1
1.2000000000000002
  • Actually, the error is because there is no way to map 0.1 to a finite binary floating point number.
  • Most fractions can’t be converted to a decimal with exact precision. A good explanation is here: Floating Point Arithmetic: Issues and Limitations

What can I do to avoid this problem? That depends on what kind of calculations you’re doing.

  • If you really need your results to add up exactly, especially when you work with money: use a special decimal datatype.
  • If you just don’t want to see all those extra decimal places: simply format your result rounded to a fixed number of decimal places when displaying it.
  • If you have no decimal datatype available, an alternative is to work with integers, e.g. do money calculations entirely in cents. But this is more work and has some drawbacks.

refer:

C/C++中的一些解决方案:

If you are looking for data type supporting money / currency then try this: decimal_for_cpp

The cpp_dec_float back-end is used in conjunction with number: It acts as an entirely C++ (header only and dependency free) floating-point number type that is a drop-in replacement for the native C++ floating-point types, but with much greater precision.

#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_dec_float.hpp>

int main()
{
    namespace mp = boost::multiprecision;
    // here I'm using a predefined type that stores 100 digits,
    // but you can create custom types very easily with any level
    // of precision you want.
    typedef mp::cpp_dec_float_100 decimal;

    decimal tiny("0.0000000000000000000000000000000000000000000001");
    decimal huge("100000000000000000000000000000000000000000000000");
    decimal a = tiny;         

    while (a != huge)
    {
        std::cout.precision(100);
        std::cout << std::fixed << a << '\n';
        a *= 10;
    }

    // (10000000000 - 5000000000) * 2.01 = 10049999999.999998
    cpp_dec_float_50 a(std::to_string(2.01));
    cpp_dec_float_50 b = ((10000000000 - 5000000000)) * a;
    long long c = b.convert_to<long long>();

}

PHP

浮点数的精度有限。尽管取决于系统,PHP 通常使用 IEEE 754 双精度格式,则由于取整而导致的最大相对误差为 1.11e-16。非基本数学运算可能会给出更大误差,并且要考虑到进行复合运算时的误差传递。此外,以十进制能够精确表示的有理数如 0.10.7,无论有多少尾数都不能被内部所使用的二进制精确表示,因此不能在不丢失一点点精度的情况下转换为二进制的格式。这就会造成混乱的结果:例如,floor((0.1+0.7)*10) 通常会返回7而不是预期中的8,因为该结果内部的表示其实是类似7.9999999999999991118...

所以永远不要相信浮点数结果精确到了最后一位,也永远不要比较两个浮点数是否相等。如果确实需要更高的精度,应该使用任意精度数学函数或者gmp函数

浮点数的字长和平台相关,尽管通常最大值是 1.8e308 并具有 14 位十进制数字的精度(64 位 IEEE 格式)。

浮点数的形式表示:

LNUM          [0-9]+
DNUM          ([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)
EXPONENT_DNUM [+-]?(({LNUM} | {DNUM}) [eE][+-]? {LNUM})

例如:

1.234
1.2e3;
7E-10

Java

Java中float的精度为6-7位有效数字。double的精度为15-16位。在Java中,通常用到金钱计算的地方要用BigDecimal,因为正常的浮点数计算会出现精度丢失的问题。

System.out.println(0.05 + 0.01);  // 0.060000000000000005
System.out.println(1.0 - 0.42);   // 0.5800000000000001
System.out.println(4.015 * 100);  // 401.49999999999994
System.out.println(123.3 / 100);  // 1.2329999999999999

BigDecimal使用方法:

// 构造函数
BigDecimal(int);       // 创建一个具有参数,所指定整数值的对象
BigDecimal(double);    // 创建一个具有参数,所指定双精度值的对象
BigDecimal(long);      // 创建一个具有参数,所指定长整数值的对象
BigDecimal(String);    // 创建一个具有参数,所指定以字符串表示的数值的对象

// 方法                    
add(BigDecimal);       // BigDecimal对象中的值相加,然后返回这个对象
subtract(BigDecimal);  // BigDecimal对象中的值相减,然后返回这个对象
multiply(BigDecimal);  // BigDecimal对象中的值相乘,然后返回这个对象
divide(BigDecimal);    // BigDecimal对象中的值相除,然后返回这个对象
toString();            // 将BigDecimal对象的数值转换成字符串
doubleValue();         // 将BigDecimal对象中的值以双精度数返回
floatValue();          // 将BigDecimal对象中的值以单精度数返回
longValue();           // 将BigDecimal对象中的值以长整数返回
intValue();            // 将BigDecimal对象中的值以整数返回

注意,在使用BigDecimal时,使用它的BigDecimal(String)构造器创建对象才有意义。其他的如BigDecimal b = new BigDecimal(1)这种,还是会发生精度丢失的问题。

源码说明:

    /* The results of this constructor can be somewhat unpredictable.  
     * One might assume that writing {@codenew BigDecimal(0.1)} in  
     * Java creates a {@code BigDecimal} which is exactly equal to  
     * 0.1 (an unscaled value of 1, with a scale of 1), but it is  
     * actually equal to  
     * 0.1000000000000000055511151231257827021181583404541015625.  
     * This is because 0.1 cannot be represented exactly as a  
     * {@codedouble} (or, for that matter, as a binary fraction of  
     * any finite length).  Thus, the value that is being passed  
     * <i>in</i> to the constructor is not exactly equal to 0.1,  
     * appearances notwithstanding.  
       ……  
        * When a {@codedouble} must be used as a source for a  
     * {@code BigDecimal}, note that this constructor provides an  
     * exact conversion; it does not give the same result as  
     * converting the {@codedouble} to a {@code String} using the  
     * {@link Double#toString(double)} method and then using the  
     * {@link #BigDecimal(String)} constructor.  To get that result,  
     * use the {@codestatic} {@link #valueOf(double)} method.  
     * </ol>  
     */
public BigDecimal(double val) {  
    this(val,MathContext.UNLIMITED);  
}  

例子:

import java.math.BigDecimal;

public class Main {
        public static void main(String[] args) {
                System.out.println("Hello, World!");

                BigDecimal a = new BigDecimal(1.01);
                BigDecimal b = new BigDecimal(1.02);
                BigDecimal c = new BigDecimal("1.01");
                BigDecimal d = new BigDecimal("1.02");
                System.out.println(a.add(b)); // 2.0300000000000000266453525910037569701671600341796875
                System.out.println(c.add(d)); // 2.03
        }
}