Converting double to float without relying on the FPU rounding mode

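Does anyone have a handy code snippet for converting a double to the nearest inferior (or superior) float, without changing or assuming anything about the FPU's current rounding mode?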

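Note: this constraint probably means not using the FPU at all. I expect the simplest way to do it under these conditions is to read the bits of the double into a 64-bit long and work with those.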

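You can assume the endianness of your choice for simplicity, and that the double in question is available through the d field of the union below: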

#include <stdint.h>

union double_bits
{
    uint64_t i; /* the 64 bits of the double, as an integer */
    double d;
};

I would try to do it myself, but I am sure I would introduce subtle bugs for denormals or negative numbers.


2010-01-06 Pascal Cuoq
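A minimal sketch of pulling the three IEEE-754 fields out through a union like the one in the question. It assumes binary64 doubles and that integers and doubles share a byte order (true on mainstream platforms), and it uses uint64_t so the integer member is exactly 64 bits; the names double_fields and split_double are hypothetical.

```c
#include <stdint.h>

/* The union from the question, with a fixed-width integer member. */
union double_bits
{
    uint64_t i;
    double d;
};

/* Hypothetical holder for the decoded binary64 fields. */
struct double_fields
{
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 bits, implicit leading 1 not included */
};

/* Split a double into sign, biased exponent and mantissa (assumes IEEE-754). */
static struct double_fields split_double(double d)
{
    union double_bits u;
    struct double_fields r;

    u.d = d;
    r.sign = (unsigned)(u.i >> 63);
    r.exponent = (unsigned)((u.i >> 52) & 0x7FF);
    r.mantissa = u.i & 0x000FFFFFFFFFFFFFULL;
    return r;
}
```

For 0.1 (0x1.999999999999ap-4) this yields sign 0, biased exponent 0x3FB (that is, -4 after subtracting 1023) and mantissa 0x999999999999A.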

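On glibc systems you will find a header file, ieee754.h, which defines unions for the floating-point types with bit-field structs, so you can work with the mantissa and exponent more easily. Sorry, but I can't give you real code. – quinmars 2010-01-06 11:23:45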
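Following that suggestion, a sketch using glibc's ieee754.h, which is glibc-specific and not part of ISO C. It relies on union ieee754_double with its ieee.negative, ieee.exponent, ieee.mantissa0 (high 20 mantissa bits) and ieee.mantissa1 (low 32 bits) fields, plus the IEEE754_DOUBLE_BIAS macro; the header orders the bit-fields for the platform's byte order. The helper function names are hypothetical.

```c
#include <ieee754.h>  /* glibc-specific */
#include <stdint.h>

/* Reassemble the 52-bit mantissa from glibc's two bit-fields. */
static uint64_t double_mantissa(double d)
{
    union ieee754_double u;
    u.d = d;
    return ((uint64_t)u.ieee.mantissa0 << 32) | u.ieee.mantissa1;
}

/* Unbiased binary exponent of a normal double. */
static int double_exponent(double d)
{
    union ieee754_double u;
    u.d = d;
    return (int)u.ieee.exponent - IEEE754_DOUBLE_BIAS;
}

/* Sign bit, also set for -0.0. */
static int double_is_negative(double d)
{
    union ieee754_double u;
    u.d = d;
    return u.ieee.negative;
}
```

With 0.1 the reassembled mantissa is 0x999999999999A and the unbiased exponent is -4.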

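I think the following works, but I will state my assumptions first:

floats are stored in IEEE-754 format in your implementation,
there is no overflow,
you have nextafterf() available (it is specified in C99).

Also, this method is most likely not very efficient.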
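The assumptions listed above can be made explicit. A possible sketch, assuming a C11 compiler for _Static_assert: the compile-time checks test the float.h parameters of binary32 and binary64, and the hypothetical fits_float_range() checks the "no overflow" assumption at run time. Staying within ±FLT_MAX matters because ISO C (C99 6.3.1.5) leaves converting an out-of-range double to float undefined, while for an in-range value it only promises the nearest higher or nearest lower float, which is exactly the freedom the correction step in the code below deals with.

```c
#include <float.h>
#include <math.h>

/* Compile-time check of the "IEEE-754 format" assumption. */
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float is not IEEE-754 binary32");
_Static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double is not IEEE-754 binary64");

/* Run-time check of the "no overflow" assumption: d is finite and within
   float's range, so the double-to-float conversion is well-defined in ISO C. */
static int fits_float_range(double d)
{
    return isfinite(d) && fabs(d) <= FLT_MAX;
}
```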

#include <stdio.h> 
#include <stdlib.h> 
#include <math.h> 

int main(int argc, char *argv[]) 
{ 
    /* Change to non-zero for superior, otherwise inferior */ 
    int superior = 0; 

    /* double value to convert */ 
    double d = 0.1; 

    float f; 
    double tmp = d; 

    if (argc > 1) 
     d = strtod(argv[1], NULL); 

    /* First, get an approximation of the double value */ 
    f = d; 

    /* Now, convert that back to double */ 
    tmp = f; 

    /* Print the numbers. %a is C99 */ 
    printf("Double: %.20f (%a)\n", d, d); 
    printf("Float: %.20f (%a)\n", f, f); 
    printf("tmp: %.20f (%a)\n", tmp, tmp); 

    if (superior) { 
     /* If we wanted superior, and got a smaller value, 
      get the next value */ 
     if (tmp < d) 
      f = nextafterf(f, INFINITY); 
    } else { 
     if (tmp > d) 
      f = nextafterf(f, -INFINITY); 
    } 
    printf("converted: %.20f (%a)\n", f, f); 

    return 0; 
}

On my machine, it prints:

Double: 0.10000000000000000555 (0x1.999999999999ap-4) 
Float: 0.10000000149011611938 (0x1.99999ap-4) 
tmp: 0.10000000149011611938 (0x1.99999ap-4) 
converted: 0.09999999403953552246 (0x1.999998p-4)

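The idea is that I convert the double value to a float value, which may be less than or greater than the double value depending on the rounding mode. When converting it back to double, we can check whether it is less than or greater than the original value. Then, if the float is not on the correct side, we take the next float after the converted value in the direction of the original value.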


2010-01-07 03:07:04

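Thank you very much for the code. I am slowly becoming convinced that this is the least error-prone solution. Thanks also for pointing out 'nextafterf'; it is much better than incrementing/decrementing the bits of the float as if it were an int. To mitigate the risk of f + 1 being equal to f, can I write 'nextafterf(f, INFINITY)'? – 2010-01-07 08:46:54

I just read the man page and the C standard draft and tried it out; it looks like 'INFINITY' should work. – 2010-01-07 08:54:39

OK, I edited my post. Thanks for the comment. – 2010-01-07 08:56:58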

To do this job more accurately than just recombining the mantissa and exponent bits, check this out:

http://www.mathworks.com/matlabcentral/fileexchange/23173

Regards

2010-01-06 10:01:20 stacker

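Thanks. The 'doubles2halfp' function there is as complicated as I feared, but at least it already has half of the constants right, so it's a good starting point. – 2010-01-06 10:14:13

I would use the given code as a reference and rewrite a simpler approach using &, >> and |, then check for very small and very large numbers. See http://babbage.cs.qc.edu/IEEE-754/Decimal.html for the shift counts and bit positions. – stacker 2010-01-06 10:30:27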
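A sketch of the "check very small and very large numbers" step, using only &, >> and a memcpy of the bits. It assumes IEEE-754 binary64 and that uint64_t shares the double's byte order; the enum and function names are hypothetical, and the double parameter is only for convenience, since the same masks apply to a raw 64-bit pattern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical range classes for a double about to be converted to float. */
enum f32_range {
    F32_ZERO,        /* +0 or -0 */
    F32_TINY,        /* 0 < |d| < 2^-126 (FLT_MIN): result is subnormal, zero, or FLT_MIN */
    F32_NORMAL,      /* 2^-126 <= |d| < 2^128 */
    F32_HUGE,        /* 2^128 <= |d| < infinity: beyond every finite float */
    F32_INF_OR_NAN
};

/* Classify using only shifts and masks on the raw bits. */
static enum f32_range classify_for_float(double d)
{
    uint64_t x;
    int e;

    memcpy(&x, &d, sizeof x);
    e = (int)((x >> 52) & 0x7FF);               /* biased exponent field */

    if (e == 0x7FF)
        return F32_INF_OR_NAN;
    if (e == 0 && (x & 0x000FFFFFFFFFFFFFULL) == 0)
        return F32_ZERO;
    e -= 1023;                                  /* unbiased; double subnormals give -1023 */
    if (e > 127)
        return F32_HUGE;
    if (e < -126)
        return F32_TINY;
    return F32_NORMAL;
}
```

Values classified as F32_NORMAL but lying between FLT_MAX and 2^128 still round up to infinity, so an upward conversion has to allow for that carry.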

I posted the code here: https://stackoverflow.com/q/19644895/364818 and copied it below for your convenience.

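#include <stdint.h>
#include <string.h>

// d points to an IEEE binary64 double, but double is not natively supported.
// Note: the mantissa is truncated (rounded toward zero), and zero, subnormals,
// infinities, NaN and out-of-range exponents are not handled.
static float ConvertDoubleToFloat(const void* d)
{
    uint64_t x;
    uint64_t sign;
    uint64_t exponent;
    uint64_t mantissa;
    uint32_t bits;
    float f; // assumed to be IEEE binary32

    memcpy(&x, d, 8);

    // IEEE binary64 format (unsupported)
    sign     = (x >> 63) & 1;               // 1 bit
    exponent = (x >> 52) & 0x7FF;           // 11 bits
    mantissa = x & 0x000FFFFFFFFFFFFFULL;   // 52 bits
    exponent -= 1023;                       // remove the binary64 bias

    // IEEE binary32 format (supported)
    exponent += 127;                        // rebias
    exponent &= 0xFF;                       // keep 8 bits
    mantissa >>= (52 - 23);                 // keep the top 23 bits (truncates)

    // Build the pattern in a 32-bit integer so the copy below
    // does not depend on byte order.
    bits = (uint32_t)(mantissa | (exponent << 23) | (sign << 31));
    memcpy(&f, &bits, 4);

    return f;
}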

2013-10-28 20:58:59

Thanks. The line 'exponent &= 0xFF' means that where '±FLT_MAX' or '±inf' should be returned, a 'float' with a strange exponent is returned instead (subnormal results are also wrong). – 2013-10-28 21:13:50
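Rather than masking the exponent with 0xFF, each range can be handled explicitly. The sketch below (the name double_bits_to_float_bits is hypothetical) works on raw bit patterns only, so it fits the "double is not natively supported" setting and also gives the directed rounding the question asked for. It assumes IEEE-754 binary64 input and binary32 output; turning NaNs into quiet NaNs that keep the top 23 payload bits is an arbitrary choice.

```c
#include <stdint.h>

/* Convert binary64 bits to binary32 bits with directed rounding,
   using integer operations only.
   up == 0: round toward -infinity (nearest inferior float)
   up != 0: round toward +infinity (nearest superior float) */
static uint32_t double_bits_to_float_bits(uint64_t x, int up)
{
    uint32_t sign = (uint32_t)(x >> 63) << 31;
    int e = (int)((x >> 52) & 0x7FF);
    uint64_t m = x & 0x000FFFFFFFFFFFFFULL;
    int mag_up = sign ? !up : up;   /* for negatives, toward -inf grows the magnitude */
    uint64_t sig;                   /* significand: value = sig * 2^(E - 52) */
    int E;                          /* unbiased exponent */
    int shift;                      /* number of low significand bits dropped */
    uint64_t rest;                  /* the dropped bits */
    uint32_t bits;

    if (e == 0x7FF)                                 /* infinity or NaN */
        return m == 0 ? sign | 0x7F800000u
                      : sign | 0x7FC00000u | (uint32_t)(m >> 29);  /* quiet NaN */
    if (e == 0 && m == 0)                           /* +0 or -0 */
        return sign;

    if (e == 0) {                                   /* double subnormal */
        sig = m;
        E = -1022;
    } else {
        sig = m | (1ULL << 52);                     /* restore the implicit leading 1 */
        E = e - 1023;
    }

    if (E > 127)                                    /* |value| >= 2^128 */
        return sign | (mag_up ? 0x7F800000u : 0x7F7FFFFFu);  /* inf or FLT_MAX */

    if (E >= -126) {                                /* normal float range */
        shift = 52 - 23;
        bits = ((uint32_t)(E + 127) << 23) | (uint32_t)((sig >> shift) & 0x7FFFFF);
    } else {                                        /* below FLT_MIN: subnormal or zero */
        shift = 52 - 23 + (-126 - E);
        bits = shift < 64 ? (uint32_t)(sig >> shift) : 0;
    }
    rest = shift < 64 ? sig & ((1ULL << shift) - 1) : sig;

    /* Truncation rounded the magnitude down; add one ulp if it must go up.
       The carry propagates into the exponent (FLT_MAX + 1 ulp gives inf,
       the largest subnormal + 1 ulp gives FLT_MIN). */
    if (mag_up && rest != 0)
        bits += 1;
    return sign | bits;
}
```

To get a float, copy the result with memcpy(&f, &bits, 4) as ConvertDoubleToFloat does. For 0.1 (bits 0x3FB999999999999A) this returns 0x3DCCCCCC when rounding down and 0x3DCCCCCD when rounding up.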