2012-02-24 295 views
16

我想实现范围缩减作为实现正弦函数的第一步。范围缩减精度不高的单精度浮点数

我下面的论文中描述的方法"ARGUMENT REDUCTION FOR HUGE ARGUMENTS" by K.C. NG

我使用x的输入范围从0到20000。我的错误,当收到错误一样大0.002339146显然不应该是大的,并且我我不知道如何减少它。我注意到误差幅度与输入θ幅度与余弦/正弦相关。

我能够获得纸张提到nearpi.c代码,但我不知道如何使用的单精度浮点代码。如果有人有兴趣,在nearpi.c文件可以在此链接中找到:nearpi.c

这里是我的MATLAB代码:

x = 0:0.1:20000; 

% Perform range reduction 
% Store constant 2/pi 
twooverpi = single(2/pi); 

% Compute y 
y = (x.*twooverpi); 

% Compute k (round to nearest integer 
k = round(y); 

% Solve for f 
f = single(y-k); 

% Solve for r 
r = single(f*single(pi/2)); 

% Find last two bits of k 
n = bitand(fi(k,1,32,0),fi(3,1,32,0)); 
n = single(n); 

% Preallocate for speed 
z(length(x)) = 0; 
for i = 1:length(x) 

    switch(n(i)) 
     case 0 
      z(i)=sin(r(i)); 
     case 1 
      z(i) = single(cos(r(i))); 
     case 2 
      z(i) = -sin(r(i)); 
     case 3 
      z(i) = single(-cos(r(i))); 
     otherwise 
    end 

end 

maxerror = max(abs(single(z - single(sin(single(x)))))) 
minerror = min(abs(single(z - single(sin(single(x)))))) 

我已编辑的程序nearpi.c所以它编译。但是我不知道如何解释输出。此外,文件需要一个输入,我不得不手动输入,我也不确定输入的重要性。

这里是工作nearpi.c:

/* 
============================================================================ 
Name  : nearpi.c 
Author  : 
Version  : 
Copyright : Your copyright notice 
Description : Hello World in C, Ansi-style 
============================================================================ 
*/ 

#include <stdio.h> 
#include <stdlib.h> 
#include <math.h> 


/* 
* Global macro definitions. 
*/ 

# define hex(double) *(1 + ((long *) &double)), *((long *) &double) 
# define sgn(a)   (a >= 0 ? 1 : -1) 
# define MAX_k   2500 
# define D    56 
# define MAX_EXP  127 
# define THRESHOLD  2.22e-16 

/* 
* Global Variables 
*/ 

int  CFlength,    /* length of CF including terminator */ 
     binade; 
double e, 
     f;      /* [e,f] range of D-bit unsigned int of f; 
            form 1X...X */ 

// Function Prototypes 
int dbleCF (double i[], double j[]); 
void input (double i[]); 
void nearPiOver2 (double i[]); 


/* 
* This is the start of the main program. 
*/ 

int main (void) 
{ 
    int  k;     /* subscript variable */ 
    double i[MAX_k], 
      j[MAX_k];   /* i and j are continued fractions 
            (coeffs) */ 


    // fp = fopen("/src/cfpi.txt", "r"); 


/* 
* Compute global variables e and f, where 
* 
*  e = 2^(D-1), i.e. the D bit number 10...0 
* and 
*  f = 2^D - 1, i.e. the D bit number 11...1 . 
*/ 

    e = 1; 
    for (k = 2; k <= D; k = k + 1) 
     e = 2 * e; 
    f = 2 * e - 1; 

/* 
    * Compute the continued fraction for (2/e)/(pi/2) , i.e. 
    * q's starting value for the first binade, given the continued 
    * fraction for pi as input; set the global variable CFlength 
    * to the length of the resulting continued fraction (including 
    * its negative valued terminator). One should use as many 
    * partial coefficients of pi as necessary to resolve numbers 
    * of the width of the underflow plus the overflow threshold. 
    * A rule of thumb is 0.97 partial coefficients are generated 
    * for every decimal digit of pi . 
    * 
    * Note: for radix B machines, subroutine input should compute 
    * the continued fraction for (B/e)/(pi/2) where e = B^(D - 1). 
    */ 

    input (i); 

/* 
* Begin main loop over all binades: 
* For each binade, find the nearest multiples of pi/2 in that binade. 
* 
* [ Note: for hexadecimal machines (B = 16), the rest of the main 
* program simplifies(!) to 
* 
*      B_ade = 1; 
*      while (B_ade < MAX_EXP) 
*      { 
*       dbleCF (i, j); 
*       dbleCF (j, i); 
*       dbleCF (i, j); 
*       CFlength = dbleCF (j, i); 
*       B_ade = B_ade + 1; 
*      } 
*     } 
* 
* because the alternation of source & destination are no longer necessary. ] 
*/ 

    binade = 1; 
    while (binade < MAX_EXP) 
    { 

/* 
* For the current (odd) binade, find the nearest multiples of pi/2. 
*/ 

     nearPiOver2 (i); 

/* 
* Double the continued fraction to get to the next (even) binade. 
* To save copying arrays, i and j will alternate as the source 
* and destination for the continued fractions. 
*/ 

     CFlength = dbleCF (i, j); 
     binade = binade + 1; 

/* 
* Check for main loop termination again because of the 
* alternation. 
*/ 

     if (binade >= MAX_EXP) 
      break; 

/* 
* For the current (even) binade, find the nearest multiples of pi/2. 
*/ 

     nearPiOver2 (j); 

/* 
* Double the continued fraction to get to the next (odd) binade. 
*/ 

     CFlength = dbleCF (j, i); 
     binade = binade + 1; 
    } 

    return 0; 
}        /* end of Main Program */ 

/* 
* Subroutine DbleCF doubles a continued fraction whose partial 
* coefficients are i[] into a continued fraction j[], where both 
* arrays are of a type sufficient to do D-bit integer arithmetic. 
* 
* In my case (D = 56) , I am forced to treat integers as double 
* precision reals because my machine does not have integers of 
* sufficient width to handle D-bit integer arithmetic. 
* 
* Adapted from a Basic program written by W. Kahan. 
* 
* Algorithm based on Hurwitz's method of doubling continued 
* fractions (see Knuth Vol. 3, p.360). 
* 
* A negative value terminates the last partial quotient. 
* 
* Note: for the non-C programmers, the statement break 
* exits a loop and the statement continue skips to the next 
* case in the same loop. 
* 
* The call modf (l/2, &l0) assigns the integer portion of 
* half of L to L0. 
*/ 

int dbleCF (double i[], double j[]) 
{ 
     double k, 
        l, 
        l0, 
        j0; 
     int n, 
        m; 
    n = 1; 
    m = 0; 
    j0 = i[0] + i[0]; 
    l = i[n]; 
    while (1) 
    { 
     if (l < 0) 
     { 
      j[m] = j0; 
      break; 
     }; 
     modf (l/2, &l0); 
     l = l - l0 - l0; 
     k = i[n + 1]; 
     if (l0 > 0) 
     { 
      j[m] = j0; 
      j[m + 1] = l0; 
      j0 = 0; 
      m = m + 2; 
     }; 
     if (l == 0) { 
/* 
* Even case. 
*/ 
      if (k < 0) 
      { 
       m = m - 1; 
       break; 
      } 
      else 
      { 
       j0 = j0 + k + k; 
       n = n + 2; 
       l = i[n]; 
       continue; 
      }; 
     } 
/* 
* Odd case. 
*/ 
     if (k < 0) 
     { 
      j[m] = j0 + 2; 
      break; 
     }; 
     if (k == 0) 
     { 
      n = n + 2; 
      l = l + i[n]; 
      continue; 
     }; 
     j[m] = j0 + 1; 
     m = m + 1; 
     j0 = 1; 
     l = k - 1; 
     n = n + 1; 
     continue; 
    }; 
    m = m + 1; 
    j[m] = -99999; 
    return (m); 
} 

/* 
* Subroutine input computes the continued fraction for 
* (2/e)/(pi/2) , where e = 2^(D-1) , given pi 's 
* continued fraction as input. That is, double the continued 
* fraction of pi D-3 times and place a zero at the front. 
* 
* One should use as many partial coefficients of pi as 
* necessary to resolve numbers of the width of the underflow 
* plus the overflow threshold. A rule of thumb is 0.97 
* partial coefficients are generated for every decimal digit 
* of pi . The last coefficient of pi is terminated by a 
* negative number. 
* 
* I'll be happy to supply anyone with the partial coefficients 
* of pi . My ARPA address is [email protected] . 
* 
* I computed the partial coefficients of pi using a method of 
* Bill Gosper's. I need only compute with integers, albeit 
* large ones. After writing the program in bc and Vaxima , 
* Prof. Fateman suggested FranzLisp . To my surprise, FranzLisp 
* ran the fastest! the reason? FranzLisp's Bignum package is 
* hand coded in assembler. Also, FranzLisp can be compiled. 
* 
* 
* Note: for radix B machines, subroutine input should compute 
* the continued fraction for (B/e)/(pi/2) where e = B^(D - 1). 
* In the case of hexadecimal (B = 16), this is done by repeated 
* doubling the appropriate number of times. 
*/ 

void input (double i[]) 
{ 
    int  k; 
    double j[MAX_k]; 

/* 
* Read in the partial coefficients of pi from a precalculated file 
* until a negative value is encountered. 
*/ 

    k = -1; 
    do 
    { 
     k = k + 1; 
     scanf ("%lE", &i[k]); 
     printf("hello\n"); 
     printf("%d", k); 
    } while (i[k] >= 0); 

/* 
* Double the continued fraction for pi D-3 times using 
* i and j alternately as source and destination. On my 
* machine D = 56 so D-3 is odd; hence the following code: 
* 
* Double twice (D-3)/2 times, 
*/ 
    for (k = 1; k <= (D - 3)/2; k = k + 1) 
    { 
     dbleCF (i, j); 
     dbleCF (j, i); 
    }; 
/* 
* then double once more. 
*/ 
    dbleCF (i, j); 

/* 
* Now append a zero on the front (reciprocate the continued 
* fraction) and the return the coefficients in i . 
*/ 

    i[0] = 0; 
    k = -1; 
    do 
    { 
     k = k + 1; 
     i[k + 1] = j[k]; 
    } while (j[k] >= 0); 

/* 
* Return the length of the continued fraction, including its 
* terminator and initial zero, in the global variable CFlength. 
*/ 

    CFlength = k; 
} 

/* 
* Given a continued fraction's coefficients in an array i , 
* subroutine nearPiOver2 finds all machine representable 
* values near a integer multiple of pi/2 in the current binade. 
*/ 

void nearPiOver2 (double i[]) 
{ 
    int  k,     /* subscript for recurrences (see 
            handout) */ 
      K;     /* like k , but used during cancel. elim. 
            */ 
    double p[MAX_k],   /* product of the q's   (see 
            handout) */ 
      q[MAX_k],   /* successive tail evals of CF (see 
            handout) */ 
      j[MAX_k],   /* like convergent numerators (see 
            handout) */ 
      tmp,    /* temporary used during cancellation 
            elim. */ 
      mk0,    /* m[k - 1]      (see 
            handout) */ 
      mk,     /* m[k] is one of the few ints (see 
            handout) */ 
      mkAbs,    /* absolute value of m sub k 
           */ 
      mK0,    /* like mk0 , but used during cancel. 
            elim. */ 
      mK,     /* like mk , but used during cancel. 
            elim. */ 
      z,     /* the object of our quest (the argument) 
           */ 
      m0,     /* the mantissa of z as a D-bit integer 
           */ 
      x,     /* the reduced argument   (see 
            handout) */ 
      ldexp(),   /* sys routine to multiply by a power of 
            two */ 
      fabs(),   /* sys routine to compute FP absolute 
            value */ 
      floor(),   /* sys routine to compute greatest int <= 
            value */ 
      ceil();   /* sys routine to compute least int >= 
            value */ 

/* 
    * Compute the q's by evaluating the continued fraction from 
    * bottom up. 
    * 
    * Start evaluation with a big number in the terminator position. 
    */ 

    q[CFlength] = 1.0 + 30; 

    for (k = CFlength - 1; k >= 0; k = k - 1) 
     q[k] = i[k] + 1/q[k + 1]; 

/* 
* Let THRESHOLD be the biggest | x | that we are interesed in 
* seeing. 
* 
* Compute the p's and j's by the recurrences from the top down. 
* 
* Stop when 
* 
*  1       1 
*  ----- >= THRESHOLD > ------ . 
*  2 |j |      2 |j | 
*   k       k+1 
*/ 

    p[0] = 1; 
    j[0] = 0; 
    j[1] = 1; 
    k = 0; 
    do 
    { 
     p[k + 1] = -q[k + 1] * p[k]; 
     if (k > 0) 
      j[1 + k] = j[k - 1] - i[k] * j[k]; 
     k = k + 1; 
    } while (1/(2 * fabs (j[k])) >= THRESHOLD); 

/* 
* Then mk runs through the integers between 
* 
*     k  +     k  + 
*    (-1) e/p - 1/2  & (-1) f/p - 1/2 . 
*       k       k 
*/ 

    for (mkAbs = floor (e/fabs (p[k])); 
      mkAbs <= ceil (f/fabs (p[k])); mkAbs = mkAbs + 1) 
    { 

     mk = mkAbs * sgn (p[k]); 

/* 
* For each mk , mk0 runs through integers between 
* 
*     + 
*    m q - p THRESHOLD . 
*    k k  k 
*/ 

     for (mk0 = floor (mk * q[k] - fabs (p[k]) * THRESHOLD); 
       mk0 <= ceil (mk * q[k] + fabs (p[k]) * THRESHOLD); 
       mk0 = mk0 + 1) 
     { 

/* 
* For each pair { mk , mk0 } , check that 
* 
*        k 
*    m  = (-1) (j m - j m ) 
*    0     k-1 k k k-1 
*/ 
      m0 = (k & 1 ? -1 : 1) * (j[k - 1] * mk - j[k] * mk0); 

/* 
* lies between e and f . 
*/ 
      if (e <= fabs (m0) && fabs (m0) <= f) 
      { 

/* 
* If so, then we have found an 
* 
*        k 
*    x  = ((-1) m/p - m)/j 
*         0 k k  k 
* 
*      = (m q - m )/p . 
*       k k k-1  k 
* 
* But this later formula can suffer cancellation. Therefore, 
* run the recurrence for the mk 's to get mK with minimal 
* | mK | + | mK0 | in the hope mK is 0 . 
*/ 
       K = k; 
       mK = mk; 
       mK0 = mk0; 
       while (fabs (mK) > 0) 
       { 
        p[K + 1] = -q[K + 1] * p[K]; 
        tmp = mK0 - i[K] * mK; 
        if (fabs (tmp) > fabs (mK0)) 
         break; 
        mK0 = mK; 
        mK = tmp; 
        K = K + 1; 
       }; 

/* 
* Then 
*    x  = (m q - m )/p 
*       K K K-1  K 
* 
* as accurately as one could hope. 
*/ 
       x = (mK * q[K] - mK0)/p[K]; 

/* 
* To return z and m0 as positive numbers, 
* x must take the sign of m0 . 
*/ 
       x = x * sgn (m0); 
       m0 = fabs (m0); 

/*d 
* Set z = m0 * 2^(binade+1-D) . 
*/ 
       z = ldexp (m0, binade + 1 - D); 

/* 
* Print z (hex), z (dec), m0 (dec), binade+1-D, x (hex), x (dec). 
*/ 

       printf ("%08lx %08lx Z=%22.16E M=%17.17G L+1-%d=%3d %08lx %08lx x=%23.16E\n", hex (z), z, m0, D, binade + 1 - D, hex (x), x); 

      } 
     } 
    } 
} 
+0

对于trig函数,IIRC准确的范围缩小对于pi和其余部分都需要很高的精度;通常在数学库中使用数百位。 – janneb 2012-02-24 09:53:17

+3

+1 - 有趣的文章(只是有时间浏览,稍后会仔细阅读) – 2012-02-28 00:55:18

+0

@starbox:我读过,如果之前,是的。我不熟悉用matlab做高精度算术运算,所以我不会建议你如何去做。 – janneb 2012-02-29 21:11:34

回答

9

理论

首先,让我们用单精度运算,使注意区别。

  1. [等式8] f的最小值可以更大。作为双精度数是一超集中的单精度数的,最接近single到的2/pi倍数只能更远然后〜2.98e-19,因此前导零的的f固定算术表示的数量必须在大多数61个前导零(但可能会更少)。表示数量为fdigits

  2. [等式之前9]因此,代替121位,y必须精确到fdigits + 24(非零显著在单精度位)+ 7(额外保护位)= fdigits + 31,并且在最92.

  3. [等式9]“,因此,与x的指数的宽度一起,2/pi必须包含127(的single最大指数)+ 31 + fdigits,或158 + fdigits和至多219个比特。

  4. [Subsec和灰2.5]的A大小由零的数目在x二进制小数点之前确定(并且不受移动到single),而C大小由公式确定之前9.

    • 对于大xx> = 2^24),x看起来像这样:[24位,M零。将其乘以A(其大小是2/pi的第一个M位)将导致整数(零的x将只将所有内容都转换为整数)。
    • 选择C将从M+d2/pi位开始将导致产品x*C的尺寸最大为d-24。以双精度,d被选择为174(而不是24,我们有53),以便产品的尺寸最大为121.在single中,选择d使得d-24 <= 92或更确切地说d-24 <= fdigits+31 。也就是说,d可以选择为fdigits +55或至多116.
    • 结果,B应该具有至多116位的大小。

我们因此留下了两个问题:

  1. 计算fdigits。这涉及阅读链接文件中的参考文献6并理解它。可能不那么容易。 :)据我所知,这是唯一使用nearpi.c的地方。
  2. 计算B,相关位2/pi。由于M的范围小于127,所以我们只需计算2/pi的前127 + 116位,并将它们存储在一个数组中。见Wikipedia
  3. 计算y=x*B。这包括将x乘以116位数字。这是使用第3节的地方。块的大小被选为24,因为2 * 24 + 2(乘以两个24位数,并且加上3个这样的数)小于double,53(因为24除96)的精度。对于single算术,我们可以使用大小为11位的块,原因相似。

注 - B的技巧仅适用于指数为正数(x> = 2^24)的数字。

总结 - 首先,您必须用double精度解决问题。你的Matlab代码也不能在double的精确度下工作(请尝试删除single和计算sin(2^53),因为你的twooverpi只有53个有效位,而不是175(无论如何,你不能直接在Matlab中乘这样精确的数字)。方案应该适应与single一起工作,并且关键问题是精确地代表2/pi,并支持高精度数字的乘法。最后,当一切正常时,您可以尝试找出更好的fdigits以减少位你要存储和繁殖

希望我没有完全关闭 - 意见和矛盾,欢迎

作为一个例子,让我们计算sin(x)其中x = single(2^24-1),其具有显著位(M = 0)后没有零点。这简化了查找B,因为B2/pi的前116位组成。由于x具有24位和116位B,产品精度

y = x * B 

将具有精度92位,如必需的。

链接论文中的第3部分描述了如何以足够的精度执行此产品;在我们的例子中,相同的算法可以用于大小为11的块来计算y。作为苦差事,我希望我可以原谅,因为没有明确地这样做,而是依靠Matlab的符号数学工具箱。这个工具箱为我们提供了vpa函数,它允许我们用十进制数字来指定一个数字的精度。所以,

vpa('2/pi', ceil(116*log10(2))) 

将产生的精度至少为116个比特的2/pi的近似。因为vpa只接受整数作为其精度参数,所以我们通常不能精确地指定数字的二进制精度,因此我们使用次最佳值。

下面的代码计算sin(x)根据纸张,在single精度:

x = single(2^24-1); 
y = x * vpa('2/pi', ceil(116*log10(2))); % Precision = 103.075 
k = round(y); 
f = single(y - k); 
r = f * single(pi)/2; 
switch mod(k, 4) 
    case 0 
     s = sin(r); 
    case 1 
     s = cos(r); 
    case 2 
     s = -sin(r); 
    case 3 
     s = -cos(r); 
end 
sin(x) - s         % Expected value: exactly zero. 

(使用Mathematica获得的y精度,其结果是比Matlab一个更好的数值工具:))

libm

对方回答这个问题(因为它已被删除),使我的implem在libm,虽然工作在双精度数字,但非常彻底地遵循链接的文件。

参见文件s_sin.c为包装(从链接的纸张表2显示为在该文件的末尾switch语句),并e_rem_pio2.c对于参数减少代码(特别感兴趣的是包含第一个396六角形阵列数字2/pi,从第69行开始)。

+0

我明白有问题需要解决。但是我仍然没有根据你写的内容看到解决方案。你能填好你留下的空白吗?感谢您的时间。 – Veridian 2012-03-03 21:44:44

+0

增加了一些说明。 – user1071136 2012-03-04 21:26:18

+0

我的输入可能总是在+/- 20000范围内(小输入也会包括在内),但是我的输入x总是远远小于2^24。 – Veridian 2012-03-05 19:06:50