在strlen实现中了解代码

我有两个问题关于glibc中string.h中strlen的实现。在strlen实现中了解代码

该实现使用带有“孔”的幻数。我无法理解这是如何工作的。有人可以帮我理解这个片段：

size_t 
strlen (const char *str) 
{ 
    const char *char_ptr; 
    const unsigned long int *longword_ptr; 
    unsigned long int longword, himagic, lomagic; 

    /* Handle the first few characters by reading one character at a time. 
     Do this until CHAR_PTR is aligned on a longword boundary. */ 
    for (char_ptr = str; ((unsigned long int) char_ptr 
      & (sizeof (longword) - 1)) != 0; 
     ++char_ptr) 
    if (*char_ptr == '\0') 
     return char_ptr - str; 

    /* All these elucidatory comments refer to 4-byte longwords, 
     but the theory applies equally well to 8-byte longwords. */ 

    longword_ptr = (unsigned long int *) char_ptr; 

    /* Bits 31, 24, 16, and 8 of this number are zero. Call these bits 
     the "holes." Note that there is a hole just to the left of 
     each byte, with an extra at the end: 

     bits: 01111110 11111110 11111110 11111111 
     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD 

     The 1-bits make sure that carries propagate to the next 0-bit. 
     The 0-bits provide holes for carries to fall into. */ 

    himagic = 0x80808080L; 
     lomagic = 0x01010101L; 
     if (sizeof (longword) > 4) 
     { 
      /* 64-bit version of the magic. */ 
      /* Do the shift in two steps to avoid a warning if long has 32 bits. */ 
      himagic = ((himagic << 16) << 16) | himagic; 
      lomagic = ((lomagic << 16) << 16) | lomagic; 
     } 
     if (sizeof (longword) > 8) 
     abort(); 

     /* Instead of the traditional loop which tests each character, 
      we will test a longword at a time. The tricky part is testing 
      if *any of the four* bytes in the longword in question are zero. */ 
     for (;;) 
     { 
      longword = *longword_ptr++; 

      if (((longword - lomagic) & ~longword & himagic) != 0) 
     { 
      /* Which of the bytes was the zero? If none of them were, it was 
       a misfire; continue the search. */ 

      const char *cp = (const char *) (longword_ptr - 1); 

      if (cp[0] == 0) 
      return cp - str; 
      if (cp[1] == 0) 
      return cp - str + 1; 
      if (cp[2] == 0) 
      return cp - str + 2; 
      if (cp[3] == 0) 
      return cp - str + 3; 
      if (sizeof (longword) > 4) 
      { 
       if (cp[4] == 0) 
      return cp - str + 4; 
       if (cp[5] == 0) 
      return cp - str + 5; 
       if (cp[6] == 0) 
      return cp - str + 6; 
    if (cp[7] == 0) 
     return cp - str + 7; 
}}}

什么是幻数被用于？

为什么不直接递增指针直到NULL字符并返回计数？这种方法更快吗？为什么这样？

来源

2016-01-06 Neo

在大多数架构上，glibc将使用更快的功能。例如，在现代英特尔芯片上，它使用SIMD扩展来向量化检查。 – rici

这用于看4个字节（32位）或甚至8个（64位）在一气呵成，以检查是否其中之一是代替单独检查每个字节为零（字符串的结尾），。

下面是一个例子，检查空字节：

unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0 
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

对于一些更看到Bit Twiddling Hacks。

在此使用的一个（32位为例）：

还有另一个更快的方法 - 使用hasless（V，1），其被定义如下 ;它在4个操作中工作，并且不需要经过验证。它简化到

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

子表达式（V - 0x01010101UL），计算结果为高比特在设置任何字节每当v中相应的字节比 0x80的大于零或。子表达式〜v & 0x80808080UL评估为高字节集合，其中v的字节没有设置其高位（因此字节小于0x80）。最后，通过对这两个子表达式进行“与”运算，结果是由于在第一个子表达式中，由于子表达式中的值大于0x80而设置的高位被v第二。

一次查看一个字节至少需要花费与查看完整整数值（寄存器宽度）相同的cpu周期。在这个算法中，检查完整整数是否包含零。如果不是，则使用很少的指令，并且可以将跳转设置为下一个完整整数。如果内部有一个零字节，则会进一步检查以确定它的确切位置。

来源

2016-01-06 21:22:55

另外，gcc'strlen'实现所做的优化是利用支持8字节整数的体系结构。以上限于一次只能查找4个字节的空值。 'strlen'中的if（sizeof（longword）> 4）'比较扩展了额外4个字节的比较。无论是哪种方式，都可以提高字符串的'strlen'性能，长度大于32字符。（高于你用* char-by-char检查得到的*）。好答案。 –

@greybeard - 更新 - 谢谢。 –

在strlen实现中了解代码

回答

相关问题