最快的方法来确定是否两个字符串由一个字符

我写一个C++算法，需要两个字符串，如果你可以从字符串通过改变单个字符到另一个变异一个以串B返回true不同。这两个字符串的大小必须相等，并且只能有一个不同。我还需要访问已更改的索引以及已更改的strA字符。我发现了一个工作算法，但它遍历每一对单词，并且在任何大量输入中运行速度太慢。最快的方法来确定是否两个字符串由一个字符

bool canChange(std::string const& strA, std::string const& strB, char& letter) 
{ 
    int dif = 0; 
    int position = 0; 
    int currentSize = (int)strA.size(); 
    if(currentSize != (int)strB.size()) 
    { 
     return false; 
    } 
    for(int i = 0; i < currentSize; ++i) 
    { 
     if(strA[i] != strB[i]) 
     { 
      dif++; 
      position = i; 
      if(dif > 1) 
      { 
       return false; 
      } 
     } 
    } 
    if(dif == 1) 
    { 
     letter = strA[position]; 
     return true; 
    } 
    else return false; 
}

有关优化的任何建议？

来源

2016-01-26 dave vader

你想要什么样的优化？时间？记忆？顺便说一下，你没有处理已改变的字符的索引，你只是返回字符本身 – Pooya

我可以看到'canChange（）'（有趣的名字，在那）检查字符对 - 你能详细说明'每一个单词对？ – greybeard

我认为这是“通过每一对单词迭代”，这是慢的来源;发布的代码看起来非常有效。如果将文本1中的每个单词与文本2中的每个单词进行比较，那么这是一个O（N^2）操作，无论您对canChange（）进行了多少优化，它都将变得很慢。 –

您的输入有多大？

我认为，STRA [I]，STRB [I]有函数调用的开销，除非它的内联。因此，请确保您打开内联并进行编译并进行性能测试。否则，请尝试使用strA.c_str（）将字节作为char *。

如果所有的失败，它仍然不够快，试着将你的字符串分割成块，并使用memcmp或STRNCMP的块。如果没有区别，移动到下一个块，直到达到最后或找到差异。如果发现差异，请逐字节比较，直到找到差异为止。我建议这个路由，因为memcmp通常比你的微不足道的实现更快，因为它们可以利用处理器SSE扩展等来做非常快速的比较。

此外，您的代码存在问题。假设strA比strB长，并且只检查数组访问器的A的长度。

来源

2016-01-26 01:00:02 Matt

'...只检查数组访问器的A的长度 - 函数终止提前，除非两个字符串具有相同的长度。 –

这是一个有点难以得到检查所有的字符字符串，除非你能接受偶尔的不正确的结果了。

我建议使用标准库的功能，而不是尝试计算不匹配的数量。例如;

#include <string> 
#include <algorithm> 

bool canChange(std::string const& strA, std::string const& strB, char& letter, std::size_t &index) 
{ 
    bool single_mismatch = false; 
    if (strA.size() == strB.size()) 
    { 
     typedef std::string::const_iterator ci; 
     typedef std::pair<ci, ci> mismatch_result; 

     ci begA(strA.begin()), endA(strA.end()); 

     mismatch_result result = std::mismatch(begA, endA, strB.begin()); 

     if (result.first != endA) // found a mismatch 
     { 
      letter = *(result.first); 
      index = std::distance(begA, result.first); 

      // now look for a second mismatch 

      std::advance(result.first, 1); 
      std::advance(result.second, 1); 

      single_mismatch = (std::mismatch(result.first, endA, result.second).first == endA); 
     } 
    } 
    return single_mismatch; 
}

这适用于所有版本。它可以在C++ 11中简化一点。

如果上述返回true，然后一个错配被发现。

如果返回值为false，那么这些字符串的大小不一致，或者不匹配的数量不等于1（字符串相等或者有多个不匹配）。

letter和index不变如果字符串不同的长度或完全相等，但以其它方式识别所述第一失配（字符的值在strA，和index）。

来源

2016-01-26 01:51:50 Peter

如果要优化主要是完全相同的字符串，你可以使用的x86 SSE/AVX向量指令。您的基本想法看起来很好：只要您检测到第二个区别，就会中断。

要查找和计算字符差异，像PCMPEQB/PMOVMSKB/test-and-branch这样的序列可能是好的。（使用C/C++内部函数来获取这些向量指令）。当您的向量循环检测到当前块中的非零差异时，可以使用位掩码来查看是否刚刚找到第一个差异，或者如果在同一个块中发现了两个差异。

我扔了一个未经测试的，没有完全充实的AVX2版本的我所描述的。 此代码假定字符串长度为32的多个倍数。尽早停止并用清理结尾处理最后一个块是留给读者的一个练习。

#include <immintrin.h> 
#include <string> 

// not tested, and doesn't avoid reading past the end of the string. 
// TODO: epilogue to handle the last up-to-31 left-over bytes separately. 
bool canChange_avx2_bmi(std::string const& strA, std::string const& strB, char& letter) { 
    size_t size = strA.size(); 
    if (size != strB.size()) 
     return false; 

    int diffs = 0; 
    size_t diffpos = 0; 
    size_t pos = 0; 
    do { 
     uint32_t diffmask = 0; 
     while(pos < size) { 
      __m256i vecA = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(& strA[pos])); 
      __m256i vecB = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(& strB[pos])); 
      __m256i vdiff = _mm256_cmpeq_epi8(vecA, vecB); 
      diffmask = _mm256_movemask_epi8(vdiff); 
      pos += 32; 
      if (diffmask) break; // gcc makes worse code if you include && !diffmask in the while condition, instead of this break 
     } 
     if (diffmask) { 
      diffpos = pos + _tzcnt_u32(diffmask); // position of the lowest set bit. Could safely use BSF rather than TZCNT here, since we only run when diffmask is non-zero. 
      diffs += _mm_popcnt_u32(diffmask); 
     } 
    } while(pos < size && diffs <= 1); 

    if (diffs == 1) { 
     letter = strA[diffpos]; 
     return true; 
    } 
    return false; 
}

丑陋break，而不是包括在while条件显然有助于gcc generate better code。 do{}while()也符合我希望asm出来的方式。我没有尝试使用for或while循环来查看gcc会做什么。

内环真的是紧是这样的：

.L14: 
     cmp  rcx, r8 
     jnb  .L10  # the while(pos<size) condition 
.L6: # entry point for first iteration, because gcc duplicates the pos<size test ahead of the loop 

     vmovdqu ymm0, YMMWORD PTR [r9+rcx]  # tmp118,* pos 
     vpcmpeqb  ymm0, ymm0, YMMWORD PTR [r10+rcx]  # tmp123, tmp118,* pos 
     add  rcx, 32 # pos, 
     vpmovmskb  eax, ymm0  # tmp121, tmp123 
     test eax, eax  # tmp121 
     je  .L14  #,

从理论上讲，这应该每2个时钟（英特尔Haswell的）一个迭代运行。循环中有7个融合域uop。（将会是6，但是2-reg addressing modes apparently can't micro-fuse on SnB-family CPUs。）由于两个uops是负载，而不是ALU，所以在SnB/IvB上这个吞吐量也是可以实现的。

这对于在两条弦相同的区域飞行非常有用。正确处理任意字符串长度的开销会使得这可能比简单的标量函数慢，如果字符串很短，和/或早期有多个差异。

来源

2016-01-27 07:15:26

而不是popcnt，您可以使用奇偶校验来查看popcnt是奇数还是偶数。但是在x86上找到32位整数的奇偶校验（使用AVX2）的最快方法是使用'popcnt（x）＆1'。不需要SSE4的SSE版本可能需要这样做。（x86可以通过xor al，ah' /'setp al'（PF设置为偶校验，IIRC）进行16位奇偶校验。 –

最快的方法来确定是否两个字符串由一个字符

回答

相关问题