检查字符串是否包含多个单词的最快/最简单的方法

我正在使用C++ 11，并且可以使用正则表达式的东西。我想知道检查一个字符串是否包含多个单词的最快方法是什么，如果是的话多少。这种情况下的单词被定义为用空格分隔的字符组。检查字符串是否包含多个单词的最快/最简单的方法

我有几个选择：

由空格分割字符串，算分
长度使用某种形式的正则表达式的
计数空白字符

选项1是最容易的方式，但考虑到多个空白字符会使分裂更复杂一些。 2可能会更慢，我不确定我会如何计算它。 3是我能想到的最快的，但是可能要考虑很多角落案例。我希望我的解决方案尽可能高效并包含尽可能少的额外库。对我来说这是一个可以解决的问题，但我需要更多的洞察力来解决什么是最好的解决方案。

我倾向于第一个，但那么最好的功能是什么？ istringstream加一个迭代器，stringstream，一些char*魔法？我不确定最快的方法是什么。

来源

2017-05-01 Hal T

的_best_解决方案是任何适合您的需求最多。你需要它快速，还是易于阅读/维护？你可以随时编写所有三个，在它们上运行基准测试，并查看你想使用哪一个。 –

@FrançoisAndrieux我加了一点点说明。 –

如果你想在字符串中找到相似的单词，我认为你必须将每个单词存储在向量中并对其进行排序。在这种情况下，相似的单词将在序列中逐一定位。在这里你得到下一个复杂度O（NlogN）+ O（N）（排序+检查向量中的每个元素）。 – arturx64

我会遍历字符串，计算单词并迭代任何连续的空格。

每当从空白移动到非空白时，增加字数。

增加字数，如果字符串非空白

int countWords(string& toCount, const string& whitespace){ 
    enum countStatus { 
     startOfString, 
     initialWhitespace, 
     movePastWord, 
     movePastWhitespace 
    } status=startOfString;  

    int wordCount=0; 

    for(char& c : toCount) { 
     bool characterIsWhitespace=false; 

     if (whitespace.find(c)!=string::npos) { 
      characterIsWhitespace=true; 
     } 

     switch(status) { 
      case startOfString: 
       if (characterIsWhitespace) { 
        status=initialWhitespace; 
       } else { 
        status=movePastWord; 
        wordCount++; 
       } 
      break; 

      case initialWhitespace: 
       if (!characterIsWhitespace) { 
        wordCount++; 
        status=movePastWord; 
       } 
      break; 

      case movePastWord: 
       if (characterIsWhitespace) { 
        status=movePastWhitespace; 
       } 
      break; 

      case movePastWhitespace: 
       if (!characterIsWhitespace) { 
        wordCount++; 
        status=movePastWord; 
       } 
     } 
    } 

    return wordCount; 
}

来源

2017-05-01 17:02:41

我已经发布了所有的代码，包括int main（）等，但是我花费更长时间努力争取#include来正确格式化写这个方法。 –

开始。在这种情况下，你可以利用关联容器。 C++有多种选择。例如，您可以使用std::map。在下面的代码中，您可以计算文本中出现多少个单词。

#include <iostream> 
#include <string> 
#include <map> 
#include <algorithm> 


int main() 
{ 
    std::map<std::string,int> strCount; 
    std::string str("AA BB ABC AA GE AAf FF JJ BB CC "); 
    std::string temp; 

    // Split String Based on Whitespace (i.e. you need to modify it to suit the text format you have.) 
    for (int i(0); i < str.size(); ++i){ 
     temp += str[i]; 
     if (str[i] == ' '){ 
      temp.pop_back(); 
      ++strCount[temp]; // <-- if element new, insert it in map and associate new counter, otherwise increment counter of element. 
      temp.clear(); 
     } 
    } 


    std::map<std::string,int>::const_iterator iter; 
    for(iter = strCount.begin(); iter != strCount.end(); iter++) { 
     std::cout << "#: " << iter->second << " string: " << iter->first << std::endl; 
    } 
    return 0; 
}

上述代码的输出是

#: 2 string: AA 
#: 1 string: AAf 
#: 1 string: ABC 
#: 2 string: BB 
#: 1 string: CC 
#: 1 string: FF 
#: 1 string: GE 
#: 1 string: JJ

来源

2017-05-02 07:09:24 CroCo

检查字符串是否包含多个单词的最快/最简单的方法

回答

相关问题