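Tokenizing a string in C++, including the delimiters

I am tokenizing a string with the function below, but I don't know how to include the delimiters in the resulting tokens.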

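#include <string>
#include <vector>

using namespace std;

// Splits str at any character in delimiters; the delimiters themselves are dropped.
void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // find_first_of returns string::npos (an unsigned size_type) on failure,
    // so positions must not be stored in an int.
    string::size_type startpos = 0;
    string::size_type pos = str.find_first_of(delimiters, startpos);

    while (string::npos != pos || string::npos != startpos)
    {
        // The token runs from startpos up to, but not including, the delimiter at pos.
        tokens.push_back(str.substr(startpos, pos - startpos));

        // Skip the run of delimiters, then look for the end of the next token.
        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);
    }
}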

Source

2009-10-02 Jeremiah

The C++ String Toolkit Library (StrTk) has the following solution:

std::string str = "abc,123 xyz"; 
std::vector<std::string> token_list; 
strtk::split(";., ", 
      str, 
      strtk::range_to_type_back_inserter(token_list), 
      strtk::include_delimiters);

This should result in token_list containing the following elements:

 
Token₀ = "abc," 
Token₁ = "123 " 
Token₂ = "xyz"

More examples can be found here.
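For comparison, similar tokens can be produced with the standard library alone. The following is only a minimal sketch, not StrTk code: the name split_keep_delims is hypothetical, and it assumes each token should end with the single delimiter character that terminated it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of StrTk): splits text at any character in
// delims and keeps that one delimiter character at the end of its token.
std::vector<std::string> split_keep_delims(const std::string& text,
                                           const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < text.size())
    {
        const std::string::size_type pos = text.find_first_of(delims, start);
        if (pos == std::string::npos)
        {
            // No delimiter left: the remainder is the last token.
            tokens.push_back(text.substr(start));
            break;
        }
        // pos - start + 1 extends the token to include the delimiter itself.
        tokens.push_back(text.substr(start, pos - start + 1));
        start = pos + 1;
    }
    return tokens;
}
```

Calling split_keep_delims("abc,123 xyz", ";., ") yields the three tokens "abc,", "123 " and "xyz". With this sketch, two delimiters in a row produce a token that consists of the second delimiter only, which may or may not be what you want.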

Source

2009-10-17 21:59:15

I can't really follow your code; could you post a working program?

Anyway, here is a simple tokenizer that has not been tested against edge cases:

#include <iostream> 
#include <string> 
#include <vector> 

using namespace std; 

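void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    // An empty delimiter would match at every position and never advance.
    if (del.empty())
    {
        tokens.push_back(text);
        return;
    }

    string::size_type startpos = 0,
                      currentpos = text.find(del, startpos);

    // Each token runs from startpos through the end of the next delimiter.
    // Testing before the first push also handles text without any delimiter.
    while (currentpos != string::npos)
    {
        tokens.push_back(text.substr(startpos, currentpos - startpos + del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    }

    // Whatever follows the last delimiter is the final token.
    if (startpos < text.size())
        tokens.push_back(text.substr(startpos));
}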

Example input, with delimiter = $$:

Hello$$Stack$$Over$$$Flow$$$$!

Tokens:

Hello$$ 
Stack$$ 
Over$$ 
$Flow$$ 
$$ 
!

Note: I would never use a tokenizer that I wrote without testing it! Please use boost::tokenizer!

Source

2009-10-02 18:38:19 AraK

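+1 for mentioning Boost.Tokenizer –

I edited my post to include the whole function. I see what you did, but the delimiter argument will be a string in which each character is a delimiter. If I pass something like ",.! \n", then a comma, period, exclamation mark, and newline should be pushed into the vector as well, but a space should not. That way I can join the vector with a space between the items and rebuild the string. – Jeremiah

The comma, period, exclamation mark, and newline, along with the space, will all be delimiters. Sorry, I should have been clearer. – Jeremiah

It depends on whether you want to keep the preceding delimiter, the following delimiter, or both, and on what you want at the beginning and end of the string, where a word may have no delimiter before or after it.

I'll assume you want each word together with its preceding and following delimiters, but not any string made up of delimiters alone (for example, when delimiters follow the last word).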

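template <class iter> 
void tokenize(std::string const &str, std::string const &delims, iter out) { 
    // Positions are size_type so that comparisons with npos are well defined.
    std::string::size_type pos = 0; 
    do { 
        std::string::size_type beg_word = str.find_first_not_of(delims, pos); 
        if (beg_word == std::string::npos) 
            break; // only delimiters remain, so there is no further word
        std::string::size_type end_word = str.find_first_of(delims, beg_word); 
        std::string::size_type beg_next_word = str.find_first_not_of(delims, end_word); 
        // Emit the leading delimiters, the word, and the trailing delimiters.
        *out++ = std::string(str, pos, beg_next_word - pos); 
        pos = end_word; 
    } while (pos != std::string::npos); 
}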

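For now, I've written it more like an STL algorithm: it takes an iterator for its output rather than assuming it always pushes onto a collection. Since it (currently) depends on the input being a string, it doesn't use iterators for the input.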
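To illustrate the last point, here is a rough sketch of what an iterator-based input could look like. It is not the answer's code: the name tokenize_range and its parameter layout are hypothetical, and it swaps the std::string member searches for std::find_if and std::find_if_not from <algorithm>, while trying to follow the same token boundaries.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical variant: [first, last) is the text and [d_first, d_last) the
// set of delimiter characters. Each token is a word plus the delimiters
// directly before and after it.
template <class FwdIt, class DelimIt, class OutIt>
OutIt tokenize_range(FwdIt first, FwdIt last,
                     DelimIt d_first, DelimIt d_last, OutIt out)
{
    typedef typename std::iterator_traits<FwdIt>::value_type Char;
    auto is_delim = [=](Char const& c) {
        return std::find(d_first, d_last, c) != d_last;
    };

    FwdIt pos = first;
    while (true)
    {
        FwdIt beg_word = std::find_if_not(pos, last, is_delim);
        if (beg_word == last)
            break;                       // only delimiters remain
        FwdIt end_word = std::find_if(beg_word, last, is_delim);
        FwdIt beg_next_word = std::find_if_not(end_word, last, is_delim);
        *out++ = std::basic_string<Char>(pos, beg_next_word);
        if (end_word == last)
            break;                       // the word ran to the end of the text
        pos = end_word;                  // next token starts at the delimiters
    }
    return out;
}
```

Passing std::back_inserter(tokens) for out collects the tokens in a vector, and any other output iterator works the same way. For the text "abc,123 xyz" with delimiters ";., " the sketch emits "abc,", ",123 " and " xyz", so a delimiter between two words appears in both neighbouring tokens.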

Source

2009-10-02 19:04:06

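I want the string "Test string, on the web.\nTest line one." to become tokens like this, with a space, a comma, a period, and \n as delimiters: Test string , on the web . \n Test line one . – Jeremiah

Sorry, that didn't post correctly. Everything after a delimiter should be on a new line. – Jeremiah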
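The behaviour described in these comments (punctuation and newlines kept as tokens of their own, spaces thrown away) could be sketched roughly as follows. The function name split_keep_some and the split into kept and dropped delimiter sets are assumptions made for illustration, not something from the answers here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: characters in kept become one-character tokens,
// characters in dropped only separate words and are discarded.
std::vector<std::string> split_keep_some(const std::string& text,
                                         const std::string& kept,
                                         const std::string& dropped)
{
    const std::string delims = kept + dropped;
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < text.size())
    {
        const std::string::size_type pos = text.find_first_of(delims, start);
        if (pos == std::string::npos)
        {
            tokens.push_back(text.substr(start));   // last word, no delimiter after it
            break;
        }
        if (pos > start)                            // skip empty words
            tokens.push_back(text.substr(start, pos - start));
        if (kept.find(text[pos]) != std::string::npos)
            tokens.push_back(std::string(1, text[pos]));
        start = pos + 1;
    }
    return tokens;
}
```

With kept = ",.\n" and dropped = " ", the string "Test string, on the web.\nTest line one." becomes the twelve tokens Test, string, a comma, on, the, web, a period, a newline, Test, line, one, and a period. The newline token here holds the actual newline character rather than a printable backslash followed by n.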

If the delimiters are single characters rather than strings, you can use strtok.

Source

2009-10-02 20:17:16

Huh? What's wrong with strtok? –

Thanks.. I had almost forgotten about this function :P – poorva

'strtok' consumes the delimiters, I believe. – Santa
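A small sketch makes the last comment concrete: std::strtok overwrites each delimiter it finds with a null terminator, so the returned tokens never contain the delimiters. The wrapper name strtok_tokens is hypothetical; only std::strtok itself comes from the standard library.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper around std::strtok, used only to show which
// tokens it returns.
std::vector<std::string> strtok_tokens(const std::string& text, const char* delims)
{
    // strtok modifies its input, so work on a writable, null-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);   // the delimiter after tok is already overwritten
    }
    return tokens;
}
```

For "abc,123 xyz" with the delimiters ";., " this returns "abc", "123" and "xyz", with the comma and the space gone. Note also that std::strtok keeps its position in hidden static state, so it is not safe to interleave two tokenizations or to call it from several threads.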

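I know this is a bit sloppy right now, but this is what I ended up with. I didn't want to use Boost because this is a school assignment, and my teacher wants me to use find_first_of to accomplish it.

Thanks, everyone, for the help.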

vector<string> Tokenize(const string& strInput, const string& strDelims) 
{ 
    vector<string> vS; 
    string::size_type startpos = 0; 

    while (startpos < strInput.size()) 
    { 
        // Find the next delimiter at or after startpos. 
        const string::size_type pos = strInput.find_first_of(strDelims, startpos); 

        if (pos == string::npos) 
        { 
            // No more delimiters: the rest of the input is the last token. 
            // (Calling substr(pos, 1) here would throw std::out_of_range.) 
            vS.push_back(strInput.substr(startpos)); 
            break; 
        } 

        // Keep the word before the delimiter, skipping empty words. 
        if (pos > startpos) 
            vS.push_back(strInput.substr(startpos, pos - startpos)); 

        // A newline is stored as the two characters "\n", a space is dropped, 
        // and any other delimiter is kept as a token of its own. 
        if (strInput[pos] == '\n') 
            vS.push_back("\\n"); 
        else if (strInput[pos] != ' ') 
            vS.push_back(string(1, strInput[pos])); 

        startpos = pos + 1; 
    } 

    return vS; 
}

Source

2009-10-03 15:50:42 Jeremiah
