2017-02-27

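Tokenize or split a string into text and HTML tag items

I am looking for the most efficient way to take a string and tokenize it into an array, with every HTML tag separated out as its own element.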

Example Input (String): 
    "I can format my text so that <strong>This is bold</strong> and this is not." 

Desired Output (String[] array): 
    "I can format my text so that", 
    "<strong>", 
    "This is bold", 
    "</strong>", 
    "and this is not." 

Alternate Output, Just As Good (String[] array): 
    "I", 
    "can", 
    "format", 
    "my", 
    "text", 
    "so", 
    "that", 
    "<strong>", 
    "This", 
    "is", 
    "bold", 
    "</strong>", 
    "and", 
    "this", 
    "is", 
    "not." 

I am not sure what the best way to approach this is. Any help would be appreciated.

'Regex.Split(inputString, "(?<=>)|(?=<)");'

Use 'Regex.Split(s, @"(<[^<]*?>)")'.
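A minimal sketch of the comment's approach, assuming a .NET console project that uses C# 9+ top-level statements. Because the pattern is wrapped in a capturing group, Regex.Split keeps each matched tag as an element of the result instead of discarding it. The Trim/filter step is an addition for illustration, not part of the comment: the raw pieces keep the spaces next to the tags, and two adjacent tags (or a tag at either end of the input) leave an empty string in the array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "I can format my text so that <strong>This is bold</strong> and this is not.";

// The capturing group makes Regex.Split include each matched tag in the output.
string[] parts = Regex.Split(input, @"(<[^<]*?>)");

// Remove the spaces left around the tags and drop any empty pieces
// (these appear between adjacent tags or when a tag starts/ends the input).
string[] tokens = parts
    .Select(p => p.Trim())
    .Where(p => p.Length > 0)
    .ToArray();

foreach (string token in tokens)
    Console.WriteLine(token);
```

For the example input this prints the five elements of the desired output, one per line.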

Answer

You can use Regex.Split() with a pair of zero-width assertions, splitting at every position just after a > or just before a <:

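using System.Text.RegularExpressions;
string input = "I can format my text so that <strong>This is bold</strong> and this is not."; 
// Split before every '<' and after every '>'; the assertions are zero-width, so no characters are consumed.
string[] output = Regex.Split(input, "(?=<)|(?<=>)"); 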

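(?=pattern) is called a lookahead assertion; it ensures that pattern follows the current position.
(?<=pattern) is a lookbehind assertion: the same concept, but it checks the characters before the current position.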
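A fuller sketch of the answer, again assuming a .NET console project with C# 9+ top-level statements. The Trim/filter step and the word-level pass are additions for illustration: the split alone leaves spaces next to the tags (the first element is "I can format my text so that " with a trailing space), and a tag at the very start or end of the input may produce an empty element. The word-level pass produces the question's alternate output by keeping tags whole and splitting every other piece on whitespace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "I can format my text so that <strong>This is bold</strong> and this is not.";

// Zero-width split points: just before each '<' (lookahead) and just after
// each '>' (lookbehind). Nothing is consumed, so the tags survive intact.
string[] parts = Regex.Split(input, "(?=<)|(?<=>)");

// Trim the spaces next to the tags and drop empty pieces, which can appear
// when a tag sits at the very start or end of the input.
string[] tokens = parts
    .Select(p => p.Trim())
    .Where(p => p.Length > 0)
    .ToArray();

// Word-level variant: keep tags as single tokens, split text on whitespace.
// Each token is already trimmed and non-empty, so no empty words result.
string[] words = tokens
    .SelectMany(t => t[0] == '<' ? new[] { t } : Regex.Split(t, @"\s+"))
    .ToArray();

Console.WriteLine(string.Join(" | ", tokens));
Console.WriteLine(string.Join(" | ", words));
```

Unlike the capturing-group pattern from the comments, this split never yields an empty string between two adjacent tags such as "</b><i>", because the position between '>' and '<' is a single split point.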