2010-05-25 47 views
1

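Text parsing - my parser skips commands

I'm trying to parse a text format. I want to mark inline code with backticks (`), the way SO does. The rule should be that if you want to use a backtick inside an inline code element, you wrap the inline code in double backticks.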

Like this:

``Mark inline code with backticks (`)``

For some reason my parser seems to skip the double backticks entirely. Here is the code of the function that parses inline code:

private string ParseInlineCode(string input)
{
    for (int i = 0; i < input.Length; i++)
    {
        if (input[i] == '`' && input[i - 1] != '\\')
        {
            if (input[i + 1] == '`')
            {
                string str = ReadToCharacter('`', i + 2, input);
                while (input[i + str.Length + 2] != '`')
                {
                    str += ReadToCharacter('`', i + str.Length + 3, input);
                }
                string tbr = "``" + str + "``";
                str = str.Replace("&", "&amp;");
                str = str.Replace("<", "&lt;");
                str = str.Replace(">", "&gt;");
                input = input.Replace(tbr, "<code>" + str + "</code>");
                i += str.Length + 13;
            }
            else
            {
                string str = ReadToCharacter('`', i + 1, input);
                input = input.Replace("`" + str + "`", "<code>" + str + "</code>");
                i += str.Length + 13;
            }
        }
    }
    return input;
}

If I put single backticks around something, it is correctly wrapped in <code> tags.

+2

Wouldn't RegEx be better suited for this job? – Propeng 2010-05-25 19:21:02

Answers

4

In the while loop

while (input[i + str.Length + 2] != '`') 
{ 
    str += ReadToCharacter('`', i + str.Length + 3, input); 
} 

you are looking at the wrong index, i + str.Length + 2 instead of i + str.Length + 3, and in turn you have to add the backtick back in the loop body. It should probably be

while (input[i + str.Length + 3] != '`') 
{ 
    str += '`' + ReadToCharacter('`', i + str.Length + 3, input); 
} 

But there are some more bugs in your code. If the first character of the input is a backtick, the following line will cause an IndexOutOfRangeException:

if (input[i] == '`' && input[i - 1] != '\\') 

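And if the input contains an odd number of separated backticks and the last character of the input is a backtick, the following line will cause an IndexOutOfRangeException: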

if (input[i + 1] == '`') 
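Both out-of-range reads can be avoided by checking the index before looking at a neighbouring character. A minimal sketch, assuming two hypothetical helpers (`IsUnescapedBacktick` and `IsDoubleBacktick` are illustrative names, not part of the original code):

```csharp
using System;

static class BacktickChecks
{
    // Hypothetical helper: true if input[i] is a backtick that is not
    // preceded by a backslash. Index 0 has no predecessor, so a backtick
    // there is never treated as escaped.
    public static bool IsUnescapedBacktick(string input, int i)
    {
        return input[i] == '`' && (i == 0 || input[i - 1] != '\\');
    }

    // Hypothetical helper: true if a second backtick follows position i.
    // The length check keeps input[i + 1] inside the string.
    public static bool IsDoubleBacktick(string input, int i)
    {
        return i + 1 < input.Length && input[i + 1] == '`';
    }

    static void Main()
    {
        Console.WriteLine(IsUnescapedBacktick("`", 0));   // True
        Console.WriteLine(IsDoubleBacktick("`", 0));      // False
        Console.WriteLine(IsUnescapedBacktick("\\`", 1)); // False
    }
}
```

With checks like these, the two conditions in the loop no longer depend on where in the input the backtick appears.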

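You should refactor your code into smaller methods instead of handling many cases in a single method, which is very error-prone. If you have not yet written unit tests for the code, I strongly suggest doing so. And because parsers are not easy to test, given all the invalid inputs you have to be prepared for, you could have a look at PEX, a tool that automatically generates test cases for your code by analyzing all branching points and trying to exercise every possible code path.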
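As one possible shape for such a refactoring, here is a sketch that scans the input once and builds the output in a StringBuilder instead of calling Replace on the string being iterated. It is not the original code: the helper names (`CountBackticks`, `FindClosingRun`, `Escape`) are hypothetical, it does not use `ReadToCharacter`, and it assumes a code span is closed by a run of backticks of the same length as the one that opened it. A backslash only suppresses an opening backtick; escapes inside a code span are not handled.

```csharp
using System;
using System.Text;

static class InlineCodeParser
{
    // Counts the consecutive backticks starting at position start.
    static int CountBackticks(string input, int start)
    {
        int n = 0;
        while (start + n < input.Length && input[start + n] == '`') n++;
        return n;
    }

    // Returns the index of the next backtick run of exactly `length`
    // characters at or after `from`, or -1 if there is none.
    static int FindClosingRun(string input, int from, int length)
    {
        int i = from;
        while (i < input.Length)
        {
            if (input[i] != '`') { i++; continue; }
            int run = CountBackticks(input, i);
            if (run == length) return i;
            i += run; // a run of a different length is part of the code
        }
        return -1;
    }

    // Same three replacements as the original code, in the same order.
    static string Escape(string s)
    {
        return s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }

    public static string ParseInlineCode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        var output = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            // A backslash-escaped backtick is copied through unchanged.
            if (input[i] == '\\' && i + 1 < input.Length && input[i + 1] == '`')
            {
                output.Append("\\`");
                i += 2;
                continue;
            }
            if (input[i] != '`') { output.Append(input[i]); i++; continue; }

            int open = CountBackticks(input, i);
            int close = FindClosingRun(input, i + open, open);
            if (close < 0)
            {
                // No matching closing run: keep the backticks as literal text.
                output.Append('`', open);
                i += open;
                continue;
            }
            string code = input.Substring(i + open, close - (i + open));
            output.Append("<code>").Append(Escape(code)).Append("</code>");
            i = close + open;
        }
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ParseInlineCode("Use `x < y` here"));
        // Use <code>x &lt; y</code> here
        Console.WriteLine(ParseInlineCode("``Mark inline code with backticks (`)``"));
        // <code>Mark inline code with backticks (`)</code>
        Console.WriteLine(ParseInlineCode("`"));
        // `
    }
}
```

Each helper is small enough to unit test on its own, and every index is checked against input.Length before it is read.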

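I quickly fired up PEX and ran it against the code. It found the IndexOutOfRangeException I had thought of, and a few more. Of course PEX also found the obvious NullReferenceException when the input is a null reference. Here are the inputs PEX found to cause exceptions.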

case1 = "`" 

case2 = "\0`" 

case3 = "\0``" 

case4 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0````" 

case5 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`" 

case6 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``<\0\0```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0`\0```````````````" 

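My "fix" of the code changed the inputs that cause exceptions (and may have introduced new bugs as well). PEX found the following in the modified code.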

case7 = "\0```" 

case8 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0`\0" 

case9 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``\0`\0`\0``" 

None of these three inputs caused an exception in the original code, while cases 4 and 6 no longer cause an exception in the modified code.
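Inputs like these are worth keeping as a regression suite, so that a later change can be checked against them again. The sketch below is one way to do that by hand; it is not PEX output. `FindCrashes` is a hypothetical helper, and `UnguardedStandIn` is a stand-in that reproduces only the unguarded `input[i - 1]` check, not the full original parser (which depends on `ReadToCharacter`, not shown in the question).

```csharp
using System;
using System.Collections.Generic;

static class CrashRegression
{
    // Runs the parser on every input and records which ones throw,
    // as "input #<index>: <exception type>".
    public static List<string> FindCrashes(Func<string, string> parser, IList<string> inputs)
    {
        var crashes = new List<string>();
        for (int n = 0; n < inputs.Count; n++)
        {
            try
            {
                parser(inputs[n]);
            }
            catch (Exception ex)
            {
                crashes.Add("input #" + n + ": " + ex.GetType().Name);
            }
        }
        return crashes;
    }

    // Stand-in, NOT the original parser: it only performs the
    // unguarded look-behind that the first bug report refers to.
    static string UnguardedStandIn(string input)
    {
        int backticks = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '`' && input[i - 1] != '\\') backticks++;
        }
        return input;
    }

    static void Main()
    {
        // The four shortest inputs from the lists above (cases 1, 2, 3 and 7).
        string[] pexInputs = { "`", "\0`", "\0``", "\0```" };
        foreach (string crash in FindCrashes(UnguardedStandIn, pexInputs))
            Console.WriteLine(crash);
        // input #0: IndexOutOfRangeException
    }
}
```

Passing the real parser in place of the stand-in gives a quick check that none of the recorded inputs throws any more.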

1

Here is a small snippet, tested in LinqPad, to get you started

void Main()
{
    string test = "here is some code `public void Method()` but ``this is not code``";
    Regex r = new Regex(@"(`[^`]+`)");

    MatchCollection matches = r.Matches(test);

    foreach (Match match in matches)
    {
        Console.Out.WriteLine(match.Value);
        // Guard the index so that a match at position 0 does not read test[-1].
        if (match.Index > 0 && test[match.Index - 1] == '`')
            Console.Out.WriteLine("NOT CODE");
        else
            Console.Out.WriteLine("CODE");
    }
}

Output:

`public void Method()` 
CODE 
`this is not code` 
NOT CODE 
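The same idea can be taken a step further so that double backticks delimit code that may itself contain a single backtick, which is the rule described in the question. The following is only a sketch under that assumption: the pattern and the names `InlineCode`, `Convert` and `Escape` are illustrative, and the pattern deliberately handles only runs of one or two backticks.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexInlineCode
{
    // Group 1: an opening run of one or two backticks that is not escaped
    //          with a backslash and not part of a longer run.
    // Group 2: the code, matched lazily.
    // \1:      a closing run identical to the opening one, again not part
    //          of a longer run.
    static readonly Regex InlineCode = new Regex(
        @"(?<!\\)(?<!`)(`{1,2})(?!`)(.+?)(?<!`)\1(?!`)",
        RegexOptions.Singleline);

    // Same three replacements the question's code performs.
    static string Escape(string s)
    {
        return s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }

    public static string Convert(string input)
    {
        return InlineCode.Replace(
            input,
            m => "<code>" + Escape(m.Groups[2].Value) + "</code>");
    }

    static void Main()
    {
        Console.WriteLine(Convert("some code `public void Method()` here"));
        // some code <code>public void Method()</code> here
        Console.WriteLine(Convert("``x ` y``"));
        // <code>x ` y</code>
    }
}
```

Using a MatchEvaluator keeps the HTML escaping limited to the code itself, and unmatched backticks are simply left in the text.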
+0

I think you have backticks confused with single quotes – 2010-05-25 21:36:22

+0

Indeed, I did type single quotes; fixed. – 2010-05-26 16:22:12