2013-12-19
3

I have the following string. What is the best way to parse the values inside it?

string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))" 

I want to parse this string into:

string group = ParseoutGroup(fullString); // Expect "2843360" 
string[] teams = ParseoutTeamNames(fullString); // Expect array with three items 

In the full string, one or more teams may be listed (not always three, as in the example above).

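I have this partially working, but my code feels hacky and not very future-proof, so I'd like to know whether there is a better regular-expression solution, or a more elegant way to parse these values out of the full string. Other things may be added to the string later, so I want this to be as foolproof as possible.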

It's hard to offer a better solution without seeing your existing one. –

Why not post your current solution so we can see what could be improved? – tofutim

"not always 3 like above" – tofutim

Answers

4

I did this using regular expressions:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var str = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))";

// Grabs the group ID 
var group = Regex.Match(str, @"group = '(?<ID>\d+)'", RegexOptions.IgnoreCase) 
    .Groups["ID"].Value; 

// Grabs everything inside teams parentheses 
var teams = Regex.Match(str, @"team in \((?<Teams>(\s*'[^']+'\s*,?)+)\)", RegexOptions.IgnoreCase) 
    .Groups["Teams"].Value; 

// Trim and remove single quotes 
var teamsArray = teams.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries) 
    .Select(s => 
     { 
      var trimmed = s.Trim(); 
      return trimmed.Substring(1, trimmed.Length - 2); 
     }).ToArray(); 

The result will be:

string[] { "TEAM1", "TEAM2", "TEAM3" } 

Well done, Bruno. – tofutim

6

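In the simplest case a regular expression may well be the best answer. Unfortunately, here we seem to need to parse a subset of the SQL language. Although regular expressions can solve this, they are not designed for parsing complex languages (nested parentheses and escaped strings).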
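To make the escaped-string point concrete, here is a small C# check. It reuses the team-list pattern from the regular-expression answer above and assumes that a quote inside a value is escaped with a backslash (the same convention the Sprache parser below assumes); the class and method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveTeamsRegex
{
    // The team-list pattern from the regular-expression answer: each value is '[^']+',
    // so a value may not contain a quote character, escaped or not.
    const string TeamsPattern = @"team in \((?<Teams>(\s*'[^']+'\s*,?)+)\)";

    public static bool Matches(string filter) =>
        Regex.IsMatch(filter, TeamsPattern, RegexOptions.IgnoreCase);

    public static void Main()
    {
        Console.WriteLine(Matches("group = '1' and (team in ('TEAM1', 'TEAM2'))"));      // True
        // With a backslash-escaped quote, [^']+ stops at the escaped quote, the rest of the
        // value ("BRIEN'") neither starts with a quote nor is ")", so the whole match fails.
        Console.WriteLine(Matches(@"group = '1' and (team in ('O\'BRIEN', 'TEAM2'))")); // False
    }
}
```

The failure is only visible here because `IsMatch` is checked; with `Match(...).Groups["Teams"].Value`, as in that answer, it silently shows up as an empty string.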

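Requirements also tend to change over time, and more complex structures will need to be parsed.

If company policy allows, I would build an internal DSL to parse this string.

My favorite tool for building internal DSLs is called Sprache.

Below you can find an example parser that uses the internal-DSL approach.

In the code, I have defined primitives that handle the required SQL operators and composed them into the final parser.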

[Test] 
    public void Test() 
    { 
     string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))"; 


     var resultParser = 
      from @group in OperatorEquals("group") 
      from @and in OperatorEnd() 
      from @team in Brackets(OperatorIn("team")) 
      select new {@group, @team}; 
     var result = resultParser.Parse(fullString); 
     Assert.That(result.group, Is.EqualTo("2843360")); 
     Assert.That(result.team, Is.EquivalentTo(new[] {"TEAM1", "TEAM2", "TEAM3"})); 
    } 

    private static readonly Parser<char> CellSeparator = 
     from space1 in Parse.WhiteSpace.Many() 
     from s in Parse.Char(',') 
     from space2 in Parse.WhiteSpace.Many() 
     select s; 

    private static readonly Parser<char> QuoteEscape = Parse.Char('\\'); 

    private static Parser<T> Escaped<T>(Parser<T> following) 
    { 
     return from escape in QuoteEscape 
       from f in following 
       select f; 
    } 

    private static readonly Parser<char> QuotedCellDelimiter = Parse.Char('\''); 

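    // Try the escape sequence first: otherwise AnyChar.Except would consume the backslash
    // as an ordinary character and the escaped quote would end the cell early.
    private static readonly Parser<char> QuotedCellContent = 
     Escaped(QuotedCellDelimiter).Or(Parse.AnyChar.Except(QuotedCellDelimiter)); 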

    private static readonly Parser<string> QuotedCell = 
     from open in QuotedCellDelimiter 
     from content in QuotedCellContent.Many().Text() 
     from end in QuotedCellDelimiter 
     select content; 

    private static Parser<string> OperatorEquals(string column) 
    { 
     return 
      from c in Parse.String(column) 
      from space1 in Parse.WhiteSpace.Many() 
      from opEquals in Parse.Char('=') 
      from space2 in Parse.WhiteSpace.Many() 
      from content in QuotedCell 
      select content; 
    } 

    private static Parser<bool> OperatorEnd() 
    { 
     return 
      from space1 in Parse.WhiteSpace.Many() 
      from c in Parse.String("and") 
      from space2 in Parse.WhiteSpace.Many() 
      select true; 
    } 

    private static Parser<T> Brackets<T>(Parser<T> contentParser) 
    { 
     return from open in Parse.Char('(') 
       from space1 in Parse.WhiteSpace.Many() 
       from content in contentParser 
       from space2 in Parse.WhiteSpace.Many() 
       from close in Parse.Char(')') 
       select content; 
    } 

    private static Parser<IEnumerable<string>> ComaSeparated() 
    { 
     return from leading in QuotedCell 
       from rest in CellSeparator.Then(_ => QuotedCell).Many() 
       select Cons(leading, rest); 
    } 

    private static Parser<IEnumerable<string>> OperatorIn(string column) 
    { 
     return 
      from c in Parse.String(column) 
      from space1 in Parse.WhiteSpace 
      from opEquals in Parse.String("in") 
      from space2 in Parse.WhiteSpace.Many() 
      from content in Brackets(ComaSeparated()) 
      from space3 in Parse.WhiteSpace.Many() 
      select content; 
    } 

    private static IEnumerable<T> Cons<T>(T head, IEnumerable<T> rest) 
    { 
     yield return head; 
     foreach (T item in rest) 
      yield return item; 
    } 
0

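I think you need to look into a tokenization process to get the desired result, taking into account the order of execution established by the parentheses. Once the string is tokenized, the shunting-yard algorithm can put the tokens into execution order.

The advantage of shunting-yard is that it works on tokens you define yourself, which you can later use to evaluate the string and perform the correct operations. Although it is usually applied to the mathematical order of operations, it can be adapted to your purposes.

Here is some more information: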

http://en.wikipedia.org/wiki/Shunting-yard_algorithm
http://www.slideshare.net/grahamwell/shunting-yard
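As a rough illustration of what tokenizing plus a shunting-yard pass could look like for this particular string, here is a C# sketch. Everything in it is an assumption rather than part of the answer: the names (`FilterShuntingYard`, `Token`, `ExtractClauses`), the precedence table (`=`/`in` above `and` above `or`), and the choice to read the parenthesized list after `in` as a single list operand. It converts the input to postfix order and then walks the postfix to collect each column's values.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Token kinds for the small filter language in the question.
public enum TokenKind { Identifier, String, List, Operator, LParen, RParen }
public record Token(TokenKind Kind, string Text, List<string>? Items = null);
public static class FilterShuntingYard
{
    // Higher number binds tighter: comparisons before "and", "and" before "or".
    static readonly Dictionary<string, int> Precedence = new() { ["="] = 3, ["in"] = 3, ["and"] = 2, ["or"] = 1 };

    public static List<Token> Tokenize(string s)
    {
        var tokens = new List<Token>();
        for (int i = 0; i < s.Length;)
        {
            char c = s[i];
            bool afterIn = tokens.Count > 0 && tokens[^1].Kind == TokenKind.Operator && tokens[^1].Text == "in";
            if (char.IsWhiteSpace(c)) i++;
            else if (c == '\'') tokens.Add(new Token(TokenKind.String, ReadQuoted(s, ref i)));
            else if (c == '(' && afterIn) tokens.Add(new Token(TokenKind.List, "list", ReadList(s, ref i))); // "in (...)" is a value list
            else if (c == '(') { tokens.Add(new Token(TokenKind.LParen, "(")); i++; }
            else if (c == ')') { tokens.Add(new Token(TokenKind.RParen, ")")); i++; }
            else if (c == '=') { tokens.Add(new Token(TokenKind.Operator, "=")); i++; }
            else
            {
                int start = i;
                while (i < s.Length && (char.IsLetterOrDigit(s[i]) || s[i] == '_')) i++;
                if (i == start) throw new FormatException($"Unexpected character '{c}' at {i}");
                string word = s.Substring(start, i - start), lower = word.ToLowerInvariant();
                bool isOp = Precedence.ContainsKey(lower); // "and", "or", "in" (any case) are keywords
                tokens.Add(new Token(isOp ? TokenKind.Operator : TokenKind.Identifier, isOp ? lower : word));
            }
        }
        return tokens;
    }

    static string ReadQuoted(string s, ref int i)
    {
        int open = i, close = s.IndexOf('\'', i + 1); // this sketch does not handle escaped quotes
        if (close < 0) throw new FormatException("Unterminated string");
        i = close + 1;
        return s.Substring(open + 1, close - open - 1);
    }

    static List<string> ReadList(string s, ref int i)
    {
        var items = new List<string>();
        for (i++; i < s.Length && s[i] != ')';) // i++ skips the "("
        {
            if (s[i] == '\'') items.Add(ReadQuoted(s, ref i));
            else if (char.IsWhiteSpace(s[i]) || s[i] == ',') i++;
            else throw new FormatException($"Unexpected character '{s[i]}' in list");
        }
        i++; // skip the ")"; an unterminated list is not reported in this sketch
        return items;
    }

    // Shunting-yard: operands go straight to output; an operator first pops stacked operators of >= precedence.
    public static List<Token> ToPostfix(IEnumerable<Token> tokens)
    {
        var output = new List<Token>();
        var ops = new Stack<Token>();
        foreach (var t in tokens)
        {
            if (t.Kind == TokenKind.Operator)
            {
                while (ops.Count > 0 && ops.Peek().Kind == TokenKind.Operator && Precedence[ops.Peek().Text] >= Precedence[t.Text])
                    output.Add(ops.Pop());
                ops.Push(t);
            }
            else if (t.Kind == TokenKind.LParen) ops.Push(t);
            else if (t.Kind == TokenKind.RParen)
            {
                while (ops.Peek().Kind != TokenKind.LParen) output.Add(ops.Pop()); // Peek throws on an unmatched ")"
                ops.Pop(); // discard the matching "("
            }
            else output.Add(t); // identifiers, strings and lists are operands
        }
        while (ops.Count > 0) output.Add(ops.Pop()); // a leftover "(" (unbalanced input) is not detected here
        return output;
    }

    // Evaluates the postfix output: "=" and "in" record a column's values; "and"/"or" only combine placeholders.
    public static Dictionary<string, List<string>> ExtractClauses(List<Token> postfix)
    {
        var result = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        var stack = new Stack<Token>();
        foreach (var t in postfix)
        {
            if (t.Kind != TokenKind.Operator) { stack.Push(t); continue; }
            Token right = stack.Pop(), left = stack.Pop();
            if (t.Text == "=") result[left.Text] = new List<string> { right.Text };
            else if (t.Text == "in") result[left.Text] = right.Items ?? new List<string>();
            stack.Push(new Token(TokenKind.Identifier, "(clause)"));
        }
        return result;
    }
}
```

For the question's string the postfix order is `group 2843360 = team list in and`, and `ExtractClauses` returns `2843360` for `group` and `TEAM1`, `TEAM2`, `TEAM3` for `team`. Since `and`/`or` only combine placeholders here, a full evaluator would still need to build a boolean expression tree from the postfix.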

1

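There's probably a regex solution, but if the format is strict, I'd try efficient string methods first. The following works with your input.

I'm using a custom class, TeamGroup, to encapsulate the complexity and to hold all the relevant properties in one object: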

using System;
using System.Linq;

public class TeamGroup
{ 
    public string Group { get; set; } 
    public string[] Teams { get; set; } 

    public static TeamGroup ParseOut(string fullString) 
    { 
     TeamGroup tg = new TeamGroup{ Teams = new string[]{ } }; 
     int index = fullString.IndexOf("group = '"); 
     if (index >= 0) 
     { 
      index += "group = '".Length; 
      int endIndex = fullString.IndexOf("'", index); 
      if (endIndex >= 0) 
      { 
       tg.Group = fullString.Substring(index, endIndex - index).Trim(' ', '\''); 
       endIndex += 1; 
       index = fullString.IndexOf(" and (team in (", endIndex); 
       if (index >= 0) 
       { 
        index += " and (team in (".Length; 
        endIndex = fullString.IndexOf(")", index); 
        if (endIndex >= 0) 
        { 
         string allTeamsString = fullString.Substring(index, endIndex - index); 
         tg.Teams = allTeamsString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries) 
          .Select(t => t.Trim(' ', '\'')) 
          .ToArray(); 
        } 
       } 
      } 
     } 
     return tg; 
    } 
} 

You would use it like this:

string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))"; 
TeamGroup tg = TeamGroup.ParseOut(fullString); 
Console.Write("Group: {0} Teams: {1}", tg.Group, string.Join(", ", tg.Teams)); 

Output:

Group: 2843360 Teams: TEAM1, TEAM2, TEAM3 
0

If fullString is not machine-generated, you may need to add some error handling, but this works out of the box and gives you a test to work from.

public string ParseoutGroup(string fullString) 
    { 
     var matches = Regex.Matches(fullString, @"group\s?=\s?'([^']+)'", RegexOptions.IgnoreCase); 
     return matches[0].Groups[1].Captures[0].Value; 
    } 

    public string[] ParseoutTeamNames(string fullString) 
    { 
     var teams = new List<string>(); 
     var matches = Regex.Matches(fullString, @"team\s?in\s?\((\s*'([^']+)',?\s*)+\)", RegexOptions.IgnoreCase); 
     foreach (Capture capture in matches[0].Groups[2].Captures) 
     { 
      teams.Add(capture.Value); 
     } 
     return teams.ToArray(); 
    } 

    [Test] 
    public void parser() 
    { 
     string test = "group = '2843360' and (team in ('team1', 'team2', 'team3'))"; 
     var group = ParseoutGroup(test); 
     Assert.AreEqual("2843360",group); 

     var teams = ParseoutTeamNames(test); 
     Assert.AreEqual(3, teams.Length); 
     Assert.AreEqual("team1", teams[0]); 
     Assert.AreEqual("team2", teams[1]); 
     Assert.AreEqual("team3", teams[2]); 
    } 
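On the error-handling caveat above: `matches[0]` throws when the pattern finds nothing, so hand-written input with a missing clause ends in an exception. Below is a minimal sketch of a `TryParse`-style alternative. `SafeFilterParser` and `TryParse` are hypothetical names, and the patterns loosen `\s?` to `\s*`/`\s+` on the assumption that spacing in hand-written filters varies.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SafeFilterParser
{
    // Same idea as ParseoutGroup/ParseoutTeamNames, but tolerant of extra whitespace.
    static readonly Regex GroupRegex = new Regex(@"group\s*=\s*'([^']+)'", RegexOptions.IgnoreCase);
    static readonly Regex TeamsRegex = new Regex(@"team\s+in\s*\((\s*'([^']+)'\s*,?)+\)", RegexOptions.IgnoreCase);

    // Returns false instead of throwing when either clause is missing or malformed.
    public static bool TryParse(string fullString, out string group, out string[] teams)
    {
        group = string.Empty;
        teams = new string[0];
        Match g = GroupRegex.Match(fullString);
        Match t = TeamsRegex.Match(fullString);
        if (!g.Success || !t.Success) return false;

        group = g.Groups[1].Value;
        // Group 2 sits inside a repeated group, so each team is one entry in its Captures collection.
        teams = t.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        return true;
    }
}
```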
0

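In addition to @BrunoLM's solution:

(worth the extra lines if you have more variables to check later)

You can split the string on the "and" keyword and have a function that checks each clause against the appropriate regular expression and returns the required value.

(Untested code, but it should convey the idea.)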

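// Split on the keyword "and" with surrounding whitespace, so words that merely contain "and" are not split
var statements = Regex.Split(fullString, @"\s+and\s+", RegexOptions.IgnoreCase);
// So now:
// statements[0] = "group = '2843360'"
// statements[1] = "(team in ('TEAM1', 'TEAM2','TEAM3'))"
string group = null;
string[] teams = null;
foreach (var s in statements)
{
    // RegexFunctionToExtract_GroupValue / _TeamValue are your own per-clause regex helpers
    if (s.Contains("group")) group = RegexFunctionToExtract_GroupValue(s);
    if (s.Contains("team")) teams = RegexFunctionToExtract_TeamValue(s);
}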

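I think this approach yields cleaner, more maintainable code and a slight optimization.

Of course, this approach does not expect an "OR" clause, but it can be adjusted slightly to handle one.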
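One possible adjustment, sketched under the assumption that the connectives are always the bare words `and`/`or` surrounded by whitespace: put the keyword in a capturing group, because `Regex.Split` includes captured text in the array it returns. The class name `ClauseSplitter` is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ClauseSplitter
{
    // Splits on " and " / " or ". Because the keyword is in a capturing group, Regex.Split keeps it
    // in the result, so clauses land at even indices and connectives at odd indices.
    // Caveat: a quoted value that itself contains " and " or " or " would be split as well.
    public static (List<string> Clauses, List<string> Connectives) Split(string filter)
    {
        string[] parts = Regex.Split(filter, @"\s+(and|or)\s+", RegexOptions.IgnoreCase);
        var clauses = new List<string>();
        var connectives = new List<string>();
        for (int i = 0; i < parts.Length; i++)
        {
            if (i % 2 == 0) clauses.Add(parts[i].Trim());
            else connectives.Add(parts[i].ToLowerInvariant());
        }
        return (clauses, connectives);
    }
}
```

Like the split on "and" above, this ignores grouping parentheses, so `(a = '1' or b = '2') and c = '3'` is flattened into three clauses; nested conditions still call for a real parser.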
