2011-04-21 56 views
0

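I want to create a list of strings from the text of a search field. Anything enclosed in double quotes should be split out as its own entry. How can I split a search string so that quoted text is allowed?

Example: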
sample' "string's are, more "text" making" 12.34,hello"pineapple sundays

should produce

sample' 
string's are, more_ //underscore shown to display space 
text 
making 
12.34 
hello 
pineapple 
sundays 

Edit: Here is my (somewhat) elegant solution. Thanks for the help, everyone!

Private Function GetSearchTerms(ByVal searchText As String) As String() 
    'Clean search string of unwanted characters' 
    searchText = System.Text.RegularExpressions.Regex.Replace(searchText, "[^a-zA-Z0-9""'.,= ]", "") 

    'Guarantees the first entry will not be an entry in quotes if the searchkeywords starts with double quotes' 
    Dim searches As String() = searchText.Replace("""", " "" ").Split("""") 
    Dim myWords As System.Collections.Generic.List(Of String) = New System.Collections.Generic.List(Of String) 
    Dim delimiters As String() = New String() {" ", ","} 

    For index As Integer = 0 To searches.Length - 1 
     'even is regular text, split up into individual search terms' 
     If (index Mod 2 = 0) Then 
      myWords.AddRange(searches(index).Split(delimiters, StringSplitOptions.RemoveEmptyEntries)) 
     Else 
      'check for unclosed double quote, if so, split it up and add, space we added earlier will get split out' 
      If (searches.Length Mod 2 = 0 And index = searches.Length - 1) Then 
       myWords.AddRange(searches(index).Split(delimiters, StringSplitOptions.RemoveEmptyEntries)) 
      Else 
       '2 double quotes found' 
       'remove the 2 spaces that we added earlier' 
       Dim myQuotedString As String = searches(index).Substring(1, searches(index).Length - 2) 
       If (myQuotedString.Length > 0) Then 
        myWords.Add(myQuotedString) 
       End If 
      End If 
     End If 
    Next 
    Return myWords.ToArray() 
End Function 

Oi, VB comments are ugly. Does anyone know how to clean that up?

+1

Pineapple Sundays! – hunter 2011-04-21 18:01:27

+0

My sample string has 5 quotation marks. I want to ignore the last one, which has no matching closing quote. – 2011-04-21 18:03:59

+0

I would split everything outside the quotes on spaces and commas. – 2011-04-21 18:08:15

Answers

1

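This is not a complete solution, because it lacks some validation checks, but it has everything you need.

My CharOccurs() finds every occurrence of '"' and stores the positions in a list, in order.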

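// Requires: using System.Collections.Generic;
public static List<int> CharOccurs(string stringToSearch, char charToFind)
{
    // Collects the index of every occurrence of charToFind, in ascending order.
    List<int> count = new List<int>();
    int chr = 0;
    while (chr != -1)
    {
        chr = stringToSearch.IndexOf(charToFind, chr);
        if (chr != -1)
        {
            count.Add(chr);
            chr++; // resume the search just past this match
        }
    }
    return count;
}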

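The code below is pretty much self-explanatory. I split the quoted portion of the string on the '"' character, then take substrings of the text outside the quotes and split them on ',', space and '"'. Please add validation checks where needed to make it generic.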

string input = "sample' \"string's are, more \"text\" making\" 12.34,hello\"pineapple sundays";

List<int> positions = CharOccurs(input, '"');

// Assumes the input contains at least two double quotes; add checks for other cases.
string within_quotes;
string[] arr_within_quotes;
List<string> output = new List<string>();
char[] separators = new char[] { ' ', ',', '"' };

// Text before the first quote (the original length of positions[0]-1 dropped one character).
output.AddRange(input.Substring(0, positions[0]).Split(separators, StringSplitOptions.RemoveEmptyEntries));

// With an odd number of quotes the last one is unmatched, so the quoted region ends at the quote before it.
int lastPaired = (positions.Count % 2 == 0)
    ? positions[positions.Count - 1]
    : positions[positions.Count - 2];

within_quotes = input.Substring(positions[0] + 1, lastPaired - positions[0] - 1);
arr_within_quotes = within_quotes.Split('"');
output.AddRange(arr_within_quotes);

// Text after the quoted region; a trailing unmatched quote is treated as a separator.
output.AddRange(input.Substring(lastPaired + 1).Split(separators, StringSplitOptions.RemoveEmptyEntries));
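
As a variation on the approach above (a sketch, not the answer author's code), the quotes can be paired off from left to right, so that text lying between two quoted phrases is split on spaces and commas rather than treated as quoted. The class name QuoteAwareSplitter and the method name Split are made up for this illustration; the rules are the ones stated in the question and its comments: quoted phrases are kept whole, everything else is split on space and comma, and a final unmatched quote is ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class QuoteAwareSplitter
{
    private static readonly char[] Delimiters = { ' ', ',' };
    private static readonly char[] DelimitersAndQuote = { ' ', ',', '"' };

    public static List<string> Split(string text)
    {
        var terms = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            // Look for the next complete pair of double quotes.
            int open = text.IndexOf('"', pos);
            int close = open < 0 ? -1 : text.IndexOf('"', open + 1);

            if (close < 0)
            {
                // No complete pair remains: split the rest on the delimiters
                // and treat a stray, unmatched quote as one more delimiter.
                terms.AddRange(text.Substring(pos)
                    .Split(DelimitersAndQuote, StringSplitOptions.RemoveEmptyEntries));
                break;
            }

            // Unquoted text before the opening quote.
            terms.AddRange(text.Substring(pos, open - pos)
                .Split(Delimiters, StringSplitOptions.RemoveEmptyEntries));

            // Quoted text is kept as a single term; empty quotes are skipped.
            string quoted = text.Substring(open + 1, close - open - 1);
            if (quoted.Length > 0)
            {
                terms.Add(quoted);
            }
            pos = close + 1;
        }
        return terms;
    }
}
```

For the sample string in the question this yields eight terms. As in the question's own VB solution, whitespace inside the quotes is preserved, so the fourth term is " making" with its leading space; call Trim() on quoted terms if that is not wanted.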
+0

I understood this solution on the first pass. Thanks for your help. – 2011-04-21 20:15:27

1

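A few months ago I wrote this line-parsing function in VB.NET; it may be of some use to you. It works out whether there is a text qualifier and, if so, splits the text based on it. If you like, I'll try to convert it to C# in the next few minutes.

For your text:

sample' "string's are, more "text" making" 12.34,hello"pineapple sundays

you would pass that in as your strLine, set your strDataDelimiter = “”, and set your strTextQualifier = """"

Hope this helps you out.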

Public Function ParseLine(ByVal strLine As String, Optional ByVal strDataDelimiter As String = "", Optional ByVal strTextQualifier As String = "", Optional ByVal strQualifierSplitter As Char = vbTab) As String() 
     Try 
      Dim strField As String = Nothing 
      Dim strNewLine As String = Nothing 
      Dim lngChrPos As Integer = 0 
      Dim bUseQualifier As Boolean = False 
      Dim bRemobedLastDel As Boolean = False 
      Dim bEmptyLast As Boolean = False ' Take into account where the line ends in a field delimiter, the ParseLine function should keep that empty field as well. 


      Dim strList As String() 

      'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,0000 
      'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,0000, 
      'TEST,23479234,Just Right 950g,02/04/2006,1234,,,0000, 
      'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,, 
      'TEST,23479234,"Just Right 950g, BO",02/04/2006,,5678,9999,, 
      'TEST,23479234,"Just Right"" 950g, BO",02/04/2006,,5678,9999,1111, 
      'TEST23479234 'Kellogg''s Just Right 950g' 02/04/2006 1234 5678 0000 9999 
      'TEST23479234 '' 02/04/2006 1234 5678 0000 9999 

      bUseQualifier = strTextQualifier.Length() 

      'split data based on options.. 
      If bUseQualifier Then 
       'replace double qualifiers for ease of parsing.. 
       'strLine = strLine.Replace(New String(strTextQualifier, 2), vbTab) 

       'loop and find each field.. 
       Do Until strLine = Nothing 

        If strLine.Substring(0, 1) = strTextQualifier Then 

         'find closing qualifier 
         lngChrPos = strLine.IndexOf(strTextQualifier, 1) 

         'check for missing double qualifiers, unclosed qualifiers 
         Do Until (strLine.Length() - 1) = lngChrPos OrElse lngChrPos = -1 OrElse _ 
          strLine.Substring(lngChrPos + 1, 1) = strDataDelimiter 

          lngChrPos = strLine.IndexOf(strTextQualifier, lngChrPos + 1) 
         Loop 

         'get field from line.. 
         If lngChrPos = -1 Then 
          strField = strLine.Substring(1) 
          strLine = vbNullString 
         Else 
          strField = strLine.Substring(1, lngChrPos - 1) 
          If (strLine.Length() - 1) = lngChrPos Then 
           strLine = vbNullString 
          Else 
           strLine = strLine.Substring(lngChrPos + 2) 
           If strLine = "" Then 
            bEmptyLast = True 
           End If 
          End If 

          'strField = String.Format("{0}{1}{2}", strTextQualifier, strField, strTextQualifier) 
         End If 

        Else 
         'find next delimiter.. 
         'lngChrPos = InStr(1, strLine, strDataDelimiter) 
         lngChrPos = strLine.IndexOf(strDataDelimiter) 

         'get field from line.. 
         If lngChrPos = -1 Then 
          strField = strLine 
          strLine = vbNullString 
         Else 
          strField = strLine.Substring(0, lngChrPos) 
          strLine = strLine.Substring(lngChrPos + 1) 
          If strLine = "" Then 
           bEmptyLast = True 
          End If 
         End If 
        End If 

        ' Now replace double qualifiers with a single qualifier in the "corrected" string 
        strField = strField.Replace(New String(strTextQualifier, 2), strTextQualifier) 

        'restore double qualifiers.. 
        'strField = IIf(strField = vbNullChar, vbNullString, strField) 
        'strField = Replace$(strField, vbTab, strTextQualifier) 
        'strField = IIf(strField = vbTab, vbNullString, strField) 
        'strField = strField.Replace(vbTab, strTextQualifier) 

        'save field to array.. 
        strNewLine = String.Format("{0}{1}{2}", strNewLine, strQualifierSplitter, strField) 

       Loop 

       If bEmptyLast = True Then 
        strNewLine = String.Format("{0}{1}", strNewLine, strQualifierSplitter) 
       End If 

       'trim off first nullchar.. 
       strNewLine = strNewLine.Substring(1) 

       'split new line.. 
       strList = strNewLine.Split(strQualifierSplitter) 
      Else 
       If strLine.Substring(strLine.Length - 1, 1) = strDataDelimiter Then 
        strLine = strLine.Substring(0) 
       End If 
       'no qualifier.. do a simply split.. 
       strList = strLine.Split(strDataDelimiter) 
      End If 

      'return result.. 
      Return strList 

     Catch ex As Exception 
      Throw New Exception(String.Format("Error Splitting Special String - {0}", ex.Message.ToString())) 
     End Try 
    End Function 
+1

No need to convert. I need this solution in VB. Thanks! – 2011-04-21 18:06:36

+0

I use this same logic for more than 400 data loads a week, so it has been tested. If this is the answer, could you please upvote it? – 2011-04-21 18:08:55

1

If you want to show an underscore to indicate the space before the quote, as you did in your question, you can use:

string[] splitString = t.Replace(" \"", "_\"").Split('"'); 
+0

That was only there so we could see the space in the example; it isn't a desired effect. – Kobi 2011-04-21 18:09:34

2

This is a more complicated parsing problem than you fully appreciate. I suggest you take a look at the TextFieldParser class and the FileHelpers library: http://www.filehelpers.com/
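
To give an idea of what the TextFieldParser route might look like, here is a minimal sketch. TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace, so a C# project needs a reference to the Microsoft.VisualBasic assembly. The class name TextFieldParserSketch and the method name Split are invented for this illustration, and the choice of space and comma as delimiters is taken from the asker's comments.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper, for illustration only.
public static class TextFieldParserSketch
{
    public static string[] Split(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ", ",");           // split on space and comma
            parser.HasFieldsEnclosedInQuotes = true;  // keep "quoted text" as one field

            // ReadFields returns null when there is no data to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For well-formed input such as `apple "pine tree",12.34` this should return apple, pine tree and 12.34. The parser is strict about quoting, though: a line like the sample in the question, where a closing quote is followed directly by more text or where a quote is never closed, may be rejected with a MalformedLineException, so free-form search text would need cleaning up first or a hand-written splitter.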

+0

Jim, that's a really good resource. Thanks for sharing it, man; can't wait to use it on Monday! – 2011-04-21 18:18:09

+0

I appreciate the resource, but I can't have an external library in this project. – 2011-04-21 20:11:39

0

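For something like this, regular expressions get complicated once you start adding all sorts of exceptions.

Still, in case you're interested, and for completeness' sake: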

(?<term>[a-zA-Z0-9'.=]+)|("(?<term>[^"]+)") 
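
A short sketch of how this pattern could be applied with the .NET Regex class follows. Both alternatives use the same group name, term, which .NET allows, so each match exposes its text through Groups["term"] whichever branch matched. The class name RegexTermSplitter and the method name Split are invented for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RegexTermSplitter
{
    // The pattern from the answer, written as a C# verbatim string
    // (inside @"..." a doubled "" stands for one literal double quote).
    private static readonly Regex TermPattern =
        new Regex(@"(?<term>[a-zA-Z0-9'.=]+)|(""(?<term>[^""]+)"")");

    public static List<string> Split(string searchText)
    {
        var terms = new List<string>();
        foreach (Match match in TermPattern.Matches(searchText))
        {
            // Bare word or quoted phrase: the text is in the "term" group either way.
            terms.Add(match.Groups["term"].Value);
        }
        return terms;
    }
}
```

On the sample string from the question this gives sample', "string's are, more " (with its trailing space), text, " making" (with its leading space), 12.34, hello, pineapple and sundays. The fifth, unclosed quote is skipped because the quoted branch cannot find a closing quote for it.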
+0

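Is it possible to change the RegEx to handle escaped quotes (doubled double quotes) inside the quoted text, as you commonly see in text-qualified CSV files (e.g. exported from Excel or Access)? – eidylon 2011-09-27 14:18:31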
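
One possible adjustment, offered as a sketch rather than as the answer author's own reply: let the quoted branch accept either a non-quote character or a doubled quote, then collapse each doubled quote to a single one afterwards. The class name CsvStyleTermSplitter and the method name Split are invented for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CsvStyleTermSplitter
{
    // Plain form of the pattern:
    //   (?<term>[a-zA-Z0-9'.=]+)|"(?<term>(?:[^"]|"")+)"
    // Inside the quotes, each step consumes a non-quote character or a "" pair.
    private static readonly Regex TermPattern =
        new Regex(@"(?<term>[a-zA-Z0-9'.=]+)|""(?<term>(?:[^""]|"""")+)""");

    public static List<string> Split(string text)
    {
        var terms = new List<string>();
        foreach (Match match in TermPattern.Matches(text))
        {
            // Turn each escaped "" back into a single quote character.
            terms.Add(match.Groups["term"].Value.Replace("\"\"", "\""));
        }
        return terms;
    }
}
```

With this pattern, the input `TEST,"Just Right"" 950g, BO",1111` splits into TEST, `Just Right" 950g, BO` and 1111. The trade-off is that two quoted phrases written back to back with nothing between them are read as one phrase containing an escaped quote, which matches the CSV convention but may not suit free-form search text.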
