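Split text into an array using complex criteria

I have a string that I need to split into an array. In most cases the parts are separated by a . (dot), but sometimes the string contains a part enclosed in curly braces, and any dots inside the braces must not be treated as separators.

I wrote the code below to do this, but I wonder whether there is a more elegant solution (for example, a regular expression).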

Pov = UCase(Trim(Pov))

'Loop through the Pov and escape any dots inside curly brackets
Level = 0
Escaped = ""
For Pos = 1 To Len(Pov)
    PosChar = Mid(Pov, Pos, 1)
    If PosChar = "{" Then
        Level = Level + 1
        Escaped = Escaped & PosChar
    ElseIf PosChar = "}" Then
        Level = Level - 1
        Escaped = Escaped & PosChar
    ElseIf PosChar = "." Then
        If Level > 0 Then
            Escaped = Escaped & "^^^ This is a nested dot ^^^"
        Else
            Escaped = Escaped & PosChar
        End If
    Else
        Escaped = Escaped & PosChar
    End If
Next

'Split the escaped string (not the original Pov) and restore any nested dots
PovSplit = Split(Escaped, ".")
For Part = LBound(PovSplit) To UBound(PovSplit)
    PovSplit(Part) = Replace(PovSplit(Part), "^^^ This is a nested dot ^^^", ".")
Next

Source

2014-05-21 neelsg

No, you cannot do this "directly" with a regular expression. You can read why here.
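To illustrate the limitation, here is a minimal sketch, assuming the braces are never nested. The function name SplitFlatBraces and the pattern are my own and not part of the original answer. The pattern treats a whole {...} group as one unit, which works for a single level of braces but breaks as soon as one group contains another, because the VBScript RegExp object has no way to count nesting depth. Note also that this approach silently drops empty parts (for example, between two consecutive dots).

```vbscript
' Hypothetical helper, for illustration only: splits on dots while keeping
' NON-NESTED {...} groups intact. It does not handle nested braces.
Function SplitFlatBraces(text)
    Dim re, matches, result, i
    Set re = New RegExp
    ' One piece = a run of ordinary characters and/or flat {...} groups
    re.Pattern = "(?:[^.{}]|\{[^{}]*\})+"
    re.Global = True
    Set matches = re.Execute(text)
    If matches.Count = 0 Then
        SplitFlatBraces = Array()
        Exit Function
    End If
    ReDim result(matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = matches.Item(i).Value
    Next
    SplitFlatBraces = result
End Function

' Flat braces: gives the three pieces a, {b.c} and d
WScript.Echo Join(SplitFlatBraces("a.{b.c}.d"), " | ")

' Nested braces: the outer group is not recognised, so the result is wrong
' (the desired pieces would be {a.{b.c}} and d)
WScript.Echo Join(SplitFlatBraces("{a.{b.c}}.d"), " | ")
```

If your data never nests braces, something like this may be enough; otherwise you need either a depth counter or the repeated replacement shown next.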

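Anyway, here is a solution that does use regular expressions. It is a lot of code, and whether it is fast or not depends on the length of your data, so you will need to test it.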

Dim dicEncode 
    set dicEncode = WScript.CreateObject("Scripting.Dictionary") 

Dim encodeRE 
    Set encodeRE = New RegExp 
    With encodeRE 
     .Pattern = "\{[^{}]*\}" 
     .Global = True 
     .IgnoreCase = True 
    End With 

Dim decodeRE 
    Set decodeRE = New RegExp 
    With decodeRE 
     .Pattern = "\x00(K[0-9]+)\x00" 
     .Global = True 
     .IgnoreCase = True 
    End With 

Function encodeFunction(matchString, position, fullString) 
    Dim key 
     key = "K" & CStr(dicEncode.Count) 
    dicEncode.Add key , matchString 
    encodeFunction = Chr(0) & key & Chr(0) 
End Function 

Function decodeFunction(matchString, key, position, fullString) 
    decodeFunction = dicEncode.Item(key) 
End Function 


Dim originalString  
    originalString = "{abc.def{gh.ijk}l.m}n.o.p{q.r{s{t{u.v}}}w}.x" 

Dim encodedString, workBuffer 

    encodedString = originalString 
    Do 
     workBuffer = encodedString 
     encodedString = encodeRE.Replace(encodedString,GetRef("encodeFunction")) 
    Loop While encodedString <> workBuffer 

    encodedString = Replace(encodedString, ".", Chr(0)) 

    Do 
     workBuffer = encodedString 
     encodedString = decodeRE.Replace(encodedString,GetRef("decodeFunction")) 
    Loop While encodedString <> workBuffer 

Dim aElements, element 
    aElements = Split(encodedString, Chr(0)) 

    WScript.Echo originalString 

    For Each element In aElements 
     WScript.Echo element 
    Next

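All this code does is use a regular expression to find paired curly braces in the string and replace them, together with the data they enclose, with a key that indexes the original text stored in a dictionary. The replacement is repeated until no pair is left, so nested groups are removed from the inside out. Once all the "enclosed" data has been removed from the string, the remaining dots (your split points) are replaced with a new character (Chr(0) in the code), and the string is then rebuilt from the dictionary. All the "enclosed" dots are therefore protected, and the string can be split on the new character.

It is similar to a dictionary-building statistical compressor, but of course without any statistics or compression.

But it is only worthwhile for long strings. For short ones, your original approach will be better.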

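EDITED to address the comments

For better-performing code, here is a version based on the OP's original approach. No regular expressions, just fewer string concatenations and no unnecessary checks.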

Function mySplit(originalString) 
Dim changedString, currentPoint, currentChar, stringEnd, level 

    changedString = originalString 
    stringEnd = Len(originalString) 

    level = 0 
    For currentPoint = 1 To stringEnd 
     currentChar = Mid(originalString, currentPoint, 1) 
     If currentChar = "{" Then 
      level = level + 1 
     ElseIf currentChar = "}" Then 
      If level > 0 Then 
       level = level - 1 
      End If 
     ElseIf level = 0 Then 
      If currentChar = "." Then 
       changedString = Left(changedString,currentPoint-1) & Chr(0) & Right(changedString,stringEnd-currentPoint) 
      End If 
     End If 
    Next 

    mySplit = split(changedString, Chr(0)) 
End Function
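If rebuilding changedString on every top-level dot turns out to matter in your measurements, one possible variation is to remember where the current piece starts and cut each piece out with Mid. This is only a sketch under that assumption, not something I have benchmarked; the name SplitOutsideBraces is hypothetical. One difference to be aware of: for an empty input it returns an array with a single empty string, whereas Split returns an empty array.

```vbscript
' Hypothetical variant, for illustration only: one pass over the string,
' copying each piece exactly once instead of rebuilding the whole string.
Function SplitOutsideBraces(text)
    Dim parts(), count, level, segStart, pos, ch
    count = 0
    level = 0
    segStart = 1
    ' Upper bound: a string of Len(text) characters has at most Len(text) + 1 pieces
    ReDim parts(Len(text))
    For pos = 1 To Len(text)
        ch = Mid(text, pos, 1)
        Select Case ch
            Case "{"
                level = level + 1
            Case "}"
                ' Ignore unbalanced closing braces, as mySplit does
                If level > 0 Then level = level - 1
            Case "."
                If level = 0 Then
                    parts(count) = Mid(text, segStart, pos - segStart)
                    count = count + 1
                    segStart = pos + 1
                End If
        End Select
    Next
    ' Whatever follows the last top-level dot is the final piece
    parts(count) = Mid(text, segStart)
    ReDim Preserve parts(count)
    SplitOutsideBraces = parts
End Function

Dim piece
For Each piece In SplitOutsideBraces("{abc.def{gh.ijk}l.m}n.o.p{q.r{s{t{u.v}}}w}.x")
    WScript.Echo piece
Next
```

For strings of around 80 characters the difference may well be negligible, so time both versions on your real data before choosing.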

Source

2014-05-21 11:20:01

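Thanks for your answer. We are only talking about 80 or so characters per string, but this algorithm runs on different strings a few thousand times per minute (or even per second), so efficiency still matters (it basically has to be done every time something is read from or written to an OLAP database that holds a large amount of data) – neelsg

@neelsg, I have included a stripped-down version of your code. For the case you describe, it should perform better.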
