2015-03-31

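String.Split returns the wrong array

I am trying to fix a malformed HTML table. I have no control over the source; my application simply loads the contents of a downloaded file as a regular text file. The file contains a simple HTML table that is missing its closing </tr> elements. I am trying to split the contents on <tr> to get an array, so that I can append </tr> to the end of each element that needs it. When I split the string with fileContents.Split("<tr>").ToList, the resulting List(Of String) contains more elements than it should.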

Here is a short test snippet that shows the same behavior:

Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
Dim testArr As String() = testSource.Split("<tr>")

' Maybe try splitting on a variable, in case a string literal containing "<>" can't be used in the Split method
Dim seper As String = "<tr>"
testArr = testSource.Split(seper)

' Feed it a new string directly
testArr = testSource.Split(New String("<tr>"))

I expected testArr to contain 3 elements, as follows:

  1. "<table>"
  2. "<td>8172745</td>"
  3. "<td>8172745</td></table>"

However, I get the following array instead:

  1. ""
  2. "table>"
  3. "tr>"
  4. "td>8172745"
  5. "/td>"
  6. "tr>"
  7. "td>8172745"
  8. "/td>"
  9. "/table>"

Can someone please explain why the string is split this way, and how I can get the result I expect?

Answers

1

Your code is using a different overload of the Split method than you intended. You want the overload that accepts a String[] and a StringSplitOptions parameter:

Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
' Pass the delimiter as a String array so that "<tr>" is matched as a whole
Dim delimiter As String() = { "<tr>" }
Dim testArr As String() = _
    testSource.Split(delimiter, StringSplitOptions.RemoveEmptyEntries)

You can see it working on IDEOne:

http://ideone.com/pcw6aq
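
One likely explanation for the nine-element result in the question, assuming the project targets .NET Framework with Option Strict Off: String.Split has no overload there that takes a single String, so Visual Basic narrows "<tr>" to a Char and splits on its first character, "<". The sketch below makes both calls explicit so the difference is visible. The module name SplitDemo is only illustrative and is not part of the original code.

```vb
Option Strict On

Imports System

Module SplitDemo
    Sub Main()
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"

        ' Splitting on the single character "<" reproduces the result from the question:
        ' the input contains eight "<" characters, so nine pieces come back.
        Dim byChar As String() = testSource.Split("<"c)
        Console.WriteLine(byChar.Length)   ' prints 9

        ' Passing a String array makes Split treat "<tr>" as one delimiter.
        Dim byString As String() = testSource.Split(New String() {"<tr>"}, StringSplitOptions.None)
        Console.WriteLine(byString.Length) ' prints 3
    End Sub
End Module
```

With Option Strict On, a call such as testSource.Split("<tr>") should be rejected at compile time on .NET Framework rather than silently narrowed, which tends to surface this kind of mistake early.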

1

Try using a regular expression, like this:

Imports System.Text.RegularExpressions

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
        Dim testArr As String() = Regex.Split(testSource, "<tr>")

        ' Show the array in TextBox1
        TextBox1.Lines = testArr

    End Sub
End Class
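
The Regex.Split approach can also be used for the repair the question is ultimately after, namely adding the missing </tr> tags. What follows is a minimal console sketch under stated assumptions, not a general HTML fixer: CloseRows and RegexSplitDemo are hypothetical names, every row is assumed to be missing its </tr>, and the opening tag is assumed to be exactly "<tr>" with no attributes. Regex.Escape is not strictly needed for "<tr>", but it keeps the delimiter literal if it is ever changed to something containing regex metacharacters.

```vb
Option Strict On

Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module RegexSplitDemo
    ' Hypothetical helper: closes every <tr> that the source left open.
    Function CloseRows(html As String) As String
        Const closingTable As String = "</table>"
        Dim parts As String() = Regex.Split(html, Regex.Escape("<tr>"))

        ' parts(0) is everything before the first <tr>, for example "<table>".
        Dim result As New StringBuilder(parts(0))

        For i As Integer = 1 To parts.Length - 1
            Dim row As String = parts(i)
            Dim tail As String = ""

            ' Keep a trailing </table> outside the row that is being closed.
            If row.EndsWith(closingTable, StringComparison.OrdinalIgnoreCase) Then
                tail = row.Substring(row.Length - closingTable.Length)
                row = row.Substring(0, row.Length - closingTable.Length)
            End If

            result.Append("<tr>").Append(row).Append("</tr>").Append(tail)
        Next

        Return result.ToString()
    End Function

    Sub Main()
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
        Console.WriteLine(CloseRows(testSource))
    End Sub
End Module
```

For the test string from the question this prints <table><tr><td>8172745</td></tr><tr><td>8172745</td></tr></table>. For anything less regular than this input, a real HTML parser is probably a safer choice than string splitting.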

Best of luck.