2013-05-29

I want to parse text that spans two lines of HTML, but the \n in my regex doesn't work.

Dim PattStats As New Regex("class=""head"">(.+?)</td>"+ 
          "\n<td>(.+?)</td>") 
Dim makor As MatchCollection = PattStats.Matches(page) 

For Each MatchMak As Match In makor 
    ListView3.Items.Add(MatchMak.Groups(1).Value) 
Next 

I added \n to match the next line, but for some reason it doesn't work. Here is the source I'm running the regex against.

<table class="table table-striped table-bordered table-condensed"> 
    <tbody> 
    <tr> 
     <td class="head">Health Points:</td> 
     <td>445 (+85/per level)</td> 
     <td class="head">Health Regen:</td> 
     <td>7.25</td> 
    </tr> 
    <tr> 
     <td class="head">Energy:</td> 
     <td>200</td> 
     <td class="head">Energy Regen:</td> 
     <td>50</td> 
    </tr> 
    <tr> 
     <td class="head">Damage:</td> 
     <td>53 (+3.2/per level)</td> 
     <td class="head">Attack Speed:</td> 
     <td>0.694 (+3.1/per level)</td> 
    </tr>   
    <tr> 
     <td class="head">Attack Range:</td> 
     <td>125</td> 
     <td class="head">Movement Speed:</td> 
     <td>325</td> 
    </tr> 
    <tr> 
     <td class="head">Armor:</td> 
     <td>16.5 (+3.5/per level)</td> 
     <td class="head">Magic Resistance:</td> 
     <td>30 (+1.25/per level)</td> 
    </tr>  
    <tr> 
     <td class="head">Influence Points (IP):</td> 
     <td>3150</td> 
     <td class="head">Riot Points (RP):</td> 
     <td>975</td> 
    </tr> 
    </tbody> 
</table> 

I want a single regex that matches the first <td class...> together with the <td> on the next line :/

Try using '\r\n' instead of '\n'.

You could really do this with XPath.

Daniel: tried it, but it didn't work :( Casimir: I've never used XPath, so I don't know what it is :/
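Why neither '\n' nor '\r\n' helps: in the source shown above, each value cell sits on its own indented line, so after '</td>' come the line break (possibly '\r\n') and several spaces before the next '<td>'. A pattern that expects '<td>' immediately after the line break never matches. A minimal sketch that keeps the question's pattern but replaces '\n' with \s* (any run of whitespace); the module and function names are illustrative, and Console output stands in for ListView3:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternFix
    ' Same pattern as in the question, except \s* replaces \n so that
    ' trailing spaces, "\r\n" or "\n", and the next line's indentation are all accepted.
    Private ReadOnly PattStats As New Regex("class=""head"">(.+?)</td>\s*<td>(.+?)</td>")

    ' Returns one "label value" string per head cell and the value cell after it.
    Function ExtractStats(page As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In PattStats.Matches(page)
            results.Add(m.Groups(1).Value & " " & m.Groups(2).Value)
        Next
        Return results
    End Function

    Sub Main()
        ' Two cells pairs from the question's table, with CRLF line endings and indentation.
        Dim page As String = "<td class=""head"">Health Points:</td>" & vbCrLf &
            "     <td>445 (+85/per level)</td>" & vbCrLf &
            "     <td class=""head"">Health Regen:</td>" & vbCrLf &
            "     <td>7.25</td>"
        For Each s As String In ExtractStats(page)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

VB string literals have no escape sequences, so "\s" reaches the regex engine unchanged, exactly as "\n" did in the original pattern.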

Answer

Description

This regex finds a pair of adjacent td tags and returns their contents in two groups.

<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>

Summary

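  • <td\b[^>]*> finds the first td tag and consumes any attributes
  • ([^<]*) captures the first cell's inner text; this can be greedy because we assume the cell contains no nested tags
  • <\/td> finds the closing tag
  • [^<]* moves past everything up to the next tag (spaces and line breaks); this assumes there are no other tags between the first and second td tags
  • <td\b[^>]*> finds the second td tag and consumes any attributes
  • ([^<]*) captures the second cell's inner text; again greedy, assuming the cell contains no nested tags
  • <\/td> finds the closing tag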

Group 0 holds the entire matched string.

  1. Group 1 holds the text of the first td
  2. Group 2 holds the text of the second td

VB.NET code example:

Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Replace with the HTML you downloaded (the question's "page" string).
        Dim sourcestring As String = "replace with your source string"
        ' Group 1 = text of the first td, group 2 = text of the td that follows it.
        Dim re As New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>",
                            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            ' Print every group of this match: 0 = whole match, 1 and 2 = the two cells.
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx += 1
        Next
    End Sub
End Module

Matches against the sample table (group 0 = whole match, groups 1 and 2 = the two cells' text):
(
    [0] => Array 
     (
      [0] => <td class="head">Health Points:</td> 
      <td>445 (+85/per level)</td> 
      [1] => <td class="head">Health Regen:</td> 
      <td>7.25</td> 
      [2] => <td class="head">Energy:</td> 
      <td>200</td> 
      [3] => <td class="head">Energy Regen:</td> 
      <td>50</td> 
      [4] => <td class="head">Damage:</td> 
      <td>53 (+3.2/per level)</td> 
      [5] => <td class="head">Attack Speed:</td> 
      <td>0.694 (+3.1/per level)</td> 
      [6] => <td class="head">Attack Range:</td> 
      <td>125</td> 
      [7] => <td class="head">Movement Speed:</td> 
      <td>325</td> 
      [8] => <td class="head">Armor:</td> 
      <td>16.5 (+3.5/per level)</td> 
      [9] => <td class="head">Magic Resistance:</td> 
      <td>30 (+1.25/per level)</td> 
      [10] => <td class="head">Influence Points (IP):</td> 
      <td>3150</td> 
      [11] => <td class="head">Riot Points (RP):</td> 
      <td>975</td> 
     ) 

    [1] => Array 
     (
      [0] => Health Points: 
      [1] => Health Regen: 
      [2] => Energy: 
      [3] => Energy Regen: 
      [4] => Damage: 
      [5] => Attack Speed: 
      [6] => Attack Range: 
      [7] => Movement Speed: 
      [8] => Armor: 
      [9] => Magic Resistance: 
      [10] => Influence Points (IP): 
      [11] => Riot Points (RP): 
     ) 

    [2] => Array 
     (
      [0] => 445 (+85/per level) 
      [1] => 7.25 
      [2] => 200 
      [3] => 50 
      [4] => 53 (+3.2/per level) 
      [5] => 0.694 (+3.1/per level) 
      [6] => 125 
      [7] => 325 
      [8] => 16.5 (+3.5/per level) 
      [9] => 30 (+1.25/per level) 
      [10] => 3150 
      [11] => 975 
     ) 

) 
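Printing every group is handy for checking the pattern; to actually use the pairs, read Groups(1) and Groups(2) from each match. A minimal sketch under the same assumption (no nested tags inside the cells), collecting label/value pairs in a Dictionary(Of String, String) rather than adding them to a ListView; the names StatsParser and ParseStats are hypothetical:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module StatsParser
    ' The answer's pattern: group 1 = first cell's text, group 2 = next cell's text.
    ' [^<]* between the cells already spans spaces and line breaks.
    Private ReadOnly CellPair As New Regex("<td\b[^>]*>([^<]*)</td>[^<]*<td\b[^>]*>([^<]*)</td>", RegexOptions.IgnoreCase)

    ' Maps each label cell (e.g. "Energy:") to the value cell that follows it.
    Function ParseStats(html As String) As Dictionary(Of String, String)
        Dim stats As New Dictionary(Of String, String)()
        For Each m As Match In CellPair.Matches(html)
            stats(m.Groups(1).Value.Trim()) = m.Groups(2).Value.Trim()
        Next
        Return stats
    End Function

    Sub Main()
        Dim html As String = "<tr>" & vbCrLf &
            "  <td class=""head"">Energy:</td>" & vbCrLf &
            "  <td>200</td>" & vbCrLf &
            "  <td class=""head"">Energy Regen:</td>" & vbCrLf &
            "  <td>50</td>" & vbCrLf &
            "</tr>"
        For Each kv As KeyValuePair(Of String, String) In ParseStats(html)
            Console.WriteLine("{0} {1}", kv.Key, kv.Value)
        Next
    End Sub
End Module
```

In a WinForms app, the loop body could call ListView3.Items.Add(...) instead of collecting into a dictionary, as in the question.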

Disclaimer

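Parsing HTML with regular expressions is really not the best solution, because there are many edge cases we can't predict. However, if the input is always as simple as this and you are willing to accept the risk that the regex won't work 100% of the time, this solution may work for you.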
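As the XPath comment under the question suggests, a parser avoids most of those edge cases. A minimal sketch using System.Xml and XPath, assuming the table is well-formed XML: the snippet in the question is, but real-world pages often are not, in which case a dedicated HTML parser is needed. The names XPathStats and ParseWithXPath are hypothetical:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XPathStats
    ' Assumes tableXml is well-formed XML (true for the question's snippet, not for HTML in general).
    Function ParseWithXPath(tableXml As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(tableXml)
        Dim results As New List(Of String)()
        ' Select every label cell, then take the nearest td sibling after it as its value.
        For Each head As XmlNode In doc.SelectNodes("//td[@class='head']")
            Dim value As XmlNode = head.SelectSingleNode("following-sibling::td[1]")
            If value IsNot Nothing Then
                results.Add(head.InnerText.Trim() & " " & value.InnerText.Trim())
            End If
        Next
        Return results
    End Function

    Sub Main()
        Dim table As String = "<table><tbody><tr>" &
            "<td class=""head"">Attack Range:</td><td>125</td>" &
            "<td class=""head"">Movement Speed:</td><td>325</td>" &
            "</tr></tbody></table>"
        For Each s As String In ParseWithXPath(table)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

Unlike the regex, this does not care how the cells are indented or which line endings the page uses.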