2014-01-30 62 views
0

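Get the inner text between two tags and output it to two labels - VB.NET - HtmlAgilityPack

I have searched for examples and tried many of them, but nothing seems to work. I am using HtmlAgilityPack and want to get the inner text between two specific tags.

Example:

<br>Terms of Service<br></br>Developers<br> 

I want the InnerText between the first <br> and <br> to go into Label1, and the text between the second </br> and <br> to go into Label2.

The result would look like this:

Label1.Text = "Terms of Service"
Label2.Text = "Developers"

How can I achieve this? P.S. I am not familiar with HtmlAgilityPack, so code showing how to do it would be best. :-)

Thanks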

+1

Try selecting the nodes with '"//br"', loop over the resulting nodes, and set the labels to their 'InnerText'. –

+0

I am still a beginner with HtmlAgilityPack; could you show me in code what you mean? :) – Jeff

+0

I'm not sure what '</br>' is. May I ask why your line breaks have closing tags? Anyway, I might have an answer. Stand by... – 2014-01-31 01:59:51

Answers

0

This is a bit dirty, but it should work.

Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim mystring As String = "<br>Terms of Service<br></br>Developers<br>"

        ' Text between the first <br> and the next <br> (lazy match).
        Dim pattern1 As String = "(?<=<br>)(.*?)(?=<br>)"
        ' Text between </br> and the last <br> (greedy match).
        Dim pattern2 As String = "(?<=</br>)(.*)(?=<br>)"

        Dim m1 As MatchCollection = Regex.Matches(mystring, pattern1)
        Dim m2 As MatchCollection = Regex.Matches(mystring, pattern2)
        MsgBox(m1(0).Value)
        MsgBox(m2(0).Value)
    End Sub
End Module
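
A variation on the same idea, offered only as a sketch: instead of two lookaround patterns, split the string on every br tag and keep the non-empty pieces. The module name BrSplitSketch and the variable names are illustrative, and the pattern assumes the only tags in the input are <br>, </br>, or <br/>.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BrSplitSketch
    Sub Main()
        Dim html As String = "<br>Terms of Service<br></br>Developers<br>"

        ' Split on any opening, closing, or self-closing br tag, ignoring case.
        Dim pieces As String() = Regex.Split(html, "</?br\s*/?>", RegexOptions.IgnoreCase)

        ' Keep only the pieces that actually contain text.
        Dim texts As New List(Of String)
        For Each piece As String In pieces
            If piece.Trim().Length > 0 Then
                texts.Add(piece.Trim())
            End If
        Next

        ' With the sample input above, texts holds two entries.
        If texts.Count >= 2 Then
            Console.WriteLine(texts(0)) ' Terms of Service
            Console.WriteLine(texts(1)) ' Developers
        End If
    End Sub
End Module
```

In the form from the question, texts(0) and texts(1) could be assigned to Label1.Text and Label2.Text in place of the Console.WriteLine calls.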
+0

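Thank you, that did it and it works perfectly! :) However, what I really want to know is how to do this with HtmlAgilityPack. If you or anyone else could give me sample code showing how to do it with HtmlAgilityPack, that would be great, and I think it would also help others working on similar problems :) – Jeff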

+0

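You have me curious whether this is even possible with HAP, so I'll take a look. Normally I would try to get the InnerText by tag name or ID attribute, but the tags in your case lack consistency, which is why I used two different regex patterns. – 2014-02-01 06:46:30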

+0

I am going to add my response to this in a second answer so that you can see the code and notes. – 2014-02-01 07:34:21

0

In short, HAP is not well suited to your task. My notes follow:

Imports HtmlAgilityPack

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim mystring As String = "<BR>Terms of Service<BR></BR>Developers<BR>"
        Dim myDoc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
        myDoc.LoadHtml(mystring)
        ' Here we notice that HAP immediately discards the junk tag </BR>.
        MsgBox(myDoc.DocumentNode.OuterHtml)

        ' Below we notice that HAP does not close the BR tag, because it only
        ' attempts to close certain nested tags associated with tables
        ' (th, tr, td) and lists (li).
        ' If this were a tag that HAP could fix, the fixed output would be:
        ' <br>Terms of Service<br></br>Developers<br></br></br>
        ' That string would be parsed as if the last tag closes the first and
        ' each pair of inner tags closes itself with no text in between.
        ' So even if you changed BR to TD, or to some other tag HAP fixes
        ' nesting on, it still would not parse the way you want.
        ' Also, HAP does not appear to support XHTML in this .NET 2.0 version.

        ' Set the option, then re-parse so that it can take effect.
        myDoc.OptionFixNestedTags = True
        myDoc.LoadHtml(mystring)
        MsgBox(myDoc.DocumentNode.OuterHtml)

        ' Here we put the BR tags into a collection. Iterating through them,
        ' we notice that no BR tag has any inner text, presumably for two reasons:
        ' 1. HAP will not close a BR.
        ' 2. It does not fix the broken nesting in the way you expect or need.
        ' The XPath is lowercase because HAP exposes element names in lowercase.
        Dim myBR As HtmlNodeCollection = myDoc.DocumentNode.SelectNodes("//br")
        If myBR IsNot Nothing Then
            For Each br As HtmlNode In myBR
                MsgBox(br.InnerText)
            Next
        End If
    End Sub

End Class
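
One alternative that may be worth trying, as an untested sketch rather than a confirmed fix: select the text nodes that sit between the BR tags instead of the BR elements themselves. This assumes HAP keeps "Terms of Service" and "Developers" as separate text nodes once it drops the stray </BR>, as the first message box above suggests. The module name TextNodeSketch is illustrative.

```vbnet
Imports System
Imports HtmlAgilityPack

Module TextNodeSketch
    Sub Main()
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml("<BR>Terms of Service<BR></BR>Developers<BR>")

        ' Select every text node that is not empty or whitespace-only.
        Dim textNodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']")

        ' SelectNodes returns Nothing when no node matches.
        If textNodes IsNot Nothing Then
            For Each node As HtmlNode In textNodes
                Console.WriteLine(node.InnerText.Trim())
            Next
        End If
    End Sub
End Module
```

If the assumption holds, the first two results could be assigned to Label1.Text and Label2.Text. If the markup contains other text outside the BR tags, those nodes would be returned as well, so the regex approach may still be the more predictable one for this input.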
+0

Thank you for explaining this. So I understand that I need clean markup in order to use HtmlAgilityPack. Thanks for making that clear. :) – Jeff

+0

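In many cases HAP cleans up HTML very well. For example, when converting HTML to PDF with iTextSharp, I use it to strip attributes from tags. Your case, however, is, let's say... unique. – 2014-02-03 07:22:05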
