0
VB.net路段提取我能提取与简单的HREF标记像这样的网址:与HtmlAgilityPack
<a href="http://www.samplesite.com">
但我的问题是我如何提取从href标记,看起来像这样的链接?
<a href="http://www.wherecreativitygoestoschool.com/vancouver/left_right/rb_test.htm" onmousedown="return rwt(this,'','','','1','AFQjCNHvlwTxfBVEYcqGUnilAZN0uY2IXw','','0CCsQFjAA','','',event)">
Right Brain vs Left Brain Creativity <em>Test</em> at The Art Institute of <b>...</b></a>
这里是我的完整代码:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim webClient As New System.Net.WebClient
Dim WebSource As String = webClient.DownloadString("http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA")
Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(WebSource)
Dim links = GetLinks(doc, "test")
For Each Link In links
ListBox1.Items.Add(Link.ToString())
Next
End Sub
Public Class Link
Public Sub New(Uri As Uri, Text As String)
Me.Uri = Uri
Me.Text = Text
End Sub
Public Property Text As String
Public Property Uri As Uri
Public Overrides Function ToString() As String
Return String.Format(If(Uri Is Nothing, "", Uri.ToString()))
End Function
End Class
Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument, linkContains As String) As List(Of Link)
Dim uri As Uri = Nothing
Dim linksOnPage = From link In doc.DocumentNode.Descendants()
Where link.Name = "a" _
AndAlso link.Attributes("href") IsNot Nothing _
Let text = link.InnerText.Trim()
Let url = link.Attributes("href").Value
Where url.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0 _
AndAlso uri.TryCreate(url, UriKind.Absolute, uri)
Dim Uris As New List(Of Link)()
For Each link In linksOnPage
Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
Next
Return Uris
End Function
我注意到,我的代码不会提取与</a>
结束链接。有什么我可以做的修改我的代码,它会提取以</a>
结尾的链接?