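Excel VBA: extract the image src attribute from a string as a string
I want to scrape my employer's website to extract the images from their blog posts at scale. I've started building a scraping tool in Excel using VBA.
(We don't have access to the SQL database.)
I've set up a worksheet containing a list of post identifiers in column A and the URL of each post in column B.
So far, my VBA script loops through the list of URLs in column B, extracts the HTML from a tag on the page by ID using getElementById, and pastes the resulting output as a string into column C.
I'm now at the point where I'm trying to figure out how to extract the src attribute from each image in the resulting output and paste it into the relevant columns. I can't for the life of me come up with a simple solution. I'm not very familiar with RegEx, and I'm struggling with Excel's built-in string functions.
The final step would be a macro that runs through each image URL and saves the image to disk with a file name in a format such as "{Event No.}-{Image Number}.jpg".
Any help is greatly appreciated.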
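A minimal sketch of that saving step, assuming the image URL is absolute (a relative src such as "/ImageHandler/6190/515/1000/0/" would first need the site's base address prepended). The function name SaveImage, the example.com address and the C:\Temp folder are hypothetical placeholders, not part of the original script.

```vba
' Downloads one image and writes it to disk. Late-bound objects, so no
' library references are required.
Function SaveImage(ByVal imageUrl As String, ByVal filePath As String) As Boolean
    Dim http As Object
    Dim stream As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", imageUrl, False    ' False = synchronous request
    http.send

    If http.Status = 200 Then
        Set stream = CreateObject("ADODB.Stream")
        stream.Type = 1                 ' 1 = adTypeBinary
        stream.Open
        stream.Write http.responseBody
        stream.SaveToFile filePath, 2   ' 2 = adSaveCreateOverWrite
        stream.Close
        SaveImage = True
    End If
End Function

Sub SaveImageExample()
    ' Hypothetical values: event number 6190, first image on the page.
    Dim eventNo As String, imageNo As Long
    eventNo = "6190"
    imageNo = 1
    If Not SaveImage("https://www.example.com/ImageHandler/6190/515/1000/0/", _
                     "C:\Temp\" & eventNo & "-" & imageNo & ".jpg") Then
        Debug.Print "Download failed for event " & eventNo
    End If
End Sub
```

The file name is built as event number, hyphen, image number, then ".jpg", following the format described above.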
Sub Get_Image_SRC()
    Dim sht As Worksheet
    Dim LastRow As Long
    Dim i As Long               ' Long avoids overflow beyond row 32,767
    Dim url As String
    Dim IE As Object

    Set sht = ThisWorkbook.Worksheets("Sheet1")

    ' Last used row in column A
    LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    For i = 2 To LastRow
        url = sht.Cells(i, "C").Value
        Debug.Print url         ' log the URL instead of blocking on a MsgBox

        IE.navigate url
        Application.StatusBar = url & " is loading..."
        ' Wait until the page has finished loading
        Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop
        Application.StatusBar = url & " Loaded"

        ' The container ID depends on the page type in column B
        If sht.Cells(i, "B").Value = "WEBNEWS" Then
            sht.Cells(i, "D").Value = IE.document.getElementById("NewsDetail").outerHTML
        Else
            sht.Cells(i, "D").Value = IE.document.getElementById("ReviewContainer").outerHTML
        End If
    Next i

    Application.StatusBar = False
    IE.Quit                     ' close the browser window before releasing it
    Set IE = Nothing
End Sub
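Since the loop above already holds the live element returned by getElementById, one alternative to parsing the outerHTML string is to ask that element for its img children directly. This is only a sketch under assumptions: the helper name WriteImageSources and the choice of column E as the first output column are hypothetical.

```vba
' Writes the src of every <img> inside a container element to the given row,
' starting in column E. container is the object returned by getElementById.
Sub WriteImageSources(ByVal container As Object, ByVal sht As Worksheet, _
                      ByVal rowIndex As Long)
    Dim imgs As Object
    Dim k As Long

    ' getElementById returns Nothing when the ID is not on the page
    If container Is Nothing Then Exit Sub

    Set imgs = container.getElementsByTagName("img")
    For k = 0 To imgs.Length - 1
        ' The src property is resolved against the page address by the browser
        sht.Cells(rowIndex, 5 + k).Value = imgs.Item(k).src
    Next k
End Sub
```

Inside the loop it could be called as WriteImageSources IE.document.getElementById("NewsDetail"), sht, i.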
Example of the resulting HTML:
<div id=""NewsDetail""><div class=""NewsDetailTitle"">Video: Race Face Behind the Scenes Tour</div><div class=""NewsDetailImage""><img alt=""HeadlinesThumbnail.jpg"" src=""/ImageHandler/6190/515/1000/0/""></div> <div class=""NewsDetailBody"">Pinkbike posted this video a while ago, if you missed it, its' definitely worth a watch.
Ken from Camp of Champions took a look at their New Westminster factory last year which gives a look at the production, people and culture of Race Face. The staff at Race Face are truly their greatest asset they had, best wishes to everyone!
<p><center><object width=""500"" height=""281""><param name=""allowFullScreen"" value=""true""><param name=""AllowScriptAccess"" value=""always""><param name=""movie"" value=""http://www.pinkbike.com/v/188244""><embed width=""500"" height=""281"" src=""http://www.pinkbike.com/v/188244"" type=""application/x-shockwave-flash"" allowscriptaccess=""always"" allowfullscreen=""true""></object></center><p></p>
</div><div class=""NewsDate"">Published Friday, 25 November 2011</div></div>"
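Thanks, Robin. This works very well for pages with a single image. Can I ask how you would modify it to get multiple images? – user2866975
@user2866975 - see my edit - basically you need to set the Global flag to True and then loop over all the matches.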
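The answer's code is not reproduced here, so the following is only a sketch of what setting the Global flag and looping over the matches might look like with the late-bound VBScript.RegExp object. The function name GetImageSources, the pattern, and the output starting at column E are assumptions, not the answerer's actual code.

```vba
' Returns every img src value found in an HTML string.
Function GetImageSources(ByVal html As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' return all matches, not only the first
    re.IgnoreCase = True
    ' Capture the value of src inside an <img ...> tag. The quote class is
    ' repeated ("+") so the doubled quotes seen in the sample HTML also match.
    re.Pattern = "<img\b[^>]*?\bsrc\s*=\s*[""']+([^""']+)"

    For Each m In re.Execute(html)
        result.Add m.SubMatches(0)
    Next m

    Set GetImageSources = result
End Function

Sub ListImageSources()
    ' Reads the HTML stored in D2 and writes each src to E2, F2, ...
    Dim sht As Worksheet
    Dim srcs As Collection
    Dim k As Long

    Set sht = ThisWorkbook.Worksheets("Sheet1")
    Set srcs = GetImageSources(sht.Cells(2, "D").Value)
    For k = 1 To srcs.Count
        sht.Cells(2, 4 + k).Value = srcs(k)
    Next k
End Sub
```

With Global left at its default of False, Execute would stop after the first match, which fits the single-image behaviour described in the comment above.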