2016-01-18

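Web scraping with VBA: "Object variable or With block variable not set" error and browser crash

I am trying to scrape data from several websites into Excel. I think the code is sound in theory, but I keep getting the "Object variable or With block variable not set" error.

The first problem is that the loop sometimes runs perfectly from 2 to 10 and inserts every piece of data, but at other times it raises the error after inserting only one or two rows. I really cannot work out what the cause might be.

Second, this is only demo code. I am using a small batch of data and looping over it. My real goal is to find a way to scrape up to 100 rows' worth of pages without crashing the computer or the browser. Would it be better to convert my code to XMLHTTP-style scraping? If so, how would I do that?

Thanks in advance.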

Private Sub CommandButton1_Click() 

    Dim ie As Object 
    Dim iexp As Object 
    Dim firstname(1 To 10), lastname(1 To 10) As Variant 
    Dim mm(1 To 10), dd(1 To 10), yyyy(1 To 10) As Integer 
    Dim PhoneNumber(1 To 10) As Variant 
    Dim Address(1 To 10) As Variant 
    Dim HomeValue(1 To 10) As Variant 


    Dim i As Integer 



     For i = 2 To 10 

     'get variables from excel sheet1 and search on peoplefinders.com 

       firstname(i) = Sheet1.Cells(i, 1).Value 
       lastname(i) = Sheet1.Cells(i, 2).Value 
       mm(i) = Sheet1.Cells(i, 3).Value 
       dd(i) = Sheet1.Cells(i, 4).Value 
       yyyy(i) = Sheet1.Cells(i, 5).Value 

       Set ie = CreateObject("InternetExplorer.Application") 
       ie.Visible = True 
       ie.Height = 1000 
       ie.Width = 1000 
       ie.navigate ("http://www.peoplefinders.com/peoplesearch/searchresults?search=People&fn=" & firstname(i) & "&mn=&ln=" & lastname(i) & "&city=&state=&age=&dobmm=" & mm(i) & "&dobdd=" & dd(i) & "&doby=" & yyyy(i)) 

       Do While ie.Busy: DoEvents: Loop 
       Dim Doc As HTMLDocument 

       Set Doc = ie.document 

      'get elements and insert into cells in sheet 1 

       PhoneNumber(i) = Doc.getElementsByTagName("td")(2).getElementsByTagName("a")(0).innerText 
       Address(i) = Doc.getElementsByTagName("td")(1).getElementsByTagName("a")(0).innerText 

       Sheet1.Cells(i, 6).Value = PhoneNumber(i) 
       Sheet1.Cells(i, 7).Value = Address(i) 

      'modify address for next search 

       a = Split(Address(i), " ") 
       b = Join(a, "-") 

      'search home value on zillow.com 

       Set iexp = CreateObject("InternetExplorer.Application") 
       iexp.Visible = True 
       iexp.Height = 1000 
       iexp.Width = 1000 
       iexp.navigate ("http://www.zillow.com/homes/" & b & "_rb/") 

       Do While iexp.Busy: DoEvents: Loop 
       Dim Doc2 As HTMLDocument 

       Set Doc2 = iexp.document 

       iexp.navigate ("http://www.zillow.com/homes/" & b & "_rb/") 

      'insert home value into cells in sheet 1 

       HomeValue(i) = Doc2.getElementsByClassName("home-summary-row")(1).getElementsByTagName("span")(1).innerText 

       Sheet1.Cells(i, 8).Value = HomeValue(i) 



     Next 


    End Sub 

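Check readyState as well as Busy, and do so after every navigation. You navigate, check, then navigate again, and you never fetch the document the second time.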


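I believe you are running into a timing problem where the response is sometimes not yet complete. Yes, whether you use XMLDOC or IE, what I do is wait for state = 4; check the length of the response via 'lSize = Len(IE.Document.body.innerhtml)' to see whether it is shorter than a good return would be; and search for a known value, waiting a second and retrying if it is not found. You did not mention where the error occurs?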
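
The wait-and-retry idea in the comment above could be sketched roughly as follows. This is only an illustration under assumptions: `WaitForContent`, `marker`, `minLength` and `maxTries` are hypothetical names that do not appear in the thread, and the code assumes it runs inside Excel (it uses `Application.Wait`) and that `ie.document` is available once the ready state is 4.

```vba
' Hypothetical helper: returns True once the page has finished loading and its
' HTML contains a known marker string; gives up after maxTries one-second waits.
Function WaitForContent(ie As Object, marker As String, _
                        Optional minLength As Long = 0, _
                        Optional maxTries As Long = 10) As Boolean
    Dim attempt As Long
    Dim html As String

    For attempt = 1 To maxTries
        html = ""
        ' Nested Ifs because VBA's And does not short-circuit.
        If Not ie.Busy Then
            If ie.readyState = 4 Then                    ' 4 = READYSTATE_COMPLETE
                If Not ie.document.body Is Nothing Then
                    html = ie.document.body.innerHTML
                End If
            End If
        End If

        ' Accept the page only if it is long enough and contains the marker.
        If Len(html) >= minLength And InStr(1, html, marker, vbTextCompare) > 0 Then
            WaitForContent = True
            Exit Function
        End If

        Application.Wait Now + TimeValue("00:00:01")     ' Excel-only: pause one second
    Next attempt
End Function
```

A caller would check the return value and skip the row when it is False, rather than reading elements from a page that never finished loading.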

Answer


As Kerem Turgutlu pointed out, checking Busy is usually not enough; you have to check readyState as well. Here is how I do it:

Sub WaitBrowser(browser As Object) 
    Do While browser.Busy 
      DoEvents 
    Loop 
    Do While browser.readyState <> 4 
      DoEvents 
    Loop 
End Sub 

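I then call WaitBrowser IE (where IE is my InternetExplorer.Application object) after every navigation, before touching the document's elements.

As an alternative approach, for efficiency and predictability I prefer to send HTTP messages directly through an API (I usually use WinHTTP, but XMLHTTP or WinInet would probably work too, I believe). There are two notable exceptions: 1) I want to direct the user to the browser during or after processing, or 2) complex scripting is involved in working out what to send in order to get the required data back (in which case it is easier to let the browser do the work).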
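
For the browser-based route, a minimal sketch of the question's first lookup using WaitBrowser might look like the following. It is not the asker's or my production code: `ScrapeWithSingleBrowser` and `tds` are hypothetical names, and it assumes the result page still has the `td`/`a` layout shown in the question. It reuses one browser instance instead of creating two per row, and it checks that the elements exist before reading them, since indexing a missing element is a likely source of the "Object variable or With block variable not set" error.

```vba
' Sketch only: assumes WaitBrowser (defined above) lives in the same module.
Sub ScrapeWithSingleBrowser()
    Dim ie As Object
    Dim tds As Object, links As Object
    Dim i As Long

    Set ie = CreateObject("InternetExplorer.Application")   ' one instance for all rows
    ie.Visible = False

    For i = 2 To 10
        ie.navigate "http://www.peoplefinders.com/peoplesearch/searchresults?search=People" & _
                    "&fn=" & Sheet1.Cells(i, 1).Value & "&mn=&ln=" & Sheet1.Cells(i, 2).Value & _
                    "&city=&state=&age=&dobmm=" & Sheet1.Cells(i, 3).Value & _
                    "&dobdd=" & Sheet1.Cells(i, 4).Value & "&doby=" & Sheet1.Cells(i, 5).Value
        WaitBrowser ie                                       ' wait on Busy and readyState

        ' Guard every lookup so a missing element leaves the cell empty
        ' instead of raising a run-time error.
        Set tds = ie.document.getElementsByTagName("td")
        If tds.Length > 2 Then
            Set links = tds.Item(2).getElementsByTagName("a")
            If links.Length > 0 Then Sheet1.Cells(i, 6).Value = links.Item(0).innerText
            Set links = tds.Item(1).getElementsByTagName("a")
            If links.Length > 0 Then Sheet1.Cells(i, 7).Value = links.Item(0).innerText
        End If
    Next i

    ie.Quit                                                  ' close the browser when done
    Set ie = Nothing
End Sub
```

Closing the single instance with `ie.Quit` also avoids leaving around twenty browser windows open after a run, which is one plausible reason for the crashes described in the question.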

Here is an example of the direct HTTP approach, adapted from a recent project (no error checking):

Function FindLink() As String 
    Dim Request As Object 
    Set Request = CreateObject("WinHttp.WinHttpRequest.5.1") 
    Request.Open "GET", "http://example.com/pagewithinfo" 
    Request.Send 
    Dim resp As String 
    resp = Request.ResponseText 
    'create html tree with response 
    Dim h As Object 
    Set h = CreateObject("htmlfile") 
    h.body.innerHTML = resp 
    'get the info: href of the first link on the page 
    FindLink = h.DocumentElement.GetElementsByTagName("a")(0).GetAttribute("href") 
    Set h = Nothing 
    Set Request = Nothing 
End Function 
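
Applied to the loop in the question, the same technique might be sketched as below. Everything here is an assumption for illustration: `FetchDocument`, `LinkTextInCell` and `ScrapeWithHttp` are hypothetical names, and the sketch presumes that the site returns the result table in the raw HTML without needing JavaScript, which I have not verified. Network failures are not handled.

```vba
' Hypothetical helper: fetches a URL with WinHTTP and returns the parsed
' document, or Nothing if the server does not answer with status 200.
Function FetchDocument(url As String) As Object
    Dim request As Object
    Dim doc As Object

    Set request = CreateObject("WinHttp.WinHttpRequest.5.1")
    request.Open "GET", url, False              ' False = synchronous request
    request.Send
    If request.Status <> 200 Then Exit Function

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = request.ResponseText
    Set FetchDocument = doc
End Function

' Returns the text of the first link inside the n-th <td> (zero-based),
' or an empty string if the document, cell or link is missing.
Function LinkTextInCell(doc As Object, n As Long) As String
    Dim tds As Object, links As Object

    If doc Is Nothing Then Exit Function
    Set tds = doc.getElementsByTagName("td")
    If tds.Length <= n Then Exit Function
    Set links = tds.Item(n).getElementsByTagName("a")
    If links.Length > 0 Then LinkTextInCell = links.Item(0).innerText
End Function

Sub ScrapeWithHttp()
    Dim i As Long
    Dim doc As Object

    For i = 2 To 10                              ' raise the upper bound for more rows
        Set doc = FetchDocument( _
            "http://www.peoplefinders.com/peoplesearch/searchresults?search=People" & _
            "&fn=" & Sheet1.Cells(i, 1).Value & "&mn=&ln=" & Sheet1.Cells(i, 2).Value & _
            "&city=&state=&age=&dobmm=" & Sheet1.Cells(i, 3).Value & _
            "&dobdd=" & Sheet1.Cells(i, 4).Value & "&doby=" & Sheet1.Cells(i, 5).Value)

        Sheet1.Cells(i, 6).Value = LinkTextInCell(doc, 2)    ' phone number
        Sheet1.Cells(i, 7).Value = LinkTextInCell(doc, 1)    ' address
    Next i
End Sub
```

Because no browser window is created, there is nothing to wait on and nothing to close, so this shape tends to scale to larger batches more comfortably than driving Internet Explorer.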