2017-05-16 43 views
1

我试图从网站上刮取拍卖数据(https://www.rbauction.com/heavy-equipment-auctions)。我目前的尝试是使用下面的代码将网站的HTML拉入VBA,然后通过它打包并保留我想要的项目(拍卖名称,天数,项目数)。VBA - HTML抓取问题

Sub RBA_Auction_Scrape() 

Dim S_Sheet As Worksheet: Set S_Sheet = ActiveWorkbook.ActiveSheet 
Dim Look_String As String 

On Error GoTo ERR_LABEL: 

Dim Web_HTML As String 
Dim HTTP_OBJ As New MSXML2.XMLHTTP60 

    Web_HTML = "" 
    HTTP_OBJ.Open "GET", "https://www.rbauction.com/heavy-equipment auctions", False 
    HTTP_OBJ.Send 

On Error Resume Next 

Select Case HTTP_OBJ.Status 
    Case 0: Web_HTML = HTTP_OBJ.responseText 
    Case 200: Web_HTML = HTTP_OBJ.responseText 
    Case Else: GoTo ERR_LABEL: 
End Select 

Debug.Print (Web_HTML) 

它成功地拉中的数据,但具有所有名称和拍卖的大小的“即将到来的重型设备拍卖”部分不被拉入VBA。我对HTML一般不太擅长,但我希望有人能够提供一个解决方案,或者至少可以解释一下当我通过网站搜索到的VBA中的HTML,找不到我想要的文章。

请帮忙!!!

回答

0

该网页源码HTML提供的链接提供https://www.rbauction.com/heavy-equipment-auctions不包含必要的数据,它使用AJAX。网站https://www.rbauction.com有一个可用的API。响应以JSON格式返回。导航页面e。 G。在Chrome中,然后打开开发工具窗口(F12),网络选项卡,重新加载(F5)页面并检查记录的XHR。最相关的数据是由URL https://www.rbauction.com/rba-api/calendar/v1?e1=true返回JSON字符串:

XHR-previev

XHR-headers

您可以使用下面的VBA代码的上述检索信息。 JSON.bas模块导入JSON处理的VBA项目。

Option Explicit 

Sub Test_www_rbauction_com() 

    Const Transposed = False ' Output option 

    Dim sResponse As String 
    Dim vJSON 
    Dim sState As String 
    Dim i As Long 
    Dim aRows() 
    Dim aHeader() 

    ' Retrieve JSON data 
    XmlHttpRequest "GET", "https://www.rbauction.com/rba-api/calendar/v1?e1=true", "", "", "", sResponse 
    ' Parse JSON response 
    JSON.Parse sResponse, vJSON, sState 
    If sState <> "Object" Then 
     MsgBox "Invalid JSON response" 
     Exit Sub 
    End If 
    ' Pick core data 
    vJSON = vJSON("auctions") 
    ' Extract selected properties for each item 
    For i = 0 To UBound(vJSON) 
     Set vJSON(i) = ExtractKeys(vJSON(i), Array("eventId", "name", "date", "itemCount")) 
     DoEvents 
    Next 
    ' Convert JSON structure to 2-d arrays for output 
    JSON.ToArray vJSON, aRows, aHeader 
    ' Output 
    With ThisWorkbook.Sheets(1) 
     .Cells.Delete 
     If Transposed Then 
      Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader) 
      Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows) 
     Else 
      OutputArray .Cells(1, 1), aHeader 
      Output2DArray .Cells(2, 1), aRows 
     End If 
     .Columns.AutoFit 
    End With 
    MsgBox "Completed" 

End Sub 

Sub XmlHttpRequest(sMethod As String, sUrl As String, arrSetHeaders, sFormData, sRespHeaders As String, sContent As String) 

    Dim arrHeader 

    'With CreateObject("Msxml2.ServerXMLHTTP") 
    ' .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS 
    With CreateObject("MSXML2.XMLHTTP") 
     .Open sMethod, sUrl, False 
     If IsArray(arrSetHeaders) Then 
      For Each arrHeader In arrSetHeaders 
       .SetRequestHeader arrHeader(0), arrHeader(1) 
      Next 
     End If 
     .send sFormData 
     sRespHeaders = .GetAllResponseHeaders 
     sContent = .responseText 
    End With 

End Sub 

Function ExtractKeys(oSource, aKeys, Optional oDest = Nothing) As Object 

    Dim vKey 

    If oDest Is Nothing Then Set oDest = CreateObject("Scripting.Dictionary") 
    For Each vKey In aKeys 
     If oSource.Exists(vKey) Then 
      If IsObject(oSource(vKey)) Then 
       Set oDest(vKey) = oSource(vKey) 
      Else 
       oDest(vKey) = oSource(vKey) 
      End If 
     End If 
    Next 
    Set ExtractKeys = oDest 

End Function 

Sub OutputArray(oDstRng As Range, aCells As Variant) 

    With oDstRng 
     .Parent.Select 
     With .Resize(1, UBound(aCells) - LBound(aCells) + 1) 
      .NumberFormat = "@" 
      .Value = aCells 
     End With 
    End With 

End Sub 

Sub Output2DArray(oDstRng As Range, aCells As Variant) 

    With oDstRng 
     .Parent.Select 
     With .Resize(_ 
       UBound(aCells, 1) - LBound(aCells, 1) + 1, _ 
       UBound(aCells, 2) - LBound(aCells, 2) + 1) 
      .NumberFormat = "@" 
      .Value = aCells 
     End With 
    End With 

End Sub 

对我来说,输出如下:

output

顺便说一句,在下面的答案适用相同的方法:1234567