2013-07-16 73 views

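Extract innerText from one cell

I want to extract only the inner text of the rightmost cell in a row of an HTML table. Here is a small part of the HTML. The row contains 810 cells and the TR tag holds 811 TD tags: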

</tr><tr align="center" id="spt_inner_row_2"><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white"> 
&nbsp;300 - 305&nbsp; 
</td><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white"> 
&nbsp;300 - 305&nbsp; 
</td><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white"> 
&nbsp;300 - 305&nbsp; 
</td><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white"> 
&nbsp;300 - 305&nbsp; 

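I am currently using the following code, which successfully extracts the data from every cell and pastes it into column A of the active sheet: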

Sub GetData() 

    Dim URL As String 
    Dim IE As InternetExplorer 
    Dim HTMLdoc As HTMLDocument 
    Dim TDelements As IHTMLElementCollection 
    Dim TDelement As HTMLTableCell 
    Dim r As Long 

    'For login use 
    Dim LoginForm As HTMLFormElement 
    Dim UserNameInputBox As HTMLInputElement 
    Dim PasswordInputBox As HTMLInputElement 
    Dim SignInButton As Object 'Login button; As Object because its element type is not shown 

    URL = "https://www.whatever.com" 

    Set IE = New InternetExplorer 

    With IE 
     .navigate URL 
     .Visible = True 

     'Wait for page to load 
     While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend 

     Set HTMLdoc = .document 

      'Enter login info 
      Set LoginForm = HTMLdoc.forms(0) 

      'Username 
      Set UserNameInputBox = LoginForm.elements("username") 
      UserNameInputBox.Value = "username" 

      'Password 
      Set PasswordInputBox = LoginForm.elements("password") 
      PasswordInputBox.Value = "password" 

      'Get the form input button and click it 

      Set SignInButton = LoginForm.elements("doLogin") 
      SignInButton.Click 

      'Wait for the new page to load 

      Do While IE.readyState <> READYSTATE_COMPLETE Or IE.Busy: DoEvents: Loop 

     'Auto-navigate to start page, so we need to navigate once more 

     .navigate URL 

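     Do While IE.readyState <> READYSTATE_COMPLETE Or IE.Busy: DoEvents: Loop 

     'Refresh the reference so it points at the page loaded by the second navigation 
     Set HTMLdoc = .document 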

     End With 


    'Specify how to recognize data to extract 
    Set TDelements = HTMLdoc.getElementById("spt_inner_row_2").getElementsByTagName("TD") 


    r = 0 

    For Each TDelement In TDelements 

     ActiveSheet.Range("A1").Offset(r, 0).Value = TDelement.innerText 

     r = r + 1 

    Next 

End Sub 

What I really need is to extract only the last (rightmost) cell in the HTML table row. Any suggestions?


Please see [**Link1**](http://stackoverflow.com/questions/17643483/trying-从网页获取数据从一个VBA代码但有时它工作等等/17666816#17666816), [**Link2**](http://stackoverflow.com/questions/15844342/pull-upside-downside-capture-ratio-from-morningstar-com/15853293#15853293) & [**Link3**](http://stackoverflow.com/questions/15959008/import-web-数据在-Excel的使用-VBA/15962055#15962055) – Santosh

Answer


IHTMLElementCollection has a length property and an item property. The item property accepts a numeric index, but it is zero-based, so the last entry is at length - 1:

Dim TDelements As IHTMLElementCollection 

Set TDelements = HTMLdoc.getElementById("spt_inner_row_2").getElementsByTagName("TD") 

With TDelements 
    MsgBox .Item(.Length - 1).InnerText 
End With 
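
To drop this into the original GetData routine, the same idea can be wrapped in a small function. This is only a sketch: LastCellText is a hypothetical helper name, and it assumes the project already references the Microsoft HTML Object Library, as the question's code does. It also guards against a missing row or an empty collection, and replaces the non-breaking spaces (&nbsp;) that pad the sample cells, since Trim only strips ordinary spaces.

```vba
' Sketch only: LastCellText is a hypothetical helper, not part of MSHTML.
' Assumes a reference to the Microsoft HTML Object Library.
Function LastCellText(ByVal HTMLdoc As HTMLDocument, ByVal rowId As String) As String

    Dim rowElement As Object                 ' late-bound, so any element type is accepted
    Dim TDelements As IHTMLElementCollection
    Dim cellText As String

    Set rowElement = HTMLdoc.getElementById(rowId)

    ' getElementById returns Nothing when no element has the given id
    If rowElement Is Nothing Then Exit Function

    Set TDelements = rowElement.getElementsByTagName("TD")

    ' Item is zero-based, so the last cell sits at index Length - 1
    If TDelements.Length > 0 Then
        cellText = TDelements.Item(TDelements.Length - 1).innerText
        ' Swap non-breaking spaces for plain spaces so Trim$ can remove the padding
        LastCellText = Trim$(Replace(cellText, ChrW(160), " "))
    End If

End Function
```

In GetData, the For Each loop could then be replaced by a single line such as ActiveSheet.Range("A1").Value = LastCellText(HTMLdoc, "spt_inner_row_2"). The function returns an empty string when the row or its cells are not found, so the caller may want to check for that.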