2016-01-13 39 views
0

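Excel: Query an attribute from an HTML heading

I want to use Excel VBA to extract an attribute value from a heading element in a web page. The page I want to scrape is structured as follows: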

<div class="index-detail">
    <h5><a href="/indices/equity/dow-jones-sustainability-chile-index-clp" title="DJSI Chile" contentIdentifier="2e9cb165-0cbf-4070-a5ef-dc20bf6219ba" contentType="web-page" contentTitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
    <span class="return-value">917.08 </span>
    <span class="daily-change down ">-0.1% ▼ </span>
</div>

Using getElementsByClassName and getElementsByTagName I have already extracted the <h5> heading, but when I print its innerText I get DJSI Chile. What I want instead is the text of the contentTitle attribute: Dow Jones Sustainability™ Chile Index (CLP).

How can I do this?

UPDATE

The code I am using is as follows:

Sub myConSP()

    ' Declare variables
    Dim oHtmlSP As HTMLDocument
    Dim tSPIndex As HTMLDivElement
    Dim tSPIdx As HTMLDivElement

    ' Load page inside HTMLDocument
    Set oHtmlSP = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.espanol.spindices.com", False
        .send
        oHtmlSP.body.innerHTML = .responseText
    End With

    ' Get indices
    Set tSPIndex = oHtmlSP.getElementById("all-indices-slider")

    Set objTitleTag = tSPIndex.getElementsByClassName("index-detail")(0).getElementsByTagName("h5")(0)
    MsgBox objTitleTag.getAttribute("contentTitle").innerText

End Sub


'objTitleTag.getAttribute("contentTitle")' –


How do I define objTitleTag? – capm


That is whatever you are calling 'innerText' on. It is always best to show your actual code: it makes it easier to suggest what to add. –

Answer

1

The attribute is attached to the <a>, not the <h5> (sorry, that was my mistake in the comment above):

Sub TT() 

    Dim html As String, d As New HTMLDocument, el 

    html = "<div class='index-detail'>" & _ 
    "<h5><a href='/indices/equity/dow-jones-sustainability-chile-index-clp' " & _ 
    "title='DJSI Chile' contentIdentifier='2e9cb165-0cbf-4070-a5ef-dc20bf6219ba' " & _ 
    "contentType = 'web-page' " & _ 
    "contentTitle='Dow Jones Sustainability™ Chile Index (CLP)'>DJSI Chile</a></h5> " & _ 
    "<span class='return-value'>917.08 </span> " & _ 
    "<span class='daily-change down '>-0.1% ? </span></div>" 

    d.body.innerHTML = html 

    Set el = d.getElementsByClassName("index-detail")(0).getElementsByTagName("a")(0) 

    Debug.Print el.getAttribute("contentTitle") 
     ' >>> Dow Jones Sustainability™ Chile Index (CLP) 


End Sub 
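
The same idea can be extended to every index block in a document rather than only the first. This is only a sketch: ContentTitles and DemoContentTitles are hypothetical names that do not appear in the question or the answer. It assumes a reference to the Microsoft HTML Object Library is set, and that each block carries its link inside the same markup shown in the question.

```vba
' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
' ContentTitles is a hypothetical helper, not part of the original answer.
Function ContentTitles(doc As HTMLDocument) As Collection
    Dim result As New Collection
    Dim block As Object, links As Object, title As Variant

    For Each block In doc.getElementsByClassName("index-detail")
        Set links = block.getElementsByTagName("a")
        If links.Length > 0 Then
            ' The custom attribute sits on the <a>, not on the enclosing <h5>
            title = links(0).getAttribute("contentTitle")
            ' getAttribute returns Null when the attribute is missing
            If Not IsNull(title) Then result.Add CStr(title)
        End If
    Next block

    Set ContentTitles = result
End Function

Sub DemoContentTitles()
    Dim d As New HTMLDocument, t As Variant

    ' Reduced copy of the markup from the question
    d.body.innerHTML = "<div class='index-detail'><h5>" & _
        "<a href='/indices/equity/dow-jones-sustainability-chile-index-clp' " & _
        "contentTitle='Dow Jones Sustainability Chile Index (CLP)'>DJSI Chile</a>" & _
        "</h5></div>"

    For Each t In ContentTitles(d)
        Debug.Print t
    Next t
End Sub
```

In the Sub from the question, calling ContentTitles(oHtmlSP) after responseText has been loaded should give one entry per index block, provided the live page still uses this markup.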

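I see: when I extract the element by the class 'index-detail' and then isolate the heading '<h5>', the attribute does not belong to the heading but to the element beginning with < – capm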