2017-04-06 136 views
3

Extracting data from tbody using BeautifulSoup

I want to extract some data from HTML using BeautifulSoup. I only want to return `data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7"`, but I am not getting any results. I am using the code below. Any help would be appreciated.

parsed = soup.find_all('tbody', class_=re.compile('^data-'))

<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0"> 
<tr class="first-line"> 
    <td class="icon-td"> 
    <div class="icon"> 
    <img alt="Item icon" src="https://web.poecdn.com/image/Art/2DItems/Maps/AtlasMaps/SulphurWastes3.png?scale=1&amp;w=1&amp;h=1&amp;v=48802019c4a2e88af038d75ec1e4b31e3"/> 
    \n 
    <div class="sockets" style="position: absolute;"> 
    \n 
    <div class="sockets-inner" style="position: relative; width:94px;"> 
     \n 
    </div> 
    \n 
    </div> 
    </div> 
    </td> 
    <td class="item-cell"> 
    <h5> 
    <a class="title itemframe0" href="#" onclick="return false;" target="_blank"> 
    Sulphur Wastes Map 
    </a> 
    <span class="found-time-ago"> 
    2 months ago 
    </span> 
    </h5> 
    <ul class="requirements proplist"> 
    <li> 
    <span class="sortable" data-name="ilvl"> 
     ilvl: 80 
    </span> 
    </li> 
    </ul> 
    <span class="sockets-raw" style="display:none"> 
    </span> 
    <ul class="item-mods"> 
    </ul> 
    </td> 
    <td class="table-stats"> 
    <table> 
    <tr class="calibrate"> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    </tr> 
    <tr class="cell-first"> 
    <th class="disabled" colspan="2"> 
     Quality 
    </th> 
    <th class="disabled" colspan="2"> 
     Phys. 
    </th> 
    <th class="disabled" colspan="2"> 
     Elem. 
    </th> 
    <th class="disabled" colspan="2"> 
     APS 
    </th> 
    <th class="disabled" colspan="2"> 
     DPS 
    </th> 
    <th class="disabled" colspan="2"> 
     pDPS 
    </th> 
    <th class="disabled" colspan="2"> 
     eDPS 
    </th> 
    </tr> 
    <tr class="cell-first"> 
    <td class="sortable property " colspan="2" data-name="q" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="pd" data-value="0.0"> 
    </td> 
    <td class="sortable property " colspan="2" data-ed="" data-name="ed" data-value="0.0"> 
    </td> 
    <td class="sortable property " colspan="2" data-name="aps" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="dps" data-value="0.0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="pdps" data-value="0.0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="edps" data-value="0.0"> 
     \xa0 
    </td> 
    </tr> 
    <tr class="cell-second"> 
    <th class="cell-empty"> 
    </th> 
    <th class="disabled" colspan="2"> 
     Armour 
    </th> 
    <th class="disabled" colspan="2"> 
     Evasion 
    </th> 
    <th class="disabled" colspan="2"> 
     Shield 
    </th> 
    <th class="disabled" colspan="2"> 
     Block 
    </th> 
    <th class="disabled" colspan="2"> 
     Crit. 
    </th> 
    <th colspan="2"> 
     Tier 
    </th> 
    </tr> 
    <tr class="cell-second"> 
    <td class="cell-empty"> 
    </td> 
    <td class="sortable property " colspan="2" data-name="armour" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="evasion" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="shield" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="block" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="crit" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="level" data-value="13"> 
     13 
    </td> 
    </tr> 
    </table> 

Answers

0

You are trying to match the tag's attribute names against its class, which does not work.
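
To illustrate the point: `class_` is compared only against the values of the tag's `class` attribute, never against the names of its other attributes. A minimal sketch, assuming a trimmed copy of the posted `tbody` wrapped in a `table`, and the built-in `html.parser` (both chosen here for illustration only):

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the posted markup (assumption: only a few attributes kept).
html = '''<table><tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3"
data-buyout="1 alchemy" data-ign="DanForeverr" id="item-container-0"></tbody></table>'''

soup = BeautifulSoup(html, "html.parser")

# No class value starts with "data-", so this finds nothing.
by_data_prefix = soup.find_all("tbody", class_=re.compile("^data-"))

# The class values are "item" and "item-live-...", so this does match.
by_item_prefix = soup.find_all("tbody", class_=re.compile("^item"))

print(len(by_data_prefix), len(by_item_prefix))
```

The first search comes back empty for the same reason the call in the question does, while the second finds the `tbody` because the pattern is tested against its class values.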

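Why not search by `id` instead? Just make sure the pattern covers the part before the trailing `0` (that is, `item-container-`).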
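
One way the `id` suggestion might look is to pass a compiled regular expression as the `id` filter to `find_all`. This is only a sketch: the second `tbody` with `id="unrelated"` is a hypothetical element added to show that it is skipped, and `html.parser` is assumed as the parser.

```python
import re
from bs4 import BeautifulSoup

# Trimmed markup; the "unrelated" tbody is hypothetical, for contrast only.
html = '''<table>
<tbody id="item-container-0" data-buyout="1 alchemy" data-name="Sulphur Wastes Map"></tbody>
<tbody id="unrelated"></tbody>
</table>'''

soup = BeautifulSoup(html, "html.parser")

# Keep only tbody tags whose id starts with "item-container-".
items = soup.find_all("tbody", id=re.compile(r"^item-container-"))

for item in items:
    # Attribute values are read with dictionary-style access on the tag.
    print(item["id"], item["data-buyout"], item["data-name"])
```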

0

Well, you can't really do it that way, but you can extract specific pieces of information from a tag like this.

Define the HTML you posted as a string, say `x`:

from bs4 import BeautifulSoup

x = '''<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">'''

soup = BeautifulSoup(x,'lxml') 

this_class = soup.findAll('tbody',{'class':'item item-live-c324ceb98e25716a0fad0727e0cd64e3'}) 
# This pinpoints the exact tbody (you can select it your own way). 
# Giving the exact attribute value is useful because it can hardly match the wrong tag. 

for i in this_class: 
    print(i['data-buyout']) 
    print(i['data-ign']) 
    print(i['data-name']) 
    print(i['id']) 

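This prints the value of each of those attributes. If you instead print the result of `soup.findAll` or `soup.find` directly, you get not just that one tag but the whole branch, including all of its children.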
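
If only the `data-*` attributes are wanted, rather than the tag together with its children, one possible approach is to filter the tag's `attrs` dictionary by name. A minimal sketch, assuming a trimmed copy of the posted markup with a placeholder child row and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Trimmed markup; the child row is a placeholder standing in for the real rows.
html = '''<table><tbody class="item" data-buyout="1 alchemy" data-ign="DanForeverr"
data-league="Standard" id="item-container-0"><tr><td>child content</td></tr></tbody></table>'''

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("tbody")

# tag.attrs maps attribute names to values; keep only the data-* entries.
data_attrs = {
    name: value
    for name, value in tag.attrs.items()
    if name.startswith("data-")
}

print(data_attrs)
```

Because the result is a plain dictionary, `class`, `id` and the child elements are left out, and the values can be passed on without naming each attribute by hand.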

0

The following, a combination of the suggestions above, solved my problem:

parsed = soup.select("tbody[id*=item-container-]") 
for i in parsed: 
    print(i['data-buyout']) 
    print(i['data-ign']) 
    print(i['data-name']) 
    print(i['id']) 