2016-02-02

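Parsing multiple values from HTML with lxml

I'm quite confused about how to work with lxml... I usually use regular expressions, because they let me extract all the data at once, but I don't know how to parse these values with lxml: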

data = tree.xpath('//div[@class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"]') 
# extract data from div class: featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2 

"M4A4 | Poseidon " + "Factory New" 
"9462141" 
"195.00" 
"https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f" 

"Chroma 2 Case Key" 
"9462120" 
"2.11" 
"https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f" 

The HTML code I need to parse:

<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"> 
    <div> 
     <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=M4A4+%7C+Poseidon+%28Factory+New%29" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141"> 
       M4A4 | Poseidon 
      </a> 
     <div class="item-desc"> 
      <small class="text-muted">Factory New</small> 
      <small style="color:#777777">Classified Rifle</small> 
      <small class="item-warning"></small> 
     </div> 
     <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f"> 
     <div class="item-add"> 
      <div class="item-amount">$195.00</div> 
      <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/115787731/" target="_BLANK">Suggested Price: $258.52</a> 
      </div> 
      <div class="item-buttons text-center"><a href="steam://rungame/730/76561202255233023/+csgo_econ_action_preview%20S76561198236464786A5000169384D16322433520890898502" class="btn btn-primary" style="margin-right:4px">Inspect</a> 
       <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462141)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span> 
      </div> 
     </div> 
    </div> 
</div> 

<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"> 
    <div> 
     <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=Chroma+2+Case+Key" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120"> 
       Chroma 2 Case Key 
      </a> 
     <div class="item-desc"> 
      <small class="text-muted"></small> 
      <small style="color:#777777">Base Grade Key</small> 
      <small class="item-warning"></small> 
     </div> 
     <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f"> 
     <div class="item-add"> 
      <div class="item-amount">$2.11</div> 
      <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/100994798/" target="_BLANK">Suggested Price: $2.70</a> 
      </div> 
      <div class="item-buttons text-center"> 
       <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462120)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span> 
      </div> 
     </div> 
    </div> 
</div> 

PS: Do I need to write a for loop over each instance of '//div[@class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"]', or can lxml extract every piece of data as a list?


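`lxml` can extract them as a list, and then you can use a `for` loop to do something with each element of the list, for example to extract its sub-elements. – furas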
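A minimal sketch of what the comment describes, run against a trimmed-down, hypothetical copy of the question's markup: each `xpath()` call returns a plain Python list, so one query per field gives you every value at once, much like a regex with `findall`. The catch is that those lists only line up if every item has every field.

```python
import lxml.html

# Hypothetical, trimmed-down copy of the two items from the question.
HTML = """
<div class="featured-item"><div>
  <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141"> M4A4 | Poseidon </a>
  <small class="text-muted">Factory New</small>
  <div class="item-amount">$195.00</div>
</div></div>
<div class="featured-item"><div>
  <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120"> Chroma 2 Case Key </a>
  <small class="text-muted"></small>
  <div class="item-amount">$2.11</div>
</div></div>
"""

tree = lxml.html.fromstring(HTML)

# One query per field; each returns a list with one entry per match, in document order.
names = [t.strip() for t in tree.xpath('//a[contains(@class, "market-link")]/text()')]
prices = [t.strip() for t in tree.xpath('//div[@class="item-amount"]/text()')]

# Pitfall: the second <small> is empty, so it yields no text node and this list
# has only one entry; it no longer lines up with `names` and `prices`.
conditions = [t.strip() for t in tree.xpath('//small[@class="text-muted"]/text()')]

print(names)       # ['M4A4 | Poseidon', 'Chroma 2 Case Key']
print(prices)      # ['$195.00', '$2.11']
print(conditions)  # ['Factory New']
```

That missing entry is why looping over the item `<div>`s and querying each one separately is the safer approach.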

Answer


`xpath()` returns a list of elements, and you have to use a `for` loop to get the sub-elements of each one.
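Inside such a loop, the per-item XPath should start with `.//` rather than `//`. A small sketch with a hypothetical two-item sample shows why: an expression starting with `//` is evaluated from the document root even when `xpath()` is called on a sub-element.

```python
import lxml.html

# Hypothetical two-item sample standing in for the question's HTML.
HTML = """
<div class="featured-item"><div class="item-amount">$195.00</div></div>
<div class="featured-item"><div class="item-amount">$2.11</div></div>
"""

root = lxml.html.fromstring(HTML)
pairs = []

for item in root.xpath('//div[@class="featured-item"]'):
    # "//" is absolute: it searches the whole document even when called on `item`,
    # so [0] is always the first item's price.
    absolute = item.xpath('//div[@class="item-amount"]/text()')[0]
    # ".//" is relative: only descendants of `item` are searched.
    relative = item.xpath('.//div[@class="item-amount"]/text()')[0]
    pairs.append((str(absolute), str(relative)))

print(pairs)  # [('$195.00', '$195.00'), ('$195.00', '$2.11')]
```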

Example code, with your HTML as `data`:

data ='''<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"> 
    <div> 
     <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=M4A4+%7C+Poseidon+%28Factory+New%29" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141"> 
       M4A4 | Poseidon 
      </a> 
     <div class="item-desc"> 
      <small class="text-muted">Factory New</small> 
      <small style="color:#777777">Classified Rifle</small> 
      <small class="item-warning"></small> 
     </div> 
     <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f"> 
     <div class="item-add"> 
      <div class="item-amount">$195.00</div> 
      <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/115787731/" target="_BLANK">Suggested Price: $258.52</a> 
      </div> 
      <div class="item-buttons text-center"><a href="steam://rungame/730/76561202255233023/+csgo_econ_action_preview%20S76561198236464786A5000169384D16322433520890898502" class="btn btn-primary" style="margin-right:4px">Inspect</a> 
       <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462141)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span> 
      </div> 
     </div> 
    </div> 
</div> 

<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"> 
    <div> 
     <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=Chroma+2+Case+Key" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120"> 
       Chroma 2 Case Key 
      </a> 
     <div class="item-desc"> 
      <small class="text-muted"></small> 
      <small style="color:#777777">Base Grade Key</small> 
      <small class="item-warning"></small> 
     </div> 
     <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f"> 
     <div class="item-add"> 
      <div class="item-amount">$2.11</div> 
      <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/100994798/" target="_BLANK">Suggested Price: $2.70</a> 
      </div> 
      <div class="item-buttons text-center"> 
       <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462120)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span> 
      </div> 
     </div> 
    </div> 
</div>''' 

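import lxml.html

html = lxml.html.fromstring(data)

# xpath() returns a list of matching <div> elements, one per item
divs = html.xpath('//div[@class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"]')

for x in divs:
    # ".//" keeps the search inside this div; the search-icon <a> has no text,
    # so the first text node belongs to the <a> holding the item name
    a = x.xpath('.//a/text()')[0]
    print(a.strip())

    # the condition ("Factory New") is empty for some items, so check before indexing
    small = x.xpath('.//small[@class="text-muted"]/text()')
    if small:
        print(small[0])

    div = x.xpath('.//div[@class="item-amount"]/text()')[0]
    print(div)

    # the second href is "?loc=shop_view_item&item=9462141"; the id follows the last "="
    a_href = x.xpath('.//a/@href')
    item = a_href[1].split('=')[-1]
    print(item)

    img = x.xpath('.//img[@class="item-img"]/@src')[0]
    print(img)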

Output:

M4A4 | Poseidon 
Factory New 
$195.00 
9462141 
https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f 
Chroma 2 Case Key 
$2.11 
9462120 
https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f 
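If you want each item as one record (the "list" from the PS) rather than printed lines, the same loop can collect dicts. This is a sketch against a shortened, hypothetical copy of the markup (the `example.com` image URLs are placeholders, and `parse_items`/`ITEM_XPATH` are illustrative names). Matching a single class token with `contains(concat(...))`, using XPath `string()` so missing fields become `""`, and reading the id with `urllib.parse` are my own choices, not part of the answer above. The `$` is stripped from the price to match the "195.00" format shown in the question.

```python
import lxml.html
from urllib.parse import parse_qs, urlparse

# Hypothetical, shortened copy of the question's markup; the image URLs are placeholders.
HTML = """
<div class="featured-item col-xs-12 center-block app_730_2"><div>
  <a class="glyphicon market-search-icon" href="/?loc=shop_search" title="Search"></a>
  <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141"> M4A4 | Poseidon </a>
  <div class="item-desc"><small class="text-muted">Factory New</small></div>
  <img class="item-img" src="https://example.com/poseidon.png">
  <div class="item-add"><div class="item-amount">$195.00</div></div>
</div></div>
<div class="featured-item col-xs-12 center-block app_730_2"><div>
  <a class="glyphicon market-search-icon" href="/?loc=shop_search" title="Search"></a>
  <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120"> Chroma 2 Case Key </a>
  <div class="item-desc"><small class="text-muted"></small></div>
  <img class="item-img" src="https://example.com/key.png">
  <div class="item-add"><div class="item-amount">$2.11</div></div>
</div></div>
"""

# Match a single class token rather than the exact class string, so extra or
# reordered classes on the item <div> do not break the query.
ITEM_XPATH = '//div[contains(concat(" ", normalize-space(@class), " "), " featured-item ")]'


def parse_items(html_text):
    """Return a list with one dict per featured item in `html_text`."""
    root = lxml.html.fromstring(html_text)
    items = []
    for div in root.xpath(ITEM_XPATH):
        # The search-icon <a> has no text, so select the market-link <a> explicitly.
        link = div.xpath('.//a[contains(@class, "market-link")]')[0]
        # string(...) yields "" when the node is missing or empty, so no IndexError here.
        condition = div.xpath('string(.//small[@class="text-muted"])').strip()
        price = div.xpath('string(.//div[@class="item-amount"])').strip()
        # Read the id from the query string instead of splitting on "=".
        item_id = parse_qs(urlparse(link.get("href")).query)["item"][0]
        items.append({
            "name": link.text_content().strip(),
            "condition": condition,
            "id": item_id,
            "price": price.lstrip("$"),
            "image": str(div.xpath('.//img[@class="item-img"]/@src')[0]),
        })
    return items


for item in parse_items(HTML):
    print(item)
```

Keeping `condition` as a separate key leaves it up to you how to join it with the name (the question shows `"M4A4 | Poseidon " + "Factory New"`).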

Perfect! It works. Now I just have to understand better how you did all of it :D –