2013-04-16 93 views

Extracting a currency value using Scrapy/XPath

I am trying to get a currency value from some HTML using Scrapy. The code is:

links = hxs.select('//a[@class="product-image"]/div[@class="price-box"]//span[@class="price"]/text()').extract()

and the HTML is:

<div> 
    <span> 
    <sub> 
     <li class="item first"> 

     <a href="http://www.xtra-vision.ie/dvd-blu-ray/to-rent/new-release/dvd/pitch-perfect-dvd.html" title="Image for Pitch Perfect" class="product-image"> 

      <span class="exclusive-star"> 
      </span> 

      <img src="http://www.xtra-vision.ie/media/catalog/product/cache/3/small_image/124x173/5b02ab93946615b958c913185aae2414/i/w/iws_5167c10c906b57.33524324.JPG.jpg" alt="Image for Pitch Perfect" /> 

      <h2 class="product-name">Pitch Perfect</h2> 

      <div class="price-box"> 

      <span class="regular-price" id="product-price-5174"> 

       <span class="price"> 
       €15      
       <sub class="price-bit">.99</sub> 
       </span> 
      </span> 
      </div> 
     </a> 
     </li> 
    </sub> 

    </span> 

</div> 

The price I get back is \u20ac15\t\t\t\t\t\t. Is there any way to extract 15.99 from this HTML using XPath?

Answers


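I used a combination of XPath and Python, so this may not be exactly what you asked for, although the Python part is mainly there to get rid of the extraneous tabs appended to the end of the price.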

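# Text nodes directly under the price span: the "€15" part plus surrounding whitespace
price = hxs.select('//span[@class="price"]/text()').extract()
# Text of the nested <sub class="price-bit">: the ".99" part
pricebit = hxs.select('//span[@class="price"]/sub[@class="price-bit"]/text()').extract()
totalprice = price + pricebit
# Remove all whitespace (tabs, newlines, spaces) from the joined fragments
totalstr = ''.join(''.join(totalprice).split())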
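
The same idea can be tried outside a spider. The sketch below is only an illustration, not part of the original answer: it uses Python's standard xml.etree.ElementTree instead of Scrapy, a trimmed copy of the price markup from the question, and a hypothetical helper named price_from_markup. It collects every piece of text under the price span, including the nested sub element, strips the whitespace, and pulls out the number. In Scrapy, the descendant-text step would presumably correspond to selecting //span[@class="price"]//text() instead of /text().

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the price markup from the question (assumed well-formed
# so that ElementTree can parse it; real pages may need an HTML parser).
SNIPPET = u"""
<div class="price-box">
  <span class="regular-price" id="product-price-5174">
    <span class="price">
    \u20ac15\t\t\t\t\t\t
    <sub class="price-bit">.99</sub>
    </span>
  </span>
</div>
"""


def price_from_markup(markup):
    """Hypothetical helper: return the price in `markup` as a Decimal."""
    root = ET.fromstring(markup.strip())
    # Find the first descendant <span> whose class is exactly "price".
    span = root.find(".//span[@class='price']")
    if span is None:
        raise ValueError('no span with class "price" found')
    # itertext() yields the span's own text and the text of nested elements,
    # so both the "15" part and the ".99" part are included.
    text = ''.join(span.itertext())
    # Drop tabs, newlines and spaces between the fragments.
    compact = re.sub(r'\s+', '', text)
    # Keep the first number, skipping the currency symbol.
    match = re.search(r'\d+(?:\.\d+)?', compact)
    if match is None:
        raise ValueError('no numeric price found in %r' % compact)
    return Decimal(match.group(0))


print(price_from_markup(SNIPPET))
```

Returning a Decimal rather than a float keeps the price exact, which is usually what is wanted for currency values.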

Thanks Talvalin, that works – balcoder