Problem with an XPath / regular expression in a Scrapy spider

I'm trying to extract a product ID from the onclick attribute inside the ul element (id="ShowProductImages") that is a preceding sibling of the description div.

I want to extract the number that comes directly after pid=, for example:

...list/ViewAll?pid=&image=206...

Here is the HTML I'm extracting from:

<ul id="ShowProductImages" class="imageView"> 
    <li><a href="" target="_blank" onClick="javascript:initWindow('http://products.example.com/products/list/ViewAll?pid=234565&amp;image=754550',520,520,100,220);return false;"><img src="http://content.example.com/assets/images/products/j458jk.jpg" width="200" height="150" alt="Product image description here" border="0"></a></li>   
</ul> 

<div class="description"> 
    Description here... 
</div>
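The pid sits in the query string of the URL that onclick passes to initWindow. As an alternative to matching it with one regular expression, the quoted URL can be pulled out first and its query string parsed with the standard library. This is a minimal sketch assuming Python 3 and an onclick value shaped like the one above (with &amp; already decoded to & by the HTML parser); pid_from_onclick is a hypothetical helper, not part of Scrapy:

```python
import re
from urllib.parse import urlparse, parse_qs

# onclick value as the HTML parser returns it: "&amp;" is already decoded to "&".
onclick = ("javascript:initWindow('http://products.example.com/products/list/"
           "ViewAll?pid=234565&image=754550',520,520,100,220);return false;")

def pid_from_onclick(value):
    """Hypothetical helper: grab the quoted URL from initWindow('...') and read its pid parameter."""
    m = re.search(r"initWindow\('([^']+)'", value)
    if m is None:
        return None
    # parse_qs maps each query key to a list of values.
    query = parse_qs(urlparse(m.group(1)).query)
    values = query.get("pid")
    return values[0] if values else None

print(pid_from_onclick(onclick))  # 234565
```

Parsing the query string means the extraction no longer depends on which parameter follows pid or how the value ends.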

I'm using XPath to select the onclick attribute and a regular expression to extract the id. Here is the code I'm using (it doesn't work):

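from scrapy.selector import HtmlXPathSelector

def parse(self, response):
    sel = HtmlXPathSelector(response)
    # Each description div is the anchor; the image list is its preceding sibling.
    products_path = sel.xpath('//div[@class="description"]')
    for product_path in products_path:
        # Product is the project's Item class, defined elsewhere.
        product = Product()
        product['product_pid'] = product_path.xpath('preceding-sibling::ul[@id="ShowProductImages"][1]//li/a[1]/@onclick').re(r'(?:pid=)(.+?)(?:\'|$)')
        yield product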
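To see what the regular expression in this code captures, it can be run with Python's re module on the onclick value from the HTML above. This sketch assumes that Scrapy's .re() behaves like re.findall here (returning a list of the captured group's matches) and that &amp; has already been decoded to &:

```python
import re

# The onclick attribute value, with "&amp;" decoded to "&".
onclick = ("javascript:initWindow('http://products.example.com/products/list/"
           "ViewAll?pid=234565&image=754550',520,520,100,220);return false;")

# The question's pattern: the lazy .+? keeps expanding until the next quote or end of string.
original = re.findall(r"(?:pid=)(.+?)(?:'|$)", onclick)
print(original)  # ['234565&image=754550']
```

The lazy quantifier stops at the closing quote, not at the &, so at least part of the problem is that the image parameter comes along with the id.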

Any suggestions? I'm not quite sure where I'm going wrong.

Thanks in advance for your help.


2014-01-25 user2980769

I suggest you try this: select from the ul instead, and test for its <div class="description"> sibling in a predicate:

sel.xpath("""//ul[following-sibling::div[@class="description"]] 
       [@id="ShowProductImages"] 
       /li/a[1]/@onclick""").re(r'(?:pid=)(\d+)')

I changed your regular expression so that it only matches digits.
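Running the revised pattern the same way, again with Python's re standing in for .re(), shows that restricting the capture to digits makes it stop at the &. Because the result is still a list, a small helper (first_or_none, hypothetical and not a Scrapy API) can be used if the item field should hold a single string:

```python
import re

onclick = ("javascript:initWindow('http://products.example.com/products/list/"
           "ViewAll?pid=234565&image=754550',520,520,100,220);return false;")

# The answer's pattern: \d+ stops at the first non-digit, here the "&".
matches = re.findall(r"(?:pid=)(\d+)", onclick)
print(matches)  # ['234565']

def first_or_none(items):
    """Hypothetical helper: keep one value instead of a list."""
    return items[0] if items else None

print(first_or_none(matches))  # 234565
```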


2014-01-25 17:45:37

You could probably also simplify the regex to .re(r'pid=(\d+)'); the non-capturing parentheses are useless here... – Robin
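The comment's point can be checked directly. With findall-style matching only the capturing group (\d+) is returned, so wrapping pid= in a non-capturing group changes nothing. A quick check with Python's re module, using the same onclick value as above:

```python
import re

onclick = ("javascript:initWindow('http://products.example.com/products/list/"
           "ViewAll?pid=234565&image=754550',520,520,100,220);return false;")

with_group = re.findall(r"(?:pid=)(\d+)", onclick)
without_group = re.findall(r"pid=(\d+)", onclick)

# Both patterns capture only the digits, so the results are identical.
print(with_group == without_group)  # True
```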
