2017-07-14

Scrapy spider shows an error while crawling

I want to scrape coupons from a coupon website, but when I try to run the spider it shows an error. Please help. Thank you.

    import scrapy
    from scrapy.http import Request
    from scrapy.selector import HtmlXPathSelector
    from scrapy.spider import BaseSpider

    class CuponationSpider(scrapy.spider):
        name = "cupo"
        allowed_domains = ["cuponation.in"]
        start_urls = ["https://www.cuponation.in/firstcry-coupon#voucher"]

        def parse(self, response):
            all_items = []
            divs_action = response.xpath('//div[@class="action"]')
            for div_action in divs_action:
                item = VoucherItem()
                span0 = div_action.xpath('./span[@data-voucher-id]')[0]
                item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
                item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
                all_items.append(item)





**Output** ERROR

    File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>
    2017-07-25 16:36:59 [boto] ERROR: Unable to read instance data, giving up
Comment: The answer to your question is in the warning. Don't use scrapy.selector.HtmlXPathSelector; use scrapy.Selector. – Neil

Comment: @Neil I tried that as well; it still does not solve the problem. – abhi09sep

Comment: Then what is the warning now? And what is the error? – Neil

Answer

Comment: ... tell me where I am making the error

  1. Delete all of the import lines and use only this one:

         import scrapy

  2. The inheritance should be:

         class CuponationSpider(scrapy.Spider):

  3. You have changed name and start_urls; use:

         name = "cuponation"
         allowed_domains = ['cuponation.in']
         start_urls = ['https://www.cuponation.in/firstcry-coupon']

  4. You are using Python 2.7. Sorry, I cannot run Scrapy on Python 2.7 here; that could account for the difference.
     The error "Unable to read instance data, giving up" tells you that you did not receive any data from the given URL. Perhaps you have been blacklisted.
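If the timeout comes from throttling or blacklisting, a few Scrapy settings are worth adjusting. A sketch of what could go into settings.py; the values here are illustrative, not a guaranteed fix:

```python
# settings.py -- illustrative values, not a guaranteed fix
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"  # the default Scrapy UA is easy to block
DOWNLOAD_TIMEOUT = 30   # seconds to wait before giving up on a request
RETRY_TIMES = 3         # retry failed requests a few times
DOWNLOAD_DELAY = 1.0    # throttle politely; lowers the chance of being blocked
```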

Comment: URL is cuponation.in/firstcry-coupon#voucher

That is the same page; the #voucher fragment does not reload it.
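You can check with the standard library that the fragment is purely client-side; urldefrag splits it off, and the remaining URL is what actually gets requested:

```python
from urllib.parse import urldefrag

# The fragment (#voucher) is never sent to the server
url, fragment = urldefrag("https://www.cuponation.in/firstcry-coupon#voucher")
print(url)       # https://www.cuponation.in/firstcry-coupon
print(fragment)  # voucher
```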
Everything can be simplified to the following:

    all_items = []

    def parse(self, response):
        # Get all DIVs with class="action"
        divs_action = response.xpath('//div[@class="action"]')

        for div_action in divs_action:
            item = VoucherItem()

            # Get the SPAN inside the DIV that has the attribute data-voucher-id
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]

            # Copy the attribute data-voucher-id
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]

            # Find SPAN class="code-field" inside span0 and copy its text
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]

            all_items.append(item)

Output:

#CouponSpider.start_requests:https://www.cuponation.in/firstcry-coupon 
#CouponSpider.parse() 
#CouponSpider.divs_action:List[13] of <Element div at 0xf6b1c20c> 
{'voucher_id': '868600', 'code': '*******'} 
{'voucher_id': '31793', 'code': '*******'} 
{'voucher_id': '832408', 'code': '*******'} 
{'voucher_id': '819903', 'code': '*******'} 
{'voucher_id': '808774', 'code': '*******'} 
{'voucher_id': '32274', 'code': '*******'} 
{'voucher_id': '32102', 'code': '*******'} 
{'voucher_id': '844247', 'code': '*******'} 
{'voucher_id': '843513', 'code': '*******'} 
{'voucher_id': '848151', 'code': '*******'} 
{'voucher_id': '845248', 'code': '*******'} 
{'voucher_id': '869101', 'code': '*******'} 
{'voucher_id': '869328', 'code': '*******'}    
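The extraction step can also be replayed without Scrapy at all, which helps separate parsing problems from network problems. A minimal sketch using the standard-library html.parser; the HTML snippet and the VoucherParser class are invented for illustration:

```python
from html.parser import HTMLParser

class VoucherParser(HTMLParser):
    """Collects data-voucher-id attributes and the text of span.code-field."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_code = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "data-voucher-id" in attrs:
            # Mirrors: div_action.xpath('./span[@data-voucher-id]')
            self.items.append({"voucher_id": attrs["data-voucher-id"], "code": ""})
        if tag == "span" and attrs.get("class") == "code-field":
            self._in_code = True

    def handle_endtag(self, tag):
        if tag == "span" and self._in_code:
            self._in_code = False

    def handle_data(self, data):
        # Mirrors: span0.xpath('./span[@class="code-field"]/text()')
        if self._in_code and self.items:
            self.items[-1]["code"] += data.strip()

html = '''
<div class="action">
  <span data-voucher-id="868600"><span class="code-field">SAVE20</span></span>
</div>
'''
p = VoucherParser()
p.feed(html)
print(p.items)  # [{'voucher_id': '868600', 'code': 'SAVE20'}]
```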
Comment: Still not working. – abhi09sep

Comment: @stovfl I uploaded all of my code, but I am still facing the problem. – abhi09sep

Comment: I have made all the changes, but I still cannot get the result. – abhi09sep