How to handle lazy-loaded images with Python Scrapy

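Here is the code I use to crawl the page. The site I want to scrape has lazy loading enabled for its images, so Scrapy only gets 10 of the 100 images. The rest all come back as placeholder.jpg. What is the best way to handle lazy-loaded images in Scrapy?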

Thanks!

import scrapy


class MasseffectSpider(scrapy.Spider):
    name = "massEffect"
    allowed_domains = ["amazon.com"]
    start_urls = [
        'file://127.0.0.1/home/ec2-user/scrapy/amazon/amazon.html',
    ]

    def parse(self, response):
        # `items` was never defined in the snippet; 'div.item' is a
        # hypothetical selector for one product block on the page.
        for item in response.css('div.item'):
            yield {
                # With lazy loading, src is still placeholder.jpg for most images.
                'image': item.css('div.product img::attr(src)').extract(),
                'url': item.css('div.item-name a::attr(href)').extract(),
            }

It seems other tools such as CasperJS can set a viewport large enough for the images to get loaded:

var casper = require('casper').create();

casper.start('http://m.facebook.com', function() {
    // The pretty HUGE viewport allows for roughly 1200 images.
    // If you need more you can either resize the viewport or scroll down the viewport to load more DOM (probably the best approach).
    this.viewport(2048, 4096);

    // login_username and login_password are assumed to be defined earlier in the script.
    this.fill('form#login_form', {
        'email': login_username,
        'pass': login_password
    }, true);
});

casper.run();


2016-04-30 Will W

Can you share the site you are crawling? A pastebin would work. – eLRuLL

The problem is that the lazy loading is done by JavaScript, which Scrapy can't execute. CasperJS does execute it.

To make this work with Scrapy, you have to combine it with Selenium or scrapyjs.


2016-04-30 13:30:14

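To scrape lazy-loaded images, you have to find the Ajax request that returns the images and then issue that same request from Scrapy. Once you have extracted all the data from a page, pass it along to the next callback through the `meta` of the Scrapy request. For further help, see the Scrapy Request documentation.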


2016-05-02 13:19:41
