2017-01-07 138 views

How can I scrape a website with dynamic routing using Scrapy?

http://growthtools.io/social-media-automation-tools

When I run

scrapy shell 'http://growthtools.io/social-media-automation-tools' 

I get the following output:

2017-01-07 22:43:06 [root] DEBUG: Using default logger 
2017-01-07 22:43:06 [root] DEBUG: Using default logger 

In [1]: view(response) 


The response object does not contain the tools elements.

In [3]: response.css('.toolsList') 
Out[3]: [] 
In [5]: 'toolsList' in response.body 
Out[5]: False 

Can anyone explain how I can parse http://growthtools.io/social-media-automation-tools, and why the response object I got does not contain the full page content?


The site uses JavaScript to render the page. You should use a headless browser such as Splash or PhantomJS to render it.

Answer


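Loading the page involves JavaScript, which a browser executes but Scrapy does not. You can work around this with scrapy-splash, which provides a middleware to use in your Scrapy project. The middleware talks to the Splash JS rendering service, which you can run via Docker.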
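As a rough illustration of what enabling that middleware looks like, here is a sketch of the relevant part of a project's settings.py. The setting names and priority numbers follow the scrapy-splash README as I remember it, so check them against the version you install. The address http://localhost:8050 is an assumption: it presumes Splash is already running locally on its default port, as in the shell command used in this answer.

```python
# settings.py (sketch) -- values recalled from the scrapy-splash README;
# verify against the README of the version you actually install.

# Assumption: a Splash instance is already running and reachable here.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that de-duplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With settings along these lines, a spider would typically yield `SplashRequest` objects (imported from `scrapy_splash`) instead of plain `scrapy.Request` objects, so that the callback receives the rendered HTML.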

To test it just in the Scrapy shell, you can follow this example to run it from the shell.

This worked for me:

$ scrapy shell 'http://localhost:8050/render.html?url=http://growthtools.io/social-media-automation-tools' 
In [1]: response.css('.toolsList') 
Out[1]: 
[<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>]
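The shell command above passes the target URL to Splash unescaped, which works for this page but can break when the target URL has its own query string. Below is a small sketch, in Python 3 and using only the standard library, of building the render.html URL with the target properly encoded. The helper name `splash_render_url` is hypothetical, and the base address http://localhost:8050 is assumed from the command above. `wait` is an optional render.html argument that gives the page's JavaScript time to run.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=None):
    """Build a Splash render.html URL for target_url (hypothetical helper).

    splash_base is an assumption: a local Splash instance on port 8050.
    """
    params = {"url": target_url}
    if wait is not None:
        # Seconds Splash should wait after page load before returning HTML.
        params["wait"] = wait
    # urlencode percent-encodes the target so its own '?' and '&' survive.
    return splash_base.rstrip("/") + "/render.html?" + urlencode(params)


if __name__ == "__main__":
    print(splash_render_url("http://growthtools.io/social-media-automation-tools"))
```

The printed URL can be passed to `scrapy shell` in place of the hand-written one.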