Scrapy cannot scrape comments from the vnexpress website

I am new to Scrapy and Python. I am trying to scrape the comments from the following URL, but the result is always empty: http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html

Here is my code:

import logging

from scrapy.spiders import Spider

from tutorial.items import TutorialItem


class TutorialSpider(Spider):
    name = "vnexpress"
    allowed_domains = ["vnexpress.net"]
    start_urls = [
        "http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html"
    ]

    def parse(self, response):
        # The response can be queried directly; wrapping it in Selector is not needed.
        comment_list = response.xpath('//div[@class="comment_item"]')
        logging.info("TOTAL COMMENT : %d", len(comment_list))

        items = []
        for comment_id, comment in enumerate(comment_list, start=1):
            item = TutorialItem()
            item['id'] = comment_id
            item['mainId'] = 0
            # Relative paths (".//") keep each query inside the current comment;
            # a leading "//" would search the whole page for every comment.
            item['user'] = comment.xpath('.//span[@class="left txt_666 txt_11"]/b').extract()
            item['time'] = 'N/A'
            item['content'] = comment.xpath('.//p[@class="full_content"]').extract()
            item['like'] = comment.xpath('.//span[@class="txt_666 txt_11 right block_like_web"]/a[@class="txt_666 txt_11 total_like"]').extract()
            items.append(item)

        return items

Thanks for reading.

Source

2016-05-12 Valentine Heartilly

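It looks like the comments are loaded into the page by some JavaScript code.

Scrapy does not execute JavaScript on the page; it only downloads the HTML. Try opening the page in your browser with JavaScript disabled, and you should see the page as Scrapy sees it.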
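The same check can be done outside the browser by counting the comment containers in the raw HTML. The following is a minimal sketch using only the Python standard library; it assumes the `comment_item` class from the question's XPath, and `ClassCounter`, `count_elements_with_class` and `fetch_html` are illustrative names, not part of Scrapy. Unlike the exact-match XPath in the question, it matches `comment_item` as one of possibly several class tokens.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in the HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


def fetch_html(url):
    """Download a page without running any JavaScript (hypothetical helper)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If `count_elements_with_class(fetch_html(url), "comment_item")` returns 0 while the browser shows comments, that suggests the comments are added after the initial HTML is delivered.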

You have a few options:

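- Reverse-engineer how the comments are loaded into the page, using the Network tab of your browser's developer tools (there are probably some XHR calls that load HTML or JSON data);
- Use a (headless) browser to render the page (Selenium, CasperJS, Splash, ...);
  - For example, you may want to try this page with Splash (one of the JavaScript rendering options for web scraping). This is the HTML you get back from Splash (it contains the comments): http://pastebin.com/njgCsM9w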
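As a rough illustration of the Splash option, the sketch below builds a request URL for Splash's `render.html` HTTP endpoint, which returns the page's HTML after JavaScript has run. It assumes a Splash instance listening on `http://localhost:8050`; the function name `splash_render_url` and the default `wait` of 2 seconds are assumptions made for this example, not values from the answer.

```python
from urllib.parse import urlencode


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=2.0):
    """Build a URL that asks Splash to render page_url and return its HTML.

    splash_base and wait are assumed values; adjust them to your own setup.
    """
    # "url" is the page to render; "wait" is how long Splash waits after
    # loading, giving scripts time to insert the comments.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"
```

The spider could then request `splash_render_url(article_url)` instead of the article URL itself, so that `parse` receives the rendered HTML. The scrapy-splash package offers a more complete integration for the same purpose.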

Source

2016-05-12 11:00:20

Thanks for your help. I will try it.
