2015-05-19 55 views

Scrapy error: Error downloading - Could not open CONNECT tunnel

I have written a spider to crawl https://tecnoblog.net/categoria/review/, but when I run it I get an error:

2015-05-19 15:13:20+0100 [scrapy] INFO: Scrapy 0.24.5 started (bot: reviews) 
2015-05-19 15:13:20+0100 [scrapy] INFO: Optional features available: ssl, http11 
2015-05-19 15:13:20+0100 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'reviews.spiders', 'SPIDER_MODULES': ['reviews.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'reviews'} 
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState 
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled downloader middlewares: ProxyMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, RotateUserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats 
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled item pipelines: 
2015-05-19 15:13:20+0100 [tecnoblog] INFO: Spider opened 
2015-05-19 15:13:20+0100 [tecnoblog] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2015-05-19 15:13:20+0100 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6030 
2015-05-19 15:13:20+0100 [scrapy] DEBUG: Web service listening on 127.0.0.1:6087 
2015-05-19 15:13:25+0100 [tecnoblog] DEBUG: Redirecting (301) to <GET https://tecnoblog.net/categoria/review/> from <GET http://tecnoblog.net/categoria/review/> 
2015-05-19 15:13:26+0100 [tecnoblog] ERROR: Error downloading <GET https://tecnoblog.net/categoria/review/>: Could not open CONNECT tunnel. 
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Closing spider (finished) 
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Dumping Scrapy stats: 
    {'downloader/exception_count': 1, 
    'downloader/exception_type_count/scrapy.core.downloader.handlers.http11.TunnelError': 1, 
    'downloader/request_bytes': 644, 
    'downloader/request_count': 2, 
    'downloader/request_method_count/GET': 2, 
    'downloader/response_bytes': 501, 
    'downloader/response_count': 1, 
    'downloader/response_status_count/301': 1, 
    'finish_reason': 'finished', 
    'finish_time': datetime.datetime(2015, 5, 19, 14, 13, 26, 227904), 
    'log_count/DEBUG': 3, 
    'log_count/ERROR': 1, 
    'log_count/INFO': 7, 
    'scheduler/dequeued': 2, 
    'scheduler/dequeued/memory': 2, 
    'scheduler/enqueued': 2, 
    'scheduler/enqueued/memory': 2, 
    'start_time': datetime.datetime(2015, 5, 19, 14, 13, 20, 217735)} 
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Spider closed (finished) 

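Any idea why this happens? 2015-05-19 15:13:26+0100 [tecnoblog] ERROR: Error downloading <GET https://tecnoblog.net/categoria/review/>: Could not open CONNECT tunnel. I crawled this site within the past month... How can I fix it? I tried changing the start URL to "http" instead of "https", but it gets redirected to HTTPS anyway, as the 301 line in the log shows.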

Answer


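You are probably trying to reach an HTTPS site through an HTTP proxy. Your log lists a ProxyMiddleware among the enabled downloader middlewares, and for an https:// URL Scrapy has to ask the proxy to open a CONNECT tunnel. A proxy that does not support HTTPS refuses that request, which would produce exactly this TunnelError.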
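The code of the ProxyMiddleware from the log is not shown, so the following is only a minimal sketch of one possible workaround, not the asker's middleware: a hypothetical downloader middleware that picks the proxy by URL scheme and drops it for https requests. The class name and the proxy address are assumptions; setting request.meta["proxy"] is the standard way to tell Scrapy which proxy to use for a request.

```python
from urllib.parse import urlparse


class SchemeAwareProxyMiddleware:
    """Hypothetical downloader middleware: choose a proxy by URL scheme."""

    # Placeholder values (assumptions); replace with your own proxies.
    PROXIES = {
        "http": "http://proxy.example.com:8080",
        # Use a proxy that accepts CONNECT here, or None to connect directly.
        "https": None,
    }

    def process_request(self, request, spider):
        scheme = urlparse(request.url).scheme
        proxy = self.PROXIES.get(scheme)
        if proxy:
            request.meta["proxy"] = proxy
        else:
            # No suitable proxy for this scheme: remove any proxy set earlier.
            request.meta.pop("proxy", None)
        # Returning None lets Scrapy continue handling the request normally.
        return None
```

To try it, the class would have to be registered in DOWNLOADER_MIDDLEWARES in place of the existing proxy middleware. Connecting directly for https only makes sense if the target site can be reached without the proxy.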

You can use an online HTTPS proxy tester to check whether your proxy supports HTTPS, or use the Linux curl command with a proxy:

curl -x http://111.222.333.444:80 -L https://myip.ht
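If curl is not available, roughly the same check can be sketched in Python with only the standard library. The idea is to send the proxy a CONNECT request for the target host and look at the status line it returns: a 2xx answer means the tunnel was opened, anything else means the proxy refused. The function names and the proxy address in the usage comment are assumptions for illustration, and the sketch does not handle proxy authentication.

```python
import socket


def build_connect_request(host, port):
    """Return the raw bytes of an HTTP CONNECT request for host:port."""
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"
    ).encode("ascii")


def tunnel_opened(status_line):
    """True if the proxy's status line carries a 2xx status code."""
    parts = status_line.split()
    if len(parts) < 2:
        return False
    code = parts[1]
    return len(code) == 3 and code.isdigit() and code[0] == "2"


def proxy_supports_connect(proxy_host, proxy_port, host, port=443, timeout=10):
    """Ask the proxy for a tunnel to host:port and report whether it agreed."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_connect_request(host, port))
        reply = sock.recv(4096).decode("iso-8859-1")
    first_line = reply.split("\r\n", 1)[0]
    return tunnel_opened(first_line)


# Example call (placeholder proxy address, replace with your own):
# proxy_supports_connect("proxy.example.com", 8080, "tecnoblog.net")
```

If this returns False for the proxy the spider uses, the proxy cannot tunnel HTTPS, and the spider needs a different proxy or a direct connection for https URLs.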