为什么我scrapy总是告诉我 “TCP连接超时”

DEBUG: Retrying 
(failed 2 times): TCP connection timed out: 110: Connection timed out.

PS：系统是Ubuntu的，我能成功地做到这一点：为什么我scrapy总是告诉我 “TCP连接超时”

wget的http://www.dmoz.org/Computers/Programming/Languages/Python/Book/

蜘蛛代码：

#!/usr/bin/python 

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

class DmozSpider(BaseSpider): 
    name = "dmoz" 
    allowed_domains = ["dmoz.org"] 
    start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"] 

    def parse(self, response): 
     hxs = HtmlXPathSelector(response) 
     sites = hxs.select('//ul/li') 
     for site in sites: 
      title = site.select('a/text()').extract() 
      link = site.select('a/@href').extract() 
      desc = site.select('text()').extract() 
      print title, link, desc

来源

2013-08-23 gangzi

您可以发布您的蜘蛛的代码，scrapy设置和控制台输出？ –

你可以发布你的设置吗？ –

您发布的代码是真实蜘蛛代码的摘录吗？你的'start_urls'有第二个URL被剥离，或者你有一个语法错误。尝试'start_urls = [“http://www.dmoz.org/Computers/Programming/Languages/Python/Books/”]' –

您的网络中存在问题或端口被阻塞。

还检查您的设置配置错误。

来源

2013-08-23 11:26:49

我的系统是ubuntu，它的端口是打开的default.So，现在我真的不知道“TCP连接超时：110：连接超时。” – gangzi

我在Windows 7上。在我的情况下，你是对的：我在另一个网络上启动并运行。 –

你有额外的"使用语法错误：

start_urls=["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"‌]

来源

2013-08-23 16:17:51 Tushar

为什么我scrapy总是告诉我 “TCP连接超时”

回答

相关问题