2010-11-04 58 views
3

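Python urllib2.urlopen returns a 302 error even though the page exists

I am using the Python function urllib2.urlopen to read the site http://www.bad.org.uk/, but I keep getting a 302 error even though the site loads fine when I visit it in a browser. Does anyone have any idea why?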

import socket
import urllib2

headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

socket.setdefaulttimeout(10)  # give up on any connection after 10 seconds

def url_exists(url):
    try:
        req = urllib2.Request(url, None, headers)
        urllib2.urlopen(req)
        return True   # URL exists
    except ValueError as ex:
        print 'URL: %s not well formatted' % url
        return False  # URL not well formatted
    except urllib2.HTTPError as ex:
        print 'The server couldn\'t fulfill the request for %s.' % url
        print 'Error code: ', ex.code
        return False
    except urllib2.URLError as ex:
        print 'We failed to reach a server for %s.' % url
        print 'Reason: ', ex.reason
        return False  # URL doesn't seem to be alive

url_exists('http://www.bad.org.uk/')

The error printed:

The server couldn't fulfill the request for http://www.bad.org.uk//site/1/default.aspx. 
Error code: 302 

Answers

18

The page at http://www.bad.org.uk/ is broken when cookies are disabled.

http://www.bad.org.uk/ returns:

HTTP/1.1 302 Found 
Location: http://www.bad.org.uk/DesktopDefault.aspx 
Set-Cookie: Esperantus_Language_bad=en-GB; path=/ 
Set-Cookie: Esperantus_Language_rainbow=en-GB; path=/ 
Set-Cookie: PortalAlias=rainbow; path=/ 
Set-Cookie: refreshed=true; expires=Thu, 04-Nov-2010 16:21:23 GMT; path=/ 
Set-Cookie: .ASPXAUTH=; expires=Mon, 11-Oct-1999 23:00:00 GMT; path=/; HttpOnly 
Set-Cookie: portalroles=; expires=Mon, 11-Oct-1999 23:00:00 GMT; path=/ 

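If I then request http://www.bad.org.uk/DesktopDefault.aspx without those cookies set, it gives another 302 and redirects to itself.

urllib2 ignores the cookies and sends the new request without them, so it ends up in a redirect loop at that URL. To handle this, you need to add a cookie handler: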

import urllib2 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor()) 
response = opener.open('http://www.bad.org.uk') 
print response.read() 
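
For readers on Python 3, where urllib2 was merged into urllib.request, a minimal sketch of the same idea follows. The helper names `build_cookie_opener` and `fetch` are hypothetical and chosen only for illustration; the sketch assumes the site behaves as described above, namely that it stops redirecting once the cookies from the first response are sent back.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url, timeout=10):
    """Open url with cookie support; return (status, final URL, cookie names)."""
    opener, jar = build_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        # geturl() is the URL actually retrieved, after any redirects.
        return response.status, response.geturl(), [c.name for c in jar]


# Example call (needs network access, so it is left commented out):
# status, final_url, cookie_names = fetch('http://www.bad.org.uk/')
```

Keeping the jar as a separate object makes it possible to inspect which cookies the server set, which helps when checking whether a redirect loop is cookie-related.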
4

Code 302 is a temporary redirect, so you should take the URI from the Location field of the response and request that instead.
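
As a rough sketch of what reading the Location field by hand could look like, the Python 3 code below turns off automatic redirect handling and returns the redirect target instead of following it. The names `NoRedirect`, `absolute_location` and `redirect_target` are hypothetical. Note that urllib2 and urllib.request follow 302 responses on their own by default, so this is mainly useful for inspecting where a redirect points.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None leaves the 3xx unhandled, so the opener's default
        # error handler raises HTTPError for it.
        return None


def absolute_location(base_url, location):
    """Resolve a possibly relative Location value against the requested URL."""
    if not location:
        return None
    return urllib.parse.urljoin(base_url, location)


def redirect_target(url, timeout=10):
    """Return the URL that url redirects to, or None if it does not redirect."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return None  # a normal response, no redirect
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return absolute_location(url, err.headers.get('Location'))
        raise


# Example call (needs network access, so it is left commented out):
# print(redirect_target('http://www.bad.org.uk/'))
```

Following the returned URL manually would still loop on a site that requires cookies, so this does not replace a cookie handler.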

+0

How do I do that? Sorry, I'm very new to Python and haven't used urllib2 before. – John 2010-11-04 16:19:35

+0

@John - that's a separate question! – 2010-11-04 16:27:17

+2

302s are handled automatically by urllib2 by default. – 2010-11-04 16:32:27
