HTTP Error 403: Forbidden with urllib2 (Python 2.7)

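I have used urllib2 successfully before, but when I tried it on this website it did not work. I have looked through forums and tried several suggested fixes, but none of them helped. Below is an example of one such fix that does not work for me. Can someone help me connect to the site?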

The code that produces the error:

from bs4 import BeautifulSoup 
import urllib2 

proxy_support = urllib2.ProxyHandler({"http":"http://username:[email protected]:port"}) 
hdr = {'Accept': 'text/html,application/xhtml+xml,*/*'} 
url = 'http://www.carnextdoor.com.au/' 
opener = urllib2.build_opener(proxy_support) 
urllib2.install_opener(opener) 
req=urllib2.Request(url,headers=hdr) 
#Here I get the error with and without using the header or going html = urllib2.urlopen(url).read() 
html = urllib2.urlopen(req).read() 
soup=BeautifulSoup(html,"html5lib") 
print soup

Source

2016-03-07 FancyDolphin

You may have been blocked by the site. – YOU

? – FancyDolphin

Judging by the response, it is the site. – YOU

I got a 403 until I added a user agent. The following was enough to make it work for me:

hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',"user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"} 
url = 'http://www.carnextdoor.com.au/' 


req=urllib2.Request(url,headers=hdr) 
#Here I get the error with and without using the header or going html = urllib2.urlopen(url).read() 
html = urllib2.urlopen(req).read() 
soup=BeautifulSoup(html,"html5lib") 
print soup

Without the user agent:

In [10]: hdr = {'Accept': 'text/html,application/xhtml+xml,*/*'} 

In [11]: url = 'http://www.carnextdoor.com.au/' 

In [12]: req=urllib2.Request(url,headers=hdr) 

In [13]: html = urllib2.urlopen(req).read() 
--------------------------------------------------------------------------- 
HTTPError         Traceback (most recent call last) 
<ipython-input-13-dbeb64d95cd3> in <module>() 
----> 1 html = urllib2.urlopen(req).read()
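The traceback above stops at the raising line, so the status code is not visible. As a minimal sketch, the error can be caught and its code read directly. The helper name `fetch` is hypothetical, and the import fallback is only there so the same snippet also runs on Python 3:

```python
try:
    from urllib2 import Request, urlopen, HTTPError   # Python 2.7, as in the question
except ImportError:
    from urllib.request import Request, urlopen       # Python 3 equivalent
    from urllib.error import HTTPError


def fetch(url, headers=None):
    """Hypothetical helper: return (status_code, body).

    body is None when the server answers with an HTTP error such as 403.
    """
    req = Request(url, headers=headers or {})
    try:
        resp = urlopen(req)
        return resp.getcode(), resp.read()
    except HTTPError as err:
        # err.code holds the numeric status, e.g. 403 for "Forbidden"
        return err.code, None
```

Calling `fetch(url, hdr)` with and without the `user-agent` entry in `hdr` would then show which header set the server rejects, instead of ending in an unhandled exception.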

With the User-Agent:

In [20]: hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',"user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"} 

In [21]: req=urllib2.Request(url,headers=hdr) 
In [22]: html = urllib2.urlopen(req).read() 
In [23]:
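The difference between the two sessions comes from the header urllib2 sends when none is given: its openers default to a `Python-urllib/<version>` user agent, which some sites appear to reject. Since the question already installs a global opener, one option is to set the header on the opener itself so that every later `urlopen` call carries it. This is a sketch under that assumption. The proxy handler from the question is left out, the alias `urlreq` and the constant `BROWSER_UA` are illustrative names, and the import fallback only exists so the snippet also runs on Python 3:

```python
try:
    import urllib2 as urlreq          # Python 2.7, as in the question
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent

# Same browser-like string as in the answer above
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36")

opener = urlreq.build_opener()

# What the library would send on its own: a "Python-urllib/<version>" user agent
default_headers = dict(opener.addheaders)
print(default_headers)

# Replace the opener-wide defaults, then install the opener globally
opener.addheaders = [("User-Agent", BROWSER_UA),
                     ("Accept", "text/html,application/xhtml+xml,*/*")]
urlreq.install_opener(opener)
# From here on, urlreq.urlopen(url) sends BROWSER_UA unless a Request overrides it
```

Headers passed explicitly to `Request(url, headers=...)` still take precedence over these opener defaults.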

Using requests without setting any user agent also works fine.

In [28]: import requests 

In [29]: r = requests.get(url) 

In [30]: r.status_code 
Out[30]: 200

Source

2016-03-07 00:36:36

Wow, I tried every header except the user agent. Thanks. Silly mistake :( – FancyDolphin

No worries, it is usually the first thing on my list of things to try. –
