urllib redirect error

I am trying to scrape a table using urllib and BeautifulSoup, and I get this error:

"urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Found"

I have heard that this is related to the site requiring cookies, but after my second attempt I still get the error:

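import urllib.request
from bs4 import BeautifulSoup
import re

# testURL is not defined in this snippet; an example value is given in the comments below
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
file = opener.open(testURL).read().decode()  # the HTTPError 302 is raised here
soup = BeautifulSoup(file)
# select the table rows whose inline style contains this colour
tables = soup.find_all('tr',{'style': re.compile("color:#4A3C8C")})
print(tables)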

Source

2017-08-22 Connor McLaughlin

I think you need to provide 'testURL' so that people can see what is going on and which specific site this is.

I am collecting donor data for a specific public data request on candidates at www.politicalmoneyline.com. An example testURL would be http://www.politicalmoneyline.com/tr/tr_mg_cand.aspx?&sCycle=2018&sCandID=H8WI01024&td=

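A few suggestions:

If you have to handle cookies, use HTTPCookieProcessor.
You do not have to use a custom user agent, but if you want to imitate Mozilla you must use its full user-agent string. This site does not accept 'Mozilla/5.0' and will keep redirecting.
You can catch this exception with HTTPError.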

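import urllib.error
import urllib.request
from bs4 import BeautifulSoup

# HTTPCookieProcessor stores cookies set by the server and sends them back on redirects
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:54.0) Gecko/20100101 Firefox/54.0'
opener.addheaders = [('user-agent', user_agent)]

try:
    response = opener.open(testURL)
except urllib.error.HTTPError as e:
    # HTTP-level failures, including the 302 redirect loop
    print(e)
except Exception as e:
    # anything else, e.g. connection problems
    print(e)
else:
    file = response.read().decode()
    soup = BeautifulSoup(file, 'html.parser')
    # ... etc ...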
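
To see why the cookie handler matters without hitting the real site, here is a minimal local sketch. The `CookieGateHandler` server and the `fetch` helper are hypothetical stand-ins that I made up for illustration: the server simply redirects back to the same path until the client returns a `session=ok` cookie. I am assuming the real site behaves roughly like this; I have not verified what it actually checks.

```python
import http.server
import threading
import urllib.error
import urllib.request


class CookieGateHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that redirects until a cookie is sent back."""

    def do_GET(self):
        if "session=ok" in self.headers.get("Cookie", ""):
            body = b"<table><tr><td>data</td></tr></table>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # No cookie yet: set one and redirect back to the same path.
            self.send_response(302)
            self.send_header("Set-Cookie", "session=ok")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    """Return the decoded body, or the HTTPError object if the request fails."""
    try:
        with opener.open(url, timeout=5) as response:
            return response.read().decode()
    except urllib.error.HTTPError as e:
        return e


# Port 0 lets the OS pick a free port for the throwaway server.
server = http.server.HTTPServer(("127.0.0.1", 0), CookieGateHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/page" % server.server_port

# ProxyHandler({}) ignores any proxy set in the environment for this local demo.
no_proxy = urllib.request.ProxyHandler({})
plain = urllib.request.build_opener(no_proxy)
with_cookies = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor()
)

plain_result = fetch(plain, url)          # never sends the cookie back
cookie_result = fetch(with_cookies, url)  # stores the cookie and resends it

server.shutdown()
server.server_close()

print(type(plain_result).__name__, getattr(plain_result, "code", None))
print(cookie_result)
```

The opener without a cookie handler keeps getting the same 302 until urllib gives up and raises the "infinite loop" HTTPError from the question, while the opener with HTTPCookieProcessor gets the page after one redirect. On the real site the user-agent check from the second suggestion applies on top of this.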

Source

2017-08-22 20:08:57
