2017-03-09 49 views
-1

I want to scrape https://www.crowdcube.com/investments?sector=technology with BeautifulSoup in Python 3, but I can't scrape the page with Beautiful Soup:

Traceback (most recent call last): 

     File "D:\DataVisualization\lib\urllib\request.py", line 163, in urlopen 
     return opener.open(url, data, timeout) 
     File "D:\DataVisualization\lib\urllib\request.py", line 472, in open 
     response = meth(req, response) 
     File "D:\DataVisualization\lib\urllib\request.py", line 582, in http_response 
     'http', request, response, code, msg, hdrs) 
     File "D:\DataVisualization\lib\urllib\request.py", line 510, in error 
     return self._call_chain(*args) 
     File "D:\DataVisualization\lib\urllib\request.py", line 444, in _call_chain 
     result = func(*args) 
     File "D:\DataVisualization\lib\urllib\request.py", line 590, in http_error_default 
     raise HTTPError(req.full_url, code, msg, hdrs, fp) 
    urllib.error.HTTPError: HTTP Error 403: Forbidden 
+1

Can you post the Beautiful Soup code you are using? – bejado

+0

    from bs4 import BeautifulSoup
    import urllib, re
    data = {'title': [], 'description': []}
    l = ('https://www.crowdcube.com/investment')
    tree = BeautifulSoup(l, 'lxml')
    # title
    title = tree.find_all('div', {'cc-cardOpportunity__body'})
    data['title'] = tree.find('h1')
    # description
    description = tree.find_all('div', {'class': 'cc-cardOpportunity__body'})
    data['description'].append(description[1].find('p').get_text())
    data

– Mart

+0

I can't scrape this site :( – Mart

Answer

-1

Using requests, this site doesn't require a UA:

In [23]: import requests 

In [24]: r = requests.get('https://www.crowdcube.com/investments?sector=technology') 

In [25]: r.status_code 
Out[25]: 200 
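
requests sends a default User-Agent of its own (something like 'python-requests/<version>'), which this site apparently accepts. A minimal sketch of how one might check what was actually sent with the request above:

    import requests

    r = requests.get('https://www.crowdcube.com/investments?sector=technology')
    print(r.status_code)      # 200 in the session above
    print(r.request.headers)  # the headers requests actually sent, including its default User-Agent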
+0

The OP specifically asked for Beautiful Soup. – bejado

+0

@bejado Do you have any idea what the difference is between 'bs4' and 'urllib' or 'requests'? How is a '403' related to 'bs4'? –

+0

I'm not sure why the OP gets a 403, but the question specifically asks _why_ the 403 is raised when using Beautiful Soup. Your answer doesn't address that. – bejado
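
For completeness: the 403 in the traceback is raised by urllib while fetching the page, before BeautifulSoup is ever involved, and urllib's default 'Python-urllib/x.y' User-Agent is a common reason for a server to refuse a request. A minimal sketch of fetching with an explicit browser-style User-Agent and then parsing with BeautifulSoup (the header value is only an example, and whether this particular site accepts it is an assumption):

    from urllib.request import Request, urlopen
    from bs4 import BeautifulSoup

    url = 'https://www.crowdcube.com/investments?sector=technology'

    # Send an explicit User-Agent; the exact value here is just an example
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urlopen(req).read()

    # Parse the fetched HTML rather than the URL string itself
    soup = BeautifulSoup(html, 'lxml')
    print(soup.title)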