2016-10-11

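bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml

Can you suggest a fix? The script downloads nearly every image from single-image imgur pages, but I don't know why it fails in this case or how to fix it.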

elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
        or submission.url.endswith('webm')
        or submission.url.endswith('mp4')
        or 'all' in submission.url
        or '#' in submission.url
        or '/a/' in submission.url):
    html_source = requests.get(submission.url).text  # download the image's page
    soup = BeautifulSoup(html_source, "lxml")
    image_url = soup.select('img')[0]['src']
    if image_url.startswith('//'):
        image_url = 'http:' + image_url
    image_id = image_url[image_url.rfind('/') + 1:image_url.rfind('.')]
    try:
        image_file = urllib2.urlopen(image_url, timeout=5)
        with open('/home/mona/computer_vision/image_retrieval/images/' + category + '/' + 'imgur_' + datetime.datetime.now().strftime('%y-%m-%d-%s') + image_url[-9:], 'wb') as output_image:
            output_image.write(image_file.read())
    except urllib2.URLError as e:
        print(e)
        continue

The error is:

[LOG] Done Getting http://i.imgur.com/FoCjtI7.jpg 
submission id is: 1alffm 
[LOG] Getting url: http://sphotos-a.ak.fbcdn.net/hphotos-ak-ash4/217834_10151246341237704_484810759_n.jpg 
HTTP Error 403: Forbidden 
[LOG] Getting url: http://imgur.com/xp386 
Traceback (most recent call last): 
    File "download_images.py", line 67, in <module> 
    soup = BeautifulSoup(html_source, "lxml") 
    File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 155, in __init__ 
    % ",".join(features)) 
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library? 

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser – Muposat

Answers


Open a Python shell and try the following:

from bs4 import BeautifulSoup 
myHTML = "<html><head></head><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml") 

Does that work, or do you get the same error? If you get the same error, lxml is missing. Install it:

pip install lxml 

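I'm walking through these steps because you indicated that the script runs for a while before crashing. In that case the parser can't be missing altogether, can it?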
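If installing lxml is not an option on a given machine, one possible workaround is to fall back to Python's built-in `html.parser`, which BeautifulSoup accepts without any extra package. The following is only a minimal sketch: `parser_available` and `choose_parser` are hypothetical helper names made up for illustration, not part of bs4.

```python
import importlib


def parser_available(module_name="lxml"):
    """Return True if the given module can be imported on this machine."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def choose_parser():
    """Prefer lxml when it is installed, otherwise use the built-in parser."""
    return "lxml" if parser_available("lxml") else "html.parser"


# Hypothetical usage inside the download script:
#     soup = BeautifulSoup(html_source, choose_parser())
```

Calling `choose_parser()` once at the top of the script surfaces a missing lxml before any downloads start, rather than at the first page that goes through BeautifulSoup. Keep in mind that `html.parser` can build a slightly different tree than lxml for malformed markup.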

Added by the OP:

If you are using Python2.7 in Ubuntu/Debian, this worked for me: 

$ sudo apt-get build-dep python-lxml 
$ sudo pip install lxml 

Test it like: 

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import lxml 

Thanks. The script worked on another machine. I had forgotten to install lxml on this new machine.
