Beautiful Soup returns an empty result

Beautiful Soup works fine on my local machine but not on another server, where it returns an empty result.

import urllib2 
import bs4 

url = urllib2.urlopen("http://www.google.com") 
html = url.read() 
soup = bs4.BeautifulSoup(html) 

print soup

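Printing html outputs the Google page correctly. Printing soup outputs nothing.

This works fine locally, but on this Red Hat machine it returns nothing.

Could this be related to installing a parser? I looked at some other possible solutions that mention installing a parser, but so far no luck.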
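One way to compare the two machines is to print, on each, the interpreter version and which parser libraries can be imported. The sketch below is only an illustration: `report_environment` is a hypothetical helper name, and the default module list (bs4, lxml, html5lib) is an assumption about which packages are relevant here. It relies on `importlib`, so it assumes Python 2.7 or later.

```python
import importlib
import sys


def report_environment(modules=("bs4", "lxml", "html5lib")):
    """Return the interpreter version and the version of each importable module."""
    report = {"python": sys.version.split()[0]}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed for this interpreter
        else:
            # Some packages do not expose __version__; fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in sorted(report_environment().items()):
        print("%s: %s" % (name, version or "not installed"))
```

Running it with the same interpreter that runs the failing script matters, because a machine shared with teammates may have several Python installations with different site-packages.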

The question "Beautiful Soup returning nothing" does not apply to my problem.


2013-11-28 Darthyogurt

OK, what is the difference between your local machine and the server? – bchhun

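As I understand your question, the HTML is read fine on both machines, but on the local machine you get some output from bs4, while on the server you get nothing. Do you get 'None' or an empty string? –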

The local machine is running Windows with Python 2.7. The server is running Red Hat with Python 2.7 – Darthyogurt

This answer is only meant to show you that your case is specific to your setup and has nothing to do with Red Hat.

I launched a micro Red Hat instance on AWS, and here is the complete process, starting from the SSH login to the brand-new Red Hat machine.

(1) Here I install beautifulsoup4 on the new machine:

$ ssh -i key.pem [email protected] 
The authenticity of host 'awsip' cant be established. 
RSA key fingerprint is .... 
Are you sure you want to continue connecting (yes/no)? yes 
Warning: Permanently added 'awsip' (RSA) to the list of known hosts. 
[[email protected] ~]$ sudo easy_install beautifulsoup4 
Searching for beautifulsoup4 
Reading http://pypi.python.org/simple/beautifulsoup4/ 
... 
Installed /usr/lib/python2.6/site-packages/beautifulsoup4-4.3.2-py2.6.egg 
Processing dependencies for beautifulsoup4 
Finished processing dependencies for beautifulsoup4

(2) I opened Python and got output from Google in both html and soup:

[[email protected] ~]$ python 
Python 2.6.6 (r266:84292, May 27 2013, 05:35:12) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import urllib2 
>>> from bs4 import BeautifulSoup 
>>> html = urllib2.urlopen("http://www.google.com").read() 
>>> soup = BeautifulSoup(html) 
>>> print html[:100] 
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head><meta content="Search t 
>>> print soup.prettify()[:100] 
<!DOCTYPE html> 
<html itemscope="" itemtype="http://schema.org/WebPage"> 
<head> 
    <meta content="Se

To find out whether the fault lies with urllib2 or with bs4, try running this code:

from bs4 import BeautifulSoup 

html = """ 
<html> 
<head> 
</head> 
<body> 
<div id="1">numberone</div> 
<div id="2">numbertwo</div> 
</body> 
</html> 
""" 

print BeautifulSoup(html).find('div', {"id":"1"})

If beautifulsoup is installed correctly, you will get the expected output below:

<div id="1">numberone</div>
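When no parser is named, `BeautifulSoup(html)` picks the best-ranked parser among those installed, so the same call can end up using different parsers on two machines. A minimal sketch of the same test with each parser named explicitly follows; `try_parsers` and `SAMPLE_HTML` are hypothetical names, and the three parser names are the ones Beautiful Soup documents. It is meant as a diagnostic aid, not a definitive fix.

```python
SAMPLE_HTML = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""


def try_parsers(html, parsers=("html.parser", "lxml", "html5lib")):
    """Parse html once per named parser and record what find() returns."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return {"bs4": "not installed"}
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing.
            results[name] = "parser not installed"
            continue
        # str(None) is "None", so a parser that builds an empty tree stands out.
        results[name] = str(soup.find("div", {"id": "1"}))
    return results


if __name__ == "__main__":
    for name, outcome in sorted(try_parsers(SAMPLE_HTML).items()):
        print("%-12s %s" % (name, outcome))
```

If one parser prints the div and another prints None, passing the working parser's name as the second argument to `BeautifulSoup` is a reasonable next thing to try.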


2013-11-28 20:01:37

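Thanks! I'm also using AWS. It's a rather messy setup because it is shared with other teammates at work. I guess one of my questions is how to find out why Beautiful Soup isn't working – Darthyogurt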

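@Darthyogurt There are two main steps in this whole process: first download the HTML with urllib2, then build the tree with beautifulsoup. Maybe you can hardcode a simple HTML document and see whether beautifulsoup builds the tree. See the updated post. –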

So I copied the sample code, ran the script, and it returned None. I assume this is a BS4 issue. It's strange because it doesn't throw any errors. – Darthyogurt
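Since the hardcoded sample returns None without raising, one further check is whether the interpreter's built-in HTML parser tokenizes the markup at all, independent of bs4. The bs4 "html.parser" backend is built on this standard-library class. The following is a rough sketch using only the standard library; `TagCollector` and `collect_tags` are hypothetical names introduced for illustration.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TagCollector(HTMLParser):
    """Record the name of every start tag the built-in parser reports."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def collect_tags(html):
    """Feed html to the built-in parser and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(collect_tags('<html><body><div id="1">numberone</div></body></html>'))
```

If the list comes back empty on the server, the problem probably sits below bs4; if the tags are listed, the bs4 installation itself becomes the more likely suspect.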
