Extracting table information with BeautifulSoup (bs4)

Can anyone give me a snippet of BeautifulSoup code to extract some of the items found in the table here?

Here is my attempt:

from urllib.request import urlopen  # Python 3 location of urlopen

from bs4 import BeautifulSoup

url = "http://biology.burke.washington.edu/conus/accounts/../recordview/record.php?ID=1ll&tabs=21100111&frms=1&res=&pglimit=A"

html = urlopen(url).read()
soup = BeautifulSoup(html, "lxml")
tables = soup.find_all("table")

However, this fails: tables turns out to be empty.

Sorry, I'm a BeautifulSoup noob.

Thanks!

2013-07-26 littleO

The page at the given URL does not contain any table elements in its source code.

The table is generated inside an iframe.

from urllib.request import urlopen

from bs4 import BeautifulSoup

# Request the description.php document rather than record.php
url = 'http://biology.burke.washington.edu/conus/recordview/description.php?ID=1l9l0l421l55llll&tabs=21100111&frms=1&pglimit=A&offset=&res=&srt=&sql2='

html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
tables = soup.find_all('table')
#print(tables)
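
For anyone who wants to locate the iframe programmatically instead of copying its URL by hand, here is a minimal sketch. It assumes the iframe carries a plain src attribute in the outer page's HTML. SAMPLE_OUTER_HTML, iframe_urls, and the example.org base URL are made up for illustration and are not taken from the site in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the outer page: no <table>, only an <iframe>.
SAMPLE_OUTER_HTML = """
<html><body>
  <h1>Record</h1>
  <iframe src="description.php?ID=1"></iframe>
</body></html>
"""


def iframe_urls(html, base_url):
    """Return the absolute URL of every <iframe src=...> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, frame["src"])  # resolve relative src against the page URL
        for frame in soup.find_all("iframe", src=True)
    ]


# example.org is a placeholder base URL, not the site from the question.
urls = iframe_urls(SAMPLE_OUTER_HTML, "http://example.org/recordview/record.php?ID=1")
print(urls)
```

Each returned URL could then be fetched with urlopen and parsed for tables as above. If the iframe is injected by JavaScript rather than present in the static HTML, this will find nothing and a browser-driven approach is probably needed.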

Selenium solution:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "http://biology.burke.washington.edu/conus/accounts/../recordview/record.php?ID=1ll&tabs=21100111&frms=1&res=&pglimit=A"

driver = webdriver.Firefox()
driver.get(url)
# Switch into the first iframe before reading page_source
driver.switch_to.frame(driver.find_elements(By.TAG_NAME, 'iframe')[0])
soup = BeautifulSoup(driver.page_source, 'html.parser')
tables = soup.find_all('table')
#print(tables)
driver.quit()

2013-07-26 07:44:00 falsetru

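OK, thanks! I didn't realize that. Do you see the information in the shell morphometry box at the top? How would I extract the information inside that box? – littleO

Hmm, I'll look into Selenium. Is there a simple way to do this using BeautifulSoup? – littleO

@littleO, I added code that uses Selenium + bs4. – falsetru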

This is my current workflow:

from urllib.request import urlopen

from bs4 import BeautifulSoup  # the class name is case-sensitive

url = "http://somewebpage.com"
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all('table')
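
Once tables is populated, one possible next step is to turn each table into rows of cell text. This is only a rough sketch: SAMPLE_HTML is a made-up table so the code runs without network access, and table_rows is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

# Made-up table, used only so the sketch runs without network access.
SAMPLE_HTML = """
<table>
  <tr><th>Measurement</th><th>Value</th></tr>
  <tr><td>Length</td><td> 42.0 mm </td></tr>
  <tr><td>Width</td><td>21.5 mm</td></tr>
</table>
"""


def table_rows(table):
    """Return one list of stripped cell strings per <tr> in the table."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])  # header and data cells alike
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
all_rows = [table_rows(table) for table in soup.find_all("table")]
print(all_rows[0])
```

Note that find_all("tr") searches recursively, so with nested tables the outer table's result would also include the inner table's rows. That case would need extra handling.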

2013-11-26 05:46:12 0077cc
