Scraping the values of table cells with Beautiful Soup

Working with the HTML from http://coinmarketcap.com/, I want to build a Python dictionary containing the values from the table rows, for example:

{bitcoin: {Market_cap: '$11,247,442,728', volume: '$64,668,900'}, ethereum: ....etc}

However, I'm not familiar with how the HTML is structured. For some fields, such as market cap, the cell (td) holds the data directly, i.e.:

<td class="no-wrap market-cap text-right" data-usd="11247442728.0" data-btc="15963828.0"> 

         $11,247,442,728 

       </td>

But for cells such as volume, the value is a link, so the markup is different, i.e.:

<td class="no-wrap text-right"> 
        <a href="/currencies/bitcoin/#markets" class="volume" data-usd="64668900.0" data-btc="91797.5">$64,668,900</a> 
       </td>

This is the code I'm working with:

import requests 
from bs4 import BeautifulSoup as bs 

request = requests.get('http://coinmarketcap.com/') 

content = request.content 

soup = bs(content, 'html.parser') 

table = soup.findChildren('table')[0] 

rows = table.findChildren('tr') 

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        print cell.string

This produces output full of whitespace and missing data.

For each row, how can I get the name of the coin? And for each cell, how do I access its value, whether it is a link (<a>) or a regular value?

Edit:

By changing the for loop to:

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        print cell.getText().strip().replace(" ", "")

I was able to get the data I wanted, i.e.:

1 
Bitcoin 
$11,254,003,178 
$704.95 
15,964,212 
BTC 
$63,057,100 
-0.11%

However, it would be cool to also have the class name for each cell, i.e.:

id: bitcoin 
marketcap: 11,254,003,178 
etc......

Source

2016-11-08 David Hancock

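You're almost there. Instead of using cell.string, use cell.getText(). You will probably also need to clean up the output string to remove the extra whitespace. I used a regular expression, but there are other options as well, depending on what state your data is in. I've also added a bit of Python 3 compatibility by using the print function.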

from __future__ import print_function 
import requests 
import re 

from bs4 import BeautifulSoup as bs 

request = requests.get('http://coinmarketcap.com/') 

content = request.content 

soup = bs(content, 'html.parser') 

table = soup.findChildren('table')[0] 

rows = table.findChildren('tr') 

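for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        # getText() returns all text in the cell, including text inside nested tags
        cell_content = cell.getText()
        # collapse each run of whitespace into a single space
        clean_content = re.sub(r'\s+', ' ', cell_content).strip()
        print(clean_content)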
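
To see why cell.string comes back empty for some cells, here is a small sketch that parses a hand-written row modelled on the two snippets in the question, so it runs without a network request. The HTML string is an assumption, not the live page. It uses get_text(), which is the same method as getText(), with strip=True as one alternative to the regular expression.

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell row modelled on the markup quoted in the question.
html = """
<table><tr>
<td class="no-wrap market-cap text-right" data-usd="11247442728.0">
    $11,247,442,728
</td>
<td class="no-wrap text-right">
    <a href="/currencies/bitcoin/#markets" class="volume" data-usd="64668900.0">$64,668,900</a>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
plain_cell, link_cell = soup.find_all("td")

# The first cell has a single text child, so .string returns it
# (with the surrounding whitespace still attached).
plain_string = plain_cell.string

# The second cell has three children (whitespace, <a>, whitespace),
# so .string is None.
link_string = link_cell.string

# get_text() gathers the text of all descendants; strip=True trims each piece.
plain_text = plain_cell.get_text(strip=True)
link_text = link_cell.get_text(strip=True)

print(repr(plain_string))
print(repr(link_string))
print(plain_text, link_text)
```

So get_text() handles both kinds of cell the same way, which is why it fills in the values that were missing with .string.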

The table headers are stored in the first row, so you can extract them like this:

headers = [x.getText() for x in rows[0].findChildren('th')]
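
Putting the headers and the cell values together gets you close to the dictionary you asked for. The following is only a sketch: the table and its header names ("#", "Name", "Market Cap", "Volume (24h)") are made up for illustration, so check them against what the real page returns before relying on the "Name" key.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down table; the real page has more columns
# and its header text may differ.
html = """
<table>
<tr><th>#</th><th>Name</th><th>Market Cap</th><th>Volume (24h)</th></tr>
<tr>
  <td>1</td>
  <td><a href="/currencies/bitcoin/">Bitcoin</a></td>
  <td class="no-wrap market-cap text-right" data-usd="11247442728.0"> $11,247,442,728 </td>
  <td class="no-wrap text-right"><a href="/currencies/bitcoin/#markets" class="volume" data-usd="64668900.0">$64,668,900</a></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# Header text from the first row, as in the one-liner above.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

coins = {}
for row in rows[1:]:
    values = [td.get_text(strip=True) for td in row.find_all("td")]
    if not values:
        continue  # skip rows that contain no <td> cells
    # Pair each header with the cell in the same column.
    record = dict(zip(headers, values))
    # Use the coin name as the outer key (assumes a "Name" column exists).
    coins[record.pop("Name")] = record

print(coins)
```

If you would rather have raw numbers than formatted strings, the data-usd attribute shown in your snippets can be read from a tag with tag.get("data-usd").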

Source

2016-11-08 05:03:13

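Great! It works perfectly, thanks a lot! Any idea how to get the name of each field?

Added information on how to get the name of each field (the table headers).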
