Convert row/column values to a dictionary and then to a DataFrame in pandas

I have a DataFrame people with two columns, name and text.

name  text 
0 Obama  Obama was the 44th president of the... 
1 Trump  Donald J. Trump ran as a republican...

I only need to do some exploratory analysis on Obama.

obama= people[people['name'] == 'Obama'].copy() 
obama.text 

35817 Obama was the 44th president of the unit... 
Name: text, dtype: object

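How do I convert the text into a dictionary, with the words as keys and their counts as values, stored in a new column?
For example: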

name  text         dictionary 
0 Obama  Obama was the 44th president of the... {'Obama':1, 'the':2,...}

Once that is done, how do I convert the dictionary into a separate DataFrame?
Expected:

word count 
0 Obama 1 
1 the 2

Source

2016-11-18 Drj

You can use the Counter object from the collections module:

import collections 

people['dictionary'] = people.text.apply(lambda x: dict(collections.Counter(x.split())))
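As a minimal, self-contained sketch of the same idea: the toy `people` frame and its texts below are made up for illustration and are not the asker's data.

```python
import collections

import pandas as pd

# Toy stand-in for the `people` frame; the texts are invented for illustration.
people = pd.DataFrame({
    "name": ["Obama", "Trump"],
    "text": ["the president the senator", "a candidate"],
})

# Split each text on whitespace and count how often each token occurs.
people["dictionary"] = people["text"].apply(
    lambda x: dict(collections.Counter(x.split()))
)

print(people["dictionary"].iloc[0])
```

Note that `str.split()` only splits on whitespace, so the counts are case-sensitive and punctuation stays attached to the words.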

To convert one of these dictionaries to a DataFrame:

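import pandas as pd
# [0] looks up the row labelled 0; use .iloc[0] if the index has no such label
dictionary = people['dictionary'][0]
pd.DataFrame(data={'word': list(dictionary.keys()), 'count': list(dictionary.values())})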
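Another way to get the same two-column result, sketched here under the assumption that `dictionary` is a plain `dict` of word counts, is to build the frame from the `(word, count)` pairs directly. The sample counts are the ones from the question's expected output.

```python
import pandas as pd

# Example counts taken from the expected output in the question.
dictionary = {"Obama": 1, "the": 2}

# Each (key, value) pair becomes one row; the columns are named explicitly.
words = pd.DataFrame(list(dictionary.items()), columns=["word", "count"])

print(words)
```

Passing a list of pairs keeps each word and its count on the same row by construction.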

Source

2016-11-18 03:02:10 nathanielobrown

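The first part works like a charm. The second part, converting the dictionary to a DataFrame, kept giving me the error "'numpy.ndarray' object is not callable". I finally solved it using `pd.DataFrame.from_dict(dictionary, orient="index")`. – Drj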
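A small sketch of the `from_dict` route mentioned in this comment, extended to produce the `word`/`count` layout the question asks for. The `columns=` argument and the `rename_axis`/`reset_index` step are additions for illustration, not part of the comment.

```python
import pandas as pd

# Example counts taken from the expected output in the question.
dictionary = {"Obama": 1, "the": 2}

# orient="index" turns the dict keys into the row index.
counts = pd.DataFrame.from_dict(dictionary, orient="index", columns=["count"])

# Move the words out of the index into an ordinary column named "word".
counts = counts.rename_axis("word").reset_index()

print(counts)
```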

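Hmm, interesting. I expected `people['dictionary'][0]` to produce a dictionary, but it sounds like you are getting a pandas Series. Maybe you are using a different version of pandas. You could try using `DataFrame.loc` or `DataFrame.iloc`, as described [here](http://pandas.pydata.org/pandas-docs/stable/indexing.html). – nathanielobrown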

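Yes, you are right, it was indeed a Series. I got confused coming from R. I thought this was just how pandas works, but it looks like it is version-specific. Anyway, the problem is solved now, and I will try your suggestion as soon as I can. – Drj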
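To illustrate the `.loc`/`.iloc` suggestion: after filtering, the surviving row keeps its original index label (35817 in the question's output), so selecting "the first row" is safest by position. The frame below is hypothetical, and the label 10 is invented.

```python
import pandas as pd

# Hypothetical frame; the label 35817 mirrors the output shown in the question.
people = pd.DataFrame(
    {
        "name": ["Trump", "Obama"],
        "dictionary": [{"a": 1}, {"Obama": 1, "the": 2}],
    },
    index=[10, 35817],
)
obama = people[people["name"] == "Obama"].copy()

# Positional access: the first row, whatever its label happens to be.
first = obama["dictionary"].iloc[0]

# Label-based access: the row whose index label is 35817.
same = obama.loc[35817, "dictionary"]

# obama["dictionary"][0] would raise a KeyError here, because no row is labelled 0.
print(first)
```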
