Create a list of anagrams from a pandas column
I have a simple dataframe as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'notes': pd.Series(['meth cook makes meth with purity of over 96%', 'meth cook is also called Heisenberg', 'meth cook has cancer', 'he is known as the best meth cook', 'Meth Dealer added chili powder to his batch', 'Meth Dealer learned to make the best meth', 'everyone goes to this Meth Dealer for best shot', 'girlfriend of the meth dealer died', 'this lawyer is a people pleasing person', 'cinnabon has now hired the lawyer as a baker', 'lawyer had to take off in the end', 'lawyer has a lot of connections who knows other guy']),
'name': pd.Series([np.nan, 'Walter White', np.nan, np.nan, np.nan, np.nan, 'Jessie Pinkman', np.nan, 'Saul Goodman', np.nan, np.nan, np.nan]),
'occupation': pd.Series(['meth cook', np.nan, np.nan, np.nan, np.nan, np.nan, 'meth dealer', np.nan, np.nan, 'lawyer', np.nan, np.nan])
})
It looks like this:
    name            notes                                                occupation
0   NaN             meth cook makes meth with purity of over 96%         meth cook
1   Walter White    meth cook is also called Heisenberg                  NaN
2   NaN             meth cook has cancer                                 NaN
3   NaN             he is known as the best meth cook                    NaN
4   NaN             Meth Dealer added chili powder to his batch          NaN
5   NaN             Meth Dealer learned to make the best meth            NaN
6   Jessie Pinkman  everyone goes to this Meth Dealer for best shot      meth dealer
7   NaN             girlfriend of the meth dealer died                   NaN
8   Saul Goodman    this lawyer is a people pleasing person              NaN
9   NaN             cinnabon has now hired the lawyer as a baker         lawyer
10  NaN             lawyer had to take off in the end                    NaN
11  NaN             lawyer has a lot of connections who knows other guy  NaN
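I want to create a list of words/anagrams from the 'notes' column. I also want to exclude any numbers/special characters found in the 'notes' column (for example, I don't want 96% in the output).
I also want to write all the individual words (without duplicates) to a text file.
How can I do this in Python?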
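For reference, here is one possible way to approach this. It is a minimal sketch, not necessarily the accepted answer's method. It assumes that "word" means a run of ASCII letters, that case should be ignored (so "Meth" and "meth" count as the same word), and that one word per line is an acceptable file layout. The helper names `unique_words` and `write_words` are made up for illustration.

```python
import re

import pandas as pd


def unique_words(notes):
    """Return the sorted unique alphabetic words found in a Series of strings."""
    seen = set()
    # dropna() skips missing notes so only real strings reach the regex.
    for text in notes.dropna():
        # [a-z]+ keeps runs of letters only, so tokens like "96%" are dropped.
        # Assumption: lowercasing first makes the match case-insensitive.
        seen.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(seen)


def write_words(words, path):
    """Write one word per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(words) + "\n")


if __name__ == "__main__":
    # Small stand-in for df['notes'] from the question.
    notes = pd.Series([
        "meth cook makes meth with purity of over 96%",
        "Meth Dealer added chili powder to his batch",
    ])
    print(unique_words(notes))
```

With the dataframe from the question, this could be called as `write_words(unique_words(df['notes']), 'words.txt')`, where `words.txt` is just a placeholder file name. Note that a letters-only pattern also splits contractions such as "don't" into "don" and "t", so the regex may need adjusting for real data.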
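If my answer was helpful, don't forget to [accept](http://meta.stackexchange.com/questions/5234/how-do-accepting-an-answer-work) it and upvote. Thanks. – jezrael
Thanks! I will now apply this to my larger dataframe. This solution makes a lot of sense. –
Thank you very much. – jezrael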