To count the items of my_list, you can first use str.split with stack to get a Series of words:
a = df['Story'].str.split(expand=True).stack()
print (a)
0  0     Kumar
   1        is
   2       one
   3        of
   4       the
   5     great
   6    player
   7        in
   8       his
   9      team
1  0      Ravi
   1        is
   2         a
   3      good
   4      poet
2  0       Ram
   1    drives
   2      well
dtype: object
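To make the split and stack step easier to follow, here is a minimal sketch. The sample df is reconstructed from the printed output above, so treat it as an assumption about the asker's data. The extra dropna() is also my addition: depending on the pandas version, stack may keep the padding values instead of dropping them.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({'Story': ['Kumar is one of the great player in his team',
                             'Ravi is a good poet',
                             'Ram drives well']})

# expand=True gives one column per word position; shorter rows are padded
# with missing values.
wide = df['Story'].str.split(expand=True)

# stack turns the wide frame into one long Series indexed by (row, position).
# dropna() is a precaution (not in the original answer) in case the padding
# survives stack in the installed pandas version.
a = wide.stack().dropna()

print(wide.shape)  # (3, 10): 3 stories, longest one has 10 words
print(len(a))      # 18 words in total
```

The first index level is the original row and the second is the word position, so a.loc[(1, 4)] returns 'poet'.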
Then filter by boolean indexing with isin, get the counts with value_counts, and to get a DataFrame add rename_axis and reset_index:
# Use a new name so the original df (with the 'Story' column) is not overwritten
counts = a[a.isin(my_list)].value_counts().rename_axis('word').reset_index(name='count')
print (counts)
  word  count
0   is      2
1  one      1
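One thing to be aware of: value_counts only reports words that actually occur, so an item of my_list that never appears is silently missing from the result. If you want it listed with a count of 0, a reindex on my_list is one possible way. This is only a sketch: the sample df is reconstructed from the output above, and 'cricket' is a hypothetical word added to show the zero case.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({'Story': ['Kumar is one of the great player in his team',
                             'Ravi is a good poet',
                             'Ram drives well']})
# 'cricket' is a hypothetical entry that never occurs in the stories.
my_list = ['one', 'is', 'cricket']

# dropna() is a precaution in case stack keeps the padding values.
words = df['Story'].str.split(expand=True).stack().dropna()

counts = (words[words.isin(my_list)]       # boolean indexing with isin
          .value_counts()                  # word -> number of occurrences
          .reindex(my_list, fill_value=0)  # my_list order, missing words get 0
          .rename_axis('word')             # name the index so it becomes a column
          .reset_index(name='count'))      # Series -> DataFrame

print(counts)
```

The result has the rows in the order of my_list: one with 1, is with 2, cricket with 0.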
Another solution is to create a list of all words with str.split, flatten it with chain.from_iterable, count with Counter, and finally create the DataFrame with the constructor:
from collections import Counter
from itertools import chain
my_list=["one","is"]
a = list(chain.from_iterable(df['Story'].str.split().values.tolist()))
print (a)
['Kumar', 'is', 'one', 'of', 'the', 'great', 'player',
'in', 'his', 'team', 'Ravi', 'is', 'a', 'good', 'poet', 'Ram', 'drives', 'well']
b = Counter([x for x in a if x in my_list])
print (b)
Counter({'is': 2, 'one': 1})
df = pd.DataFrame({'word':list(b.keys()),'count':list(b.values())}, columns=['word','count'])
print (df)
  word  count
0  one      1
1   is      2
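A slightly tighter variant of the Counter approach, as a sketch only. It looks words up in a set instead of a list, which may help if my_list is long, and it uses most_common so the rows come out sorted by count. The sample df is again reconstructed from the output above (an assumption), and the names wanted and result are mine.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({'Story': ['Kumar is one of the great player in his team',
                             'Ravi is a good poet',
                             'Ram drives well']})
my_list = ['one', 'is']

wanted = set(my_list)  # set membership test instead of scanning a list

# str.split() yields one list of words per row; chain flattens them lazily.
all_words = chain.from_iterable(df['Story'].str.split())

# Count only the words we care about.
b = Counter(w for w in all_words if w in wanted)

# most_common() returns (word, count) pairs, highest count first.
result = pd.DataFrame(b.most_common(), columns=['word', 'count'])
print(result)
```

Here 'is' comes first with a count of 2, followed by 'one' with 1.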
Thanks @jezrael, I'll work on this and update you. – pyd
Could you check this one: https://stackoverflow.com/questions/45048818/keyword-search-between-two-dataframes-using-python-pandas – pyd