2017-07-25 18 views
my_list=["one","is"] 

df 
Out[6]: 
    Name                                         Story
0  Kumar  Kumar is one of the great player in his team
1   Ravi                           Ravi is a good poet
2    Ram                               Ram drives well

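If any of the items in my_list appear in the "Story" column, I need the number of occurrences of every item. How can I get the occurrence counts of a list of keywords in a DataFrame column in Python?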

my_desired_output 

new_df 
word  count 
one  1 
is  2 

I managed to extract the rows that contain any of these words using

mask = df["Story"].str.contains('|'.join(my_list), na=False)

but now I am trying to get the count of each word in my_list.

1 Answer

You can first use str.split with stack to get a Series of the individual words:

a = df['Story'].str.split(expand=True).stack() 
print (a) 
0  0     Kumar
   1        is
   2       one
   3        of
   4       the
   5     great
   6    player
   7        in
   8       his
   9      team
1  0      Ravi
   1        is
   2         a
   3      good
   4      poet
2  0       Ram
   1    drives
   2      well
dtype: object

Then filter by boolean indexing with isin, get the counts with value_counts, and, to get a DataFrame, add rename_axis with reset_index:

df = a[a.isin(my_list)].value_counts().rename_axis('word').reset_index(name='count') 
print (df) 
    word count 
0 is  2 
1 one  1 
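
The steps above can be put together as one self-contained sketch. The names stories, words and word_counts are only illustrative, and the reindex call is an addition that is not part of the answer above: it is assumed here that keeping the order of my_list, and reporting 0 for keywords that never occur, is what is wanted.

```python
import pandas as pd

my_list = ["one", "is"]

# Sample data copied from the question.
stories = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})

# One word per row: split on whitespace, then stack into a single Series.
words = stories["Story"].str.split(expand=True).stack()

# Keep only the keywords and count them.
counts = words[words.isin(my_list)].value_counts()

# Assumption: follow the order of my_list and show 0 for missing keywords.
word_counts = (
    counts.reindex(my_list, fill_value=0)
          .rename_axis("word")
          .reset_index(name="count")
)
print(word_counts)
```

Because the text is split on whitespace only, a word followed by punctuation (for example "is,") would not be counted as "is" in this sketch.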

Another solution is to create lists of all the words with str.split, flatten them with chain.from_iterable, use Counter, and finally create the DataFrame with the constructor:

from collections import Counter 
from itertools import chain 
import pandas as pd 

my_list=["one","is"] 

a = list(chain.from_iterable(df['Story'].str.split().values.tolist())) 
print (a) 
['Kumar', 'is', 'one', 'of', 'the', 'great', 'player', 
'in', 'his', 'team', 'Ravi', 'is', 'a', 'good', 'poet', 'Ram', 'drives', 'well'] 

b = Counter([x for x in a if x in my_list]) 
print (b) 
Counter({'is': 2, 'one': 1}) 

df = pd.DataFrame({'word':list(b.keys()),'count':list(b.values())}, columns=['word','count']) 
print (df) 
    word count 
0 one  1 
1 is  2 
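
A shorter variant of the same Counter idea is sketched below. It is not the code from the answer: it counts every word once and then looks up each keyword, relying on the fact that a Counter returns 0 for keys it has not seen. The names stories, all_counts and result are hypothetical.

```python
from collections import Counter

import pandas as pd

my_list = ["one", "is"]

# Sample data copied from the question.
stories = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})

# Count every whitespace-separated word across all stories (NaN rows skipped).
all_counts = Counter(
    word
    for story in stories["Story"].dropna()
    for word in story.split()
)

# Look up each keyword; a Counter gives 0 for words that never appear.
result = pd.DataFrame({
    "word": my_list,
    "count": [all_counts[word] for word in my_list],
})
print(result)
```

Building the DataFrame from my_list rather than from the Counter keys keeps the rows in the order of the keyword list, independent of the order in which the words appear in the text.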

Thanks @jezrael, I will work on this and update you. – pyd


Can you check this one: https://stackoverflow.com/questions/45048818/keyword-search-between-two-dataframes-using-python-pandas – pyd
