2017-09-30 233 views
0

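I have a pandas Series in which each entry is a list of words. I want to find how many times a particular word occurs in each list. For example, the Series looks like this: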

0 [All, of, my, kids, have, cried, nonstop, when... 
1 [We, wanted, to, get, something, to, keep, tra... 
2 [My, daughter, had, her, 1st, baby, over, a, y... 
3 [One, of, babys, first, and, favorite, books, ... 
4 [Very, cute, interactive, book, My, son, loves... 

I want to get the count of 'kids' in each row. I have tried

series.count('kids') 

That gave me an error saying "Level kids must be same as name (None)".

series.str.count('kids')

gives me NaN values.

How should I go about getting the counts?

+0

If your question has been answered, please [accept one (the most helpful)](https://stackoverflow.com/help/someone-answers). –

Answers

2

Use

In [5288]: series.apply(lambda x: x.count('kids')) 
Out[5288]: 
0 1 
1 0 
2 0 
3 0 
4 0 
Name: s, dtype: int64 
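
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. The sample sentences are made up for illustration (loosely based on the rows in the question), and it assumes the lists were produced with `str.split()`, as the asker mentions in the comments.

```python
import pandas as pd

# Hypothetical sample text, loosely based on the rows shown in the question.
text = pd.Series(
    [
        "All of my kids have cried nonstop when",
        "We wanted to get something to keep",
        "Very cute kids book My kids love it",
    ],
    name="s",
)

# Split each string on whitespace so every element becomes a list of words.
series = text.str.split()

# list.count returns the number of exact matches of the word in that list.
counts = series.apply(lambda words: words.count("kids"))
print(counts.tolist())
```

Note that `list.count` compares whole list items, so a token such as `kids,` with attached punctuation would not be counted.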

Details

In [5292]: series 
Out[5292]: 
0 [All, of, my, kids, have, cried, nonstop, when] 
1 [We, wanted, to, get, something, to, keep, tra] 
2 [My, daughter, had, her, 1st, baby, over, a, y] 
3  [One, of, babys, first, and, favorite, books] 
4 [Very, cute, interactive, book, My, son, loves] 
Name: s, dtype: object 

In [5293]: type(series) 
Out[5293]: pandas.core.series.Series 

In [5294]: type(series[0]) 
Out[5294]: list 
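
Since each element is a `list` rather than a `str`, the `.str.count` attempt from the question has no string to search, which is presumably why it came back as NaN. A small sketch contrasting the two calls, using hypothetical sample data:

```python
import pandas as pd

# Hypothetical two-row Series whose elements are lists of words.
series = pd.Series([["All", "of", "my", "kids"], ["We", "wanted", "to"]])

# .str.count expects string elements. The elements here are lists,
# so this does not produce the per-row counts (the question reports NaN).
via_str = series.str.count("kids")
print(via_str)

# Calling list.count on each element works because the elements are lists.
via_apply = series.apply(lambda words: words.count("kids"))
print(via_apply.tolist())
```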
+0

+1 for converting that text into lists. :) – Dark

+0

I'm fairly new to Python, so please bear with me. I used split() to convert the text into lists. I tried using a lambda before, but I got this error –

+0

Before doing the split, you can actually use `series.str.count('kids')` – Zero
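
A quick sketch of that suggestion on unsplit strings, with made-up sample text. One caveat worth knowing: `str.count` treats its argument as a regular expression and counts substring matches, so a word like "skids" also matches unless word boundaries are added.

```python
import pandas as pd

# Hypothetical raw (unsplit) text.
text = pd.Series(["my kids love it", "skids on the road", "kids and more kids"])

# Plain pattern: counts every occurrence of the substring, including "skids".
plain = text.str.count("kids")

# \b anchors restrict the match to the whole word "kids".
whole = text.str.count(r"\bkids\b")

print(plain.tolist())
print(whole.tolist())
```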

1

On the original Series (before splitting), use str.findall + str.len

print(series) 

0  All of my kids have cried nonstop when 
1  We wanted to get something to keep tra 
2  My daughter had her 1st baby over a y 
3  One of babys first and favorite books 
4 Very cute interactive book My son loves 

print(series.str.findall(r'\bkids\b')) 

0 [kids] 
1  [] 
2  [] 
3  [] 
4  [] 
dtype: object 

counts = series.str.findall(r'\bkids\b').str.len() 
print(counts) 

0 1 
1 0 
2 0 
3 0 
4 0 
dtype: int64
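
If capitalised forms such as "Kids" should be counted too, one possible variation is to pass `flags=re.IGNORECASE` to `str.findall`. This goes beyond what the question asked, so treat it as an optional sketch. The sample text is hypothetical.

```python
import re

import pandas as pd

# Hypothetical raw text with mixed capitalisation.
text = pd.Series(
    [
        "All of my kids have cried",
        "Kids love it, my kids too",
        "We wanted to get something",
    ]
)

# findall returns a list of matches per row; IGNORECASE also matches "Kids".
matches = text.str.findall(r"\bkids\b", flags=re.IGNORECASE)

# str.len on a Series of lists gives the length of each list, i.e. the count.
counts = matches.str.len()
print(counts.tolist())
```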