2016-09-06

How do I remove the square brackets from the pos_tag result?

I want to extract the nouns from a DataFrame. I do it as follows:

import pandas as pd 
import nltk 
from nltk.tag import pos_tag 
df = pd.DataFrame({'pos': ['noun', 'Alice', 'good', 'well', 'city']}) 
noun=[] 
for index, row in df.iterrows(): 
    noun.append([word for word,pos in pos_tag(row) if pos == 'NN']) 
df['noun'] = noun 

and I get this for df['noun']:

0  [noun] 
1 [Alice] 
2   [] 
3   [] 
4  [city] 

I then tried a regular expression:

df['noun'].replace('[^a-zA-Z0-9]', '', regex = True) 

and got the same result again:

0  [noun] 
1 [Alice] 
2   [] 
3   [] 
4  [city] 
Name: noun, dtype: object 

What is wrong?

Answer


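The brackets mean that each cell in that column of the DataFrame holds a list. If you are sure that each list contains at most one element, you can use the str accessor on the noun column and extract the first element: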

df['noun'] = df.noun.str[0] 

df 
# pos noun 
#0 noun noun 
#1 Alice Alice 
#2 good NaN 
#3 well NaN 
#4 city city 
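
To see why the regex replace in the question had no effect, here is a minimal sketch. It assumes the noun column is built by hand instead of with pos_tag, so it runs without NLTK; the variable names cell_types, replaced and unchanged are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for the tagged column (assumption: same shape as
# the question's output, but created without NLTK).
df = pd.DataFrame({
    'pos': ['noun', 'Alice', 'good', 'well', 'city'],
    'noun': [['noun'], ['Alice'], [], [], ['city']],
})

# Each cell is a Python list, not a string.
cell_types = df['noun'].map(type).unique().tolist()
print(cell_types)

# Regex replacement only rewrites string values, so list cells pass
# through untouched. The brackets are just how pandas prints a list.
replaced = df['noun'].replace('[^a-zA-Z0-9]', '', regex=True)
unchanged = replaced.tolist() == df['noun'].tolist()
print(unchanged)
```

Note also that replace returns a new Series, so even when it does change something the result has to be assigned back to the column.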

What if there are multiple elements? – Enthusiast
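
For lists with more than one element, two possible approaches are sketched below; neither comes from the answer above. The data and the names noun_joined and exploded are hypothetical. The first joins each list into a single string, the second uses DataFrame.explode (pandas 0.25 or later) to give every noun its own row.

```python
import pandas as pd

# Hypothetical data: some rows hold several nouns, some hold none.
df = pd.DataFrame({'noun': [['city', 'park'], ['Alice'], []]})

# Option 1: join each list into one comma-separated string.
# An empty list becomes an empty string.
df['noun_joined'] = df['noun'].map(', '.join)

# Option 2: one row per noun. The original index is repeated and
# an empty list becomes NaN.
exploded = df[['noun']].explode('noun')

print(df)
print(exploded)
```

Joining keeps one row per original record, which is convenient for display. Exploding is usually easier to work with if the nouns are going to be counted or filtered afterwards.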