Pandas: how to get the unique values of a column containing lists of values?

Consider the following dataframe:

import pandas as pd

df = pd.DataFrame({'name' : [['one two','three four'], ['one'],[], [],['one two'],['three']],
        'col' : ['A','B','A','B','A','B']})
df.sort_values(by='col',inplace=True) 

df 
Out[62]: 
  col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]

I would like to get a column that keeps track of all the unique strings contained in name for each value of col.

That is, the expected output is:

df 
Out[62]: 
  col                   name            unique_list
0   A  [one two, three four]  [one two, three four]
2   A                     []  [one two, three four]
4   A              [one two]  [one two, three four]
1   B                  [one]           [one, three]
3   B                     []           [one, three]
5   B                [three]           [one, three]

Indeed, taking group A as an example, you can see that the set of unique strings contained in [one two, three four], [] and [one two] is [one two, three four].
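The per-group union described above can be illustrated without pandas. This sketch hard-codes the six example rows as (name, col) pairs, an assumption made purely for illustration, and sorts each set so the printed order is deterministic:

```python
# The example rows as (name list, col value) pairs; hard-coded for illustration only.
rows = [
    (["one two", "three four"], "A"),
    (["one"], "B"),
    ([], "A"),
    ([], "B"),
    (["one two"], "A"),
    (["three"], "B"),
]

unique_per_group = {}
for names, group in rows:
    # set.update adds every string in this row's list; an empty list adds nothing.
    unique_per_group.setdefault(group, set()).update(names)

# Sort each set so the result does not depend on set iteration order.
result = {group: sorted(strings) for group, strings in unique_per_group.items()}
print(result)  # {'A': ['one two', 'three four'], 'B': ['one', 'three']}
```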

I was able to get the number of unique values using the answer to "Pandas : how to get the unique number of values in cells when cells contain lists?":

df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique()))) 


df 
Out[65]: 
  col                   name  count_unique
0   A  [one two, three four]             2
2   A                     []             2
4   A              [one two]             2
1   B                  [one]             2
3   B                     []             2
5   B                [three]             2

But replacing nunique with unique in the code above fails.
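A likely cause (an interpretation, not something confirmed in this thread) is that transform needs a result it can broadcast back onto each group's rows, such as a scalar or a result of the same length as the group, and a per-group list of unique strings is neither. The sketch below sidesteps transform: it assumes pandas 0.25 or later for Series.explode (which postdates this 2016 question), omits the sort_values step, computes both the count and the unique list once per group, and broadcasts them back with map:

```python
import pandas as pd

df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})

# explode() gives each list element its own row; an empty list becomes one NaN row.
exploded = df.explode('name')

# Number of unique strings per group (nunique skips NaN by default).
df['count_unique'] = df['col'].map(exploded.groupby('col')['name'].nunique())

# Sorted list of unique strings per group, computed once per group and then
# broadcast back to every row with map rather than transform.
per_group = exploded.groupby('col')['name'].apply(lambda s: sorted(s.dropna().unique()))
df['unique_list'] = df['col'].map(per_group)

print(df)
```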

Any ideas? Thanks!


2016-09-14 ℕʘʘḆḽḘ

Here is a solution:

import numpy as np

df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))
df


2016-09-14 22:47:11 piRSquared

Interesting. 'sum' on strings?!

@Noobie It's worse than that. It's sum on lists. It produces a concatenated list, and I apply np.unique to that concatenated list. – piRSquared
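The mechanism described in this comment can be reproduced with plain Python lists: + on lists concatenates, and the built-in sum with a start value of [] chains those additions. This is a sketch that does not call pandas; it assumes pandas' groupby sum combines a group's lists the same way:

```python
import numpy as np

# The three `name` lists that make up group A in the example.
group_a = [['one two', 'three four'], [], ['one two']]

# Adding lists concatenates them, so summing with an empty-list start value
# yields one flat list that still contains duplicates.
concatenated = sum(group_a, [])
print(concatenated)  # ['one two', 'three four', 'one two']

# np.unique drops the duplicates and returns a sorted NumPy array.
uniques = np.unique(concatenated)
print(uniques)  # ['one two' 'three four']
```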

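hehehe. I just tried it, but it seems your nice solution fails when there are missing values in col. In that case I get 'TypeError: can only concatenate list (not "int") to list'. Replacing the missing values with fillna('') or fillna('[]') doesn't work. Any ideas?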
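One possible workaround, sketched under the assumption that the offending entries are non-list values in name (the TypeError comes from adding a non-list to a list): replace every non-list entry with an empty list before summing. fillna('[]') does not help because it inserts the string '[]', not an empty list. The data below, with a NaN in name, is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third `name` entry is missing (NaN) instead of a list.
df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], np.nan, [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})

# Turn anything that is not a list into an empty list, so summing only ever
# adds lists to lists.
df['name'] = df['name'].apply(lambda v: v if isinstance(v, list) else [])

# Same approach as the answer above: concatenate per group, then np.unique.
df['unique_list'] = df['col'].map(df.groupby('col')['name'].sum().apply(np.unique))
print(df)
```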

Try:

from functools import reduce

uniq_df = df.groupby('col')['name'].apply(lambda x: list(set(reduce(lambda y,z: y+z,x)))).reset_index()
uniq_df.columns = ['col','uniq_list'] 
pd.merge(df,uniq_df, on='col', how='left')
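A self-contained version of this answer, assuming Python 3 (where reduce lives in functools). It additionally sorts each set, because the iteration order of a set of strings is not guaranteed, which is why the output can show [three, one] rather than [one, three]; the sorting is an addition, not part of the original answer:

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})

# Per group: concatenate the lists with +, deduplicate with set, then sort
# (sorting is added here only to make the order deterministic).
uniq_df = (df.groupby('col')['name']
             .apply(lambda x: sorted(set(reduce(lambda y, z: y + z, x))))
             .reset_index())
uniq_df.columns = ['col', 'uniq_list']

# A left merge on col attaches each group's list to every row of that group.
merged = pd.merge(df, uniq_df, on='col', how='left')
print(merged)
```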

Desired output:

  col                   name              uniq_list
0   A  [one two, three four]  [one two, three four]
1   A                     []  [one two, three four]
2   A              [one two]  [one two, three four]
3   B                  [one]           [three, one]
4   B                     []           [three, one]
5   B                [three]           [three, one]


2016-09-14 22:08:02 Abdou

Thanks @Abdou! Let me try it.
