2016-06-13 35 views
8

pandas drop_duplicates - TypeError: type object argument after * must be a sequence, not map

I have updated my question to provide a clearer example.

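Is it possible to use the drop_duplicates method in Pandas to remove duplicate rows based on a column whose values include lists? Consider the column 'three', which in some rows holds a list of two items. Is there a way to drop the duplicate rows without doing it iteratively (which is my current workaround)?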

import pandas as pd 

data = [ 
{'one': 50, 'two': '5:00', 'three': 'february'}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']}, 
{'one': 90, 'two': '9:00', 'three': 'january'} 
] 

df = pd.DataFrame(data) 

print(df) 

   one                three   two
0   50             february  5:00
1   25  [february, january]  6:00
2   25  [february, january]  6:00
3   25  [february, january]  6:00
4   90              january  9:00

df.drop_duplicates(['three']) 

This results in the following error:

TypeError: type object argument after * must be a sequence, not map 
+1

'df_two = df_one.drop_duplicates('ID')' or, more specifically, 'df_two = df_one.drop_duplicates(subset=['ID'])' – EdChum

+0

I'm afraid that doesn't solve the problem. I still see the same error. – user3939059

+0

Does 'df_two = df_one.drop_duplicates()' work? – EdChum

Answers

15

I think this is because the list type isn't hashable, and that is messing up the duplicate-detection logic.
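The hashability difference can be checked directly in plain Python. This is only a minimal sketch; the helper name is_hashable is made up for illustration and is not part of pandas:

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


list_value = ['february', 'january']
tuple_value = tuple(list_value)

print(is_hashable(list_value))   # lists are mutable, so hash() raises TypeError
print(is_hashable(tuple_value))  # a tuple of strings can be hashed
```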

As a workaround, you can convert the lists to tuples, like so:

df['four'] = df['three'].apply(lambda x : tuple(x) if type(x) is list else x) 
df.drop_duplicates('four') 

   one                three   two                 four
0   50             february  5:00             february
1   25  [february, january]  6:00  (february, january)
4   90              january  9:00              january
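If the extra 'four' column is not wanted in the result, one possible variation is to build the hashable key as a temporary Series and filter with a boolean mask. This is a sketch under the same assumptions as the example above; the helper to_hashable and the names key and deduped are illustrative, not from the original answer:

```python
import pandas as pd

data = [
    {'one': 50, 'two': '5:00', 'three': 'february'},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 90, 'two': '9:00', 'three': 'january'},
]
df = pd.DataFrame(data)


def to_hashable(value):
    """Convert lists to tuples so the values can be hashed; leave others as-is."""
    return tuple(value) if isinstance(value, list) else value


# Temporary key Series; the original DataFrame is not modified.
key = df['three'].apply(to_hashable)

# Keep only the first occurrence of each key value.
deduped = df[~key.duplicated()]
print(deduped)
```

Here the rows with index 0, 1 and 4 are kept, matching the output above, while column 'three' still holds the original lists.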