2015-09-15
2

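Flatten a pandas DataFrame

I have a pandas DataFrame that looks like this (it currently has no index other than the built-in row index, but if it would be easier to add an index on "Person" and "Car", that is fine too):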

import pandas as pd

before = pd.DataFrame({
    'Email': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'],
    'Person': ['John','Mary','Jane','John','Mary'],
    'Car': ['Ford','Toyota','Nissan','Nissan','Ford']
})

I want to reshape it to look like this:

after = pd.DataFrame({ 
    'Person': ['John','Mary','Jane'], 
    'Email': ['[email protected]','[email protected]','[email protected]'], 
    'Ford': [True,True,False], 
    'Nissan': [True,False,True], 
    'Toyota': [False,True,False] 
}) 

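Note that John has owned both a Ford and a Nissan, Mary has owned a Ford and a Toyota, and Jane has stuck with her trusty Nissan.

I have tried various permutations of stacking a multi-index DataFrame, grouping, and pivoting. I can't seem to figure out how to take the values from the "Car" column and transpose them into new columns with the value True, merging the people by name.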

Answers

1

I'm not sure this is the best way to do it, but one approach is:

In [26]: before.pivot_table(index=['Email', 'Person'], columns=['Car'], aggfunc=lambda x: True).fillna(False).reset_index()
Out[26]:
Car              Email  Person   Ford  Nissan  Toyota
0    [email protected]    Jane  False    True   False
1    [email protected]    John   True    True   False
2    [email protected]    Mary   True   False    True
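
For comparison, here is a minimal sketch of the same reshape using `pd.crosstab`, which counts each Person/Email and Car combination and then converts the counts to booleans. This is a swapped-in technique, not the one-liner above. The `example.com` addresses are hypothetical placeholders (the real ones are obscured in the post), and the names `counts` and `after` are my own.

```python
import pandas as pd

# Hypothetical placeholder addresses; the originals are obscured in the post.
before = pd.DataFrame({
    'Email': ['john@example.com', 'mary@example.com', 'jane@example.com',
              'john@example.com', 'mary@example.com'],
    'Person': ['John', 'Mary', 'Jane', 'John', 'Mary'],
    'Car': ['Ford', 'Toyota', 'Nissan', 'Nissan', 'Ford'],
})

# Count how often each (Person, Email) pair appears with each car.
counts = pd.crosstab([before['Person'], before['Email']], before['Car'])

# A count above zero means the person has owned that car.
after = counts.gt(0).reset_index()

# Drop the leftover "Car" label on the column axis.
after.columns.name = None

print(after)
```

Since `crosstab` fills missing combinations with 0, no `fillna` step is needed here.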

Accepted, because I have a weakness for elegant one-liners and for the lack of a throwaway column. Thanks for the prompt reply. :) – Dustin

1
before['has_car'] = True

Out[93]:
   Car              Email  Person  has_car
  Ford  [email protected]    John     True
Toyota  [email protected]    Mary     True
Nissan  [email protected]    Jane     True
Nissan  [email protected]    John     True
  Ford  [email protected]    Mary     True

df = before.pivot_table(index=['Person', 'Email'], columns='Car', values='has_car')

Out[89]:
                           Ford  Nissan  Toyota
Person  Email
Jane    [email protected]   NaN    True     NaN
John    [email protected]  True    True     NaN
Mary    [email protected]  True     NaN    True

df.fillna(False).reset_index()

Out[102]:
Car  Person              Email   Ford  Nissan  Toyota
0      Jane  [email protected]  False    True   False
1      John  [email protected]   True    True   False
2      Mary  [email protected]   True   False    True
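
The three steps above can be collected into one script. This is only a sketch under a few assumptions: the `example.com` addresses are hypothetical placeholders for the obscured ones, the names `flagged`, `wide` and `after` are mine, and I pass `aggfunc='any'` explicitly because `pivot_table` averages values by default, which would not necessarily leave the cells as booleans.

```python
import pandas as pd

# Hypothetical placeholder addresses; the originals are obscured in the post.
before = pd.DataFrame({
    'Email': ['john@example.com', 'mary@example.com', 'jane@example.com',
              'john@example.com', 'mary@example.com'],
    'Person': ['John', 'Mary', 'Jane', 'John', 'Mary'],
    'Car': ['Ford', 'Toyota', 'Nissan', 'Nissan', 'Ford'],
})

# Step 1: add a helper column that marks every row as an owned car.
# assign() returns a copy, so `before` itself is left unchanged.
flagged = before.assign(has_car=True)

# Step 2: pivot so that each car becomes its own column.
# aggfunc='any' (my choice, not in the answer) keeps the values boolean.
wide = flagged.pivot_table(index=['Person', 'Email'], columns='Car',
                           values='has_car', aggfunc='any')

# Step 3: missing combinations are NaN; turn them into False,
# cast to bool, and move Person/Email back into ordinary columns.
after = wide.fillna(False).astype(bool).reset_index()
after.columns.name = None

print(after)
```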

The step-by-step breakdown really helps with understanding, thanks! – Dustin