2017-08-16 125 views
2

I am building a DataFrame with a groupby transform in pandas:

import pandas as pd 

df1 = pd.DataFrame({  
"Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] ,   
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle",  
"Portland"] })   

df1.groupby(["City"])['Name'].transform(lambda x:  
','.join(x)).drop_duplicates()  
I want the output as  

Name                   City
Alice,Bob,Mallory,Bob  Seattle
Mallory,Mallory        Portland

but I am getting only
Name   
Alice,Bob,Mallory,Bob     
Mallory,Mallory   

This is an example with a small number of columns, but in my actual problem I
have too many columns, so I cannot use
df1['Name']= df1.groupby(['City'])['Name'].transform(lambda x:   
','.join(x))    
df1.groupby(['City','Name'], as_index=False)    
df1.drop_duplicates()   

because I would have to write the same code for every column.
Is there a way to do this without writing a transform for each column separately?

Answers

2

Aggregating 1 column

I think you need apply with ','.join, then change the column order by selecting with double brackets [[]]:

df = df1.groupby(["City"])['Name'].apply(','.join).reset_index() 
df = df[['Name','City']] 
print (df) 
                    Name      City
0        Mallory,Mallory  Portland
1  Alice,Bob,Mallory,Bob   Seattle

This is needed because transform creates a new column of aggregated values with one entry per original row:

df1['new'] = df1.groupby("City")['Name'].transform(','.join) 
print (df1) 
       City     Name                    new
0   Seattle    Alice  Alice,Bob,Mallory,Bob
1   Seattle      Bob  Alice,Bob,Mallory,Bob
2  Portland  Mallory        Mallory,Mallory
3   Seattle  Mallory  Alice,Bob,Mallory,Bob
4   Seattle      Bob  Alice,Bob,Mallory,Bob
5  Portland  Mallory        Mallory,Mallory
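
This also explains the result in the question: transform returns a Series aligned with the original rows, so calling drop_duplicates on it leaves only the joined names and no City. If you want to keep transform anyway, here is a minimal sketch, assuming the same df1 as in the question (the names joined and out are only for illustration), that writes the result back before de-duplicating:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# transform returns one value per original row, indexed like df1,
# so the City column is not part of its result.
joined = df1.groupby("City")["Name"].transform(",".join)

# Write the joined names back next to City, then drop the repeated rows.
out = (
    df1.assign(Name=joined)[["Name", "City"]]
       .drop_duplicates()
       .reset_index(drop=True)
)
print(out)
```

The rows come out in order of first appearance (Seattle, then Portland), whereas the groupby/apply version sorts by the group key.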

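Aggregating 2 or more columns

For multiple columns, use agg, either with the columns listed in [] or with no columns listed to join all string columns: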

df1 = pd.DataFrame({  
"Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
"Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1" , "Mallory1"],  
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle",  
"Portland"] }) 
print (df1) 
       City     Name     Name2
0   Seattle    Alice    Alice1
1   Seattle      Bob      Bob1
2  Portland  Mallory  Mallory1
3   Seattle  Mallory  Mallory1
4   Seattle      Bob      Bob1
5  Portland  Mallory  Mallory1

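# select several columns with a list (double brackets); the tuple form
# ['Name', 'Name2'] is not accepted by recent pandas versions
df = df1.groupby('City')[['Name', 'Name2']].agg(','.join).reset_index()
print (df)
       City                   Name                      Name2
0  Portland        Mallory,Mallory          Mallory1,Mallory1
1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1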

And if you need to aggregate all columns:

df = df1.groupby('City').agg(','.join).reset_index() 
print (df) 
       City                   Name                      Name2
0  Portland        Mallory,Mallory          Mallory1,Mallory1
1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1

To use a different aggregation for each column, pass a dict to agg:

df1 = pd.DataFrame({  
"Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
"Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1" , "Mallory1"],  
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"], 
'Numbers':[1,5,4,3,2,1]}) 
print (df1) 
       City     Name     Name2  Numbers
0   Seattle    Alice    Alice1        1
1   Seattle      Bob      Bob1        5
2  Portland  Mallory  Mallory1        4
3   Seattle  Mallory  Mallory1        3
4   Seattle      Bob      Bob1        2
5  Portland  Mallory  Mallory1        1


df = df1.groupby('City').agg({'Name': ','.join, 
           'Name2': ','.join, 
           'Numbers': 'max'}).reset_index() 
print (df) 
       City                   Name                      Name2  Numbers
0  Portland        Mallory,Mallory          Mallory1,Mallory1        4
1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1        5
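
With many columns, typing the dict by hand gets tedious. A minimal sketch, assuming the same df1 as above, that builds the dict from the column dtypes instead. The rule used here (numeric columns get 'max', everything else is comma-joined) and the names key and agg_map are assumptions for illustration, not something from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1", "Mallory1"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Numbers": [1, 5, 4, 3, 2, 1],
})

key = "City"

# Assumed rule: numeric columns -> 'max', all other columns -> ','.join.
# The grouping key itself is left out of the mapping.
agg_map = {
    col: ("max" if pd.api.types.is_numeric_dtype(df1[col]) else ",".join)
    for col in df1.columns
    if col != key
}

df = df1.groupby(key).agg(agg_map).reset_index()
print(df)
```

This should give the same frame as the hand-written dict above, and it keeps working when more string or numeric columns are added.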
+0

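OK, thanks, this works. One more thing: suppose I have one more column with numbers and need to compute its max or min along with the operation above. How would I add two agg functions in one statement? – vatsal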

+0

See the edited answer. – jezrael

+1

Thank you very much :) – vatsal

1

You could do

In [42]: df1.groupby('City')['Name'].agg(','.join).reset_index(name='Name') 
Out[42]: 
       City                   Name
0  Portland        Mallory,Mallory
1   Seattle  Alice,Bob,Mallory,Bob

Or,

In [49]: df1.groupby('City', as_index=False).agg({'Name': ','.join}) 
Out[49]: 
       City                   Name
0  Portland        Mallory,Mallory
1   Seattle  Alice,Bob,Mallory,Bob

For multiple aggregations (using the frame with the Name2 and Numbers columns):

df1.groupby('City', as_index=False).agg(
     {'Name': ','.join, 'Name2': ','.join, 'Numbers': 'max'})
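
If you also want to rename the output columns in the same statement, named aggregation is another option. A minimal sketch, assuming pandas 0.25 or later and the four-column frame from the other answer; the output names Names and MaxNumber are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1", "Mallory1"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Numbers": [1, 5, 4, 3, 2, 1],
})

# Each keyword is an output column name; the tuple is (input column, aggregation).
out = (
    df1.groupby("City")
       .agg(Names=("Name", ",".join), MaxNumber=("Numbers", "max"))
       .reset_index()
)
print(out)
```

Columns that are not mentioned (Name2 here) are simply left out of the result.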
+0

If I have more columns like Name2, how would I use the function above to get the same string aggregation for them? – vatsal

+0

Check my answer. – jezrael