2015-10-04

I have a pandas DataFrame with two columns, for example:

index  result 
LI00066994 0.740688 
LI00066994 0.742431 
LI00066994 0.741826 
LI00066994 0.741328 
LI00066994 0.741826 
LI00066994 0.741328 
LI00073078 0.741121 
LI00073078 0.752619 
LI00073078 0.757116 
LI00073078 0.752619 
LI00073078 0.757116 
LI00073078 0.752619 

Now I want a DataFrame in which the index is unique while keeping all the corresponding results, each in its own column (result1, result2, result3, ...).

Desired output:

index  result1 result2 result3 result4 result5 result6 
LI00066994 0.740688 0.742431 0.741826 0.741328 0.741826 0.741328 
LI00073078 0.741121 0.752619 0.757116 0.752619 0.757116 0.752619 

Does anyone know how to do this?

Answers


You could do something like this:

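import pandas as pd
from io import StringIO  # pd.core.common.StringIO no longer exists in current pandas

# Note the extra LI00073078 row at the end, so the groups differ in length
d = """index  result
LI00066994 0.740688
LI00066994 0.742431
LI00066994 0.741826
LI00066994 0.741328
LI00066994 0.741826
LI00066994 0.741328
LI00073078 0.741121
LI00073078 0.752619
LI00073078 0.757116
LI00073078 0.752619
LI00073078 0.757116
LI00073078 0.752619
LI00073078 0.752620"""

df = pd.read_csv(StringIO(d), sep=r'\s+')

# Build one single-row frame per group (transposed so the values become columns);
# concat aligns the columns and fills shorter groups with NaN
df_out = pd.concat([pd.DataFrame({name: df_['result'].values}).T
                    for name, df_ in df.groupby('index')])
df_out = df_out.rename(columns=lambda x: 'result' + str(x))
df_out = df_out.reset_index()
print(df_out)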

Output:

 index result0 result1 result2 result3 result4 result5 result6 
0 LI00066994 0.741 0.742 0.742 0.741 0.742 0.741  NaN 
1 LI00073078 0.741 0.753 0.757 0.753 0.757 0.753 0.753 
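For comparison, the same reshape can also be sketched with `groupby(...).cumcount()` and `pivot`. This is a different technique from the `concat` loop above, not the answerer's code: number each row within its group, then use that number as the column key. As in the answer, groups with fewer rows end up with NaN. The extra `LI00073078` row mirrors the one added to the answer's data, and the helper column name `pos` is just an illustrative choice.

```python
import pandas as pd

# Sample data from the question, plus one extra LI00073078 row
# so the two groups have different lengths
df = pd.DataFrame({
    'index': ['LI00066994'] * 6 + ['LI00073078'] * 7,
    'result': [0.740688, 0.742431, 0.741826, 0.741328, 0.741826, 0.741328,
               0.741121, 0.752619, 0.757116, 0.752619, 0.757116, 0.752619,
               0.752620],
})

# 'pos' (a hypothetical helper column) numbers the rows 0, 1, 2, ... within each label
df['pos'] = df.groupby('index').cumcount()

# One row per label, one column per position; missing positions become NaN
wide = df.pivot(index='index', columns='pos', values='result')
wide.columns = ['result%d' % (i + 1) for i in wide.columns]
wide = wide.reset_index()
print(wide)
```

The 1-based `result1`, `result2`, ... names follow the question's desired output rather than the `result0`-based names in the answer.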

I'm not sure how to do this with pandas alone, but if you're happy to throw numpy into the mix, give this a shot:

import numpy as np 
import pandas as pd 

index = [ 
    'LI00066994', 'LI00066994', 'LI00066994', 
    'LI00066994', 'LI00066994', 'LI00066994', 
    'LI00073078', 'LI00073078', 'LI00073078', 
    'LI00073078', 'LI00073078', 'LI00073078'] 
data = [ 
    0.740688, 0.742431, 0.741826, 0.741328, 
    0.741826, 0.741328, 0.741121, 0.752619, 
    0.757116, 0.752619, 0.757116, 0.752619] 
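columns = ['result']
df = pd.DataFrame(data=data, index=index, columns=columns)

# After transposing, each label is a (duplicated) column name, so df.T[label]
# selects all of that label's values as a single row; vstack stacks those rows.
# This only works when every label has the same number of values.
unique_index = np.unique(df.index)
new_data = np.vstack([df.T[lookup] for lookup in unique_index])

new_df = pd.DataFrame(data=new_data, index=unique_index,
                      columns=['result%d' % (i + 1) for i in range(new_data.shape[1])])
print(new_df)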
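A numpy-based variant of the same idea, sketched under the assumption that every label occurs the same number of times: sort the rows by label, check the per-label counts with `np.unique(..., return_counts=True)`, and reshape the flat `result` values into one row per label. This is an alternative to the `vstack` loop, not the answerer's code.

```python
import numpy as np
import pandas as pd

# Sample data from the question: each label appears exactly 6 times
index = ['LI00066994'] * 6 + ['LI00073078'] * 6
data = [0.740688, 0.742431, 0.741826, 0.741328, 0.741826, 0.741328,
        0.741121, 0.752619, 0.757116, 0.752619, 0.757116, 0.752619]
df = pd.DataFrame({'result': data}, index=index)

# Make each label's rows contiguous; mergesort is stable, so the
# original order of values within a label is preserved
df = df.sort_index(kind='mergesort')
labels, counts = np.unique(df.index, return_counts=True)
if not (counts == counts[0]).all():
    raise ValueError('reshape needs the same number of rows per label')

# Each label's values become one row of an (n_labels, n_per_label) array
values = df['result'].to_numpy().reshape(len(labels), counts[0])
new_df = pd.DataFrame(values, index=labels,
                      columns=['result%d' % (i + 1) for i in range(counts[0])])
print(new_df)
```

Like the `vstack` version, this cannot handle unequal group sizes (here it raises an explicit `ValueError`); when groups differ in length, the `concat` approach from the first answer pads with NaN instead.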