2017-04-23 59 views

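Pandas: concatenate and reindex DataFrames

I want to merge two pandas DataFrames into a new, third DataFrame with two new indexes. Suppose I start with the following: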

df = pd.DataFrame(np.ones(25).reshape((5,5)),index = ['A','B','C','D','E']) 
df1 = pd.DataFrame(np.ones(25).reshape((5,5))*2,index = ['A','B','C','D','E']) 
df[2] = np.nan 
df1[3] = np.nan 
df[4] = np.nan 
df1[4] = np.nan 

I would like to achieve the following result in the least convoluted way:

NewIndex OldIndex df df1 
1 A 1 2 
2 B 1 2 
3 C 1 2 
4 D 1 2 
5 E 1 2 
6 A 1 2 
7 B 1 2 
8 C 1 2 
9 D 1 2 
10 E 1 2 
11 A NaN 2 
12 B NaN 2 
13 C NaN 2 
14 D NaN 2 
15 E NaN 2 
16 A 1 NaN 
17 B 1 NaN 
18 C 1 NaN 
19 D 1 NaN 
20 E 1 NaN 

What is the best way to do this?

Answer

You need to unstack both DataFrames, concatenate the results, and then reset the index of the concatenated DataFrame.

import numpy as np 
import pandas as pd 
# test data 
df = pd.DataFrame(np.ones(25).reshape((5,5)),index = ['A','B','C','D','E']) 
df1 = pd.DataFrame(np.ones(25).reshape((5,5))*2,index = ['A','B','C','D','E']) 
df[2] = np.nan 
df1[3] = np.nan 
df[4] = np.nan 
df1[4] = np.nan 

# unstack tables and concat 
newdf = pd.concat([df.unstack(),df1.unstack()], axis=1) 
# reset multiindex for level 1 
newdf.reset_index(1, inplace=True) 
# rename columns 
newdf.columns = ['OldIndex','df','df1'] 
# drop old index (the original column labels) without keeping it as a column
newdf = newdf.reset_index(drop=True)
# set index from 1 
newdf.index = np.arange(1, len(newdf) + 1) 
# rename new index 
newdf.index.name='NewIndex' 
print(newdf) 

Output:

  OldIndex df df1 
NewIndex     
1    A 1.0 2.0 
2    B 1.0 2.0 
3    C 1.0 2.0 
4    D 1.0 2.0 
5    E 1.0 2.0 
6    A 1.0 2.0 
7    B 1.0 2.0 
8    C 1.0 2.0 
9    D 1.0 2.0 
10    E 1.0 2.0 
11    A NaN 2.0 
12    B NaN 2.0 
13    C NaN 2.0 
14    D NaN 2.0 
15    E NaN 2.0 
16    A 1.0 NaN 
17    B 1.0 NaN 
18    C 1.0 NaN 
19    D 1.0 NaN 
20    E 1.0 NaN 
21    A NaN NaN 
22    B NaN NaN 
23    C NaN NaN 
24    D NaN NaN 
25    E NaN NaN 
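
The output above has 25 rows, while the table in the question stops at 20: the last five rows come from column 4, which is NaN in both frames. If those rows are unwanted, one possible variation (a sketch, not part of the original answer) is to drop rows where both value columns are NaN before renumbering. The name `stacked` is only illustrative, and passing `keys=` to `pd.concat` is used here as a shortcut for naming the two value columns.

```python
import numpy as np
import pandas as pd

# same test data as in the question
df = pd.DataFrame(np.ones(25).reshape((5, 5)), index=['A', 'B', 'C', 'D', 'E'])
df1 = pd.DataFrame(np.ones(25).reshape((5, 5)) * 2, index=['A', 'B', 'C', 'D', 'E'])
df[2] = np.nan
df1[3] = np.nan
df[4] = np.nan
df1[4] = np.nan

# unstack and concatenate as above; keys= names the two value columns
stacked = pd.concat([df.unstack(), df1.unstack()], axis=1, keys=['df', 'df1'])
# drop rows in which both df and df1 are NaN (column 4 in this data)
stacked = stacked.dropna(how='all')
# move the old row labels into a column named OldIndex
stacked = stacked.reset_index(level=1).rename(columns={'level_1': 'OldIndex'})
# discard the old column labels and number the rows from 1
stacked = stacked.reset_index(drop=True)
stacked.index = np.arange(1, len(stacked) + 1)
stacked.index.name = 'NewIndex'
print(stacked)
```

With the test data this should leave the 20 rows shown in the question; on other data, `dropna(how='all')` removes every row where both values are missing, which may or may not be what is wanted.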

Yes, this answer is __much__ better! – MaxU

Thank you for your comment. – Serenity