2017-02-10 151 views
2

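Python pandas - problem appending/concatenating two MultiIndexed DataFrames

I want to merge two MultiIndexed DataFrames. My code is below. As you can see in the output, the problem is that the "DATE" index is repeated, whereas I want all the values (OPEN_INT, PX_LAST) on the same date index. Any ideas? I have tried both append and concat, but they give me similar results.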

if df.empty:
    df = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(df.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
    df = pd.DataFrame({f: df[f].values}, index=index)

else:
    temp = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(temp.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])

    temp = pd.DataFrame({f: temp[f].values}, index=index)

    #df = df.append(temp, ignore_index = True)
    df = pd.concat([df, temp]).sortlevel()

And the result:

                         OPEN_INT  PX_LAST
TICKER      DATE
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395

Essentially, I want to get this result without the NaNs!

Edit: adding "axis=1" to the concat call results in the following (my fault for not including the additional output in the first place):

                        PX_LAST   OPEN_INT  PX_LAST  OPEN_INT  PX_LAST  \
TICKER      DATE
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN       NaN      NaN
            2017-02-02   98.370  1009994.0      NaN       NaN      NaN
            2017-02-03   98.360  1019181.0      NaN       NaN      NaN
            2017-02-06   98.405  1023863.0      NaN       NaN      NaN
            2017-02-07   98.410  1024609.0      NaN       NaN      NaN
            2017-02-08   98.435  1046258.0      NaN       NaN      NaN
            2017-02-09   98.395  1050291.0      NaN       NaN      NaN
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245  726739.0      NaN
            2017-02-02      NaN        NaN   98.250  715081.0      NaN
            2017-02-03      NaN        NaN   98.235  723936.0      NaN
            2017-02-06      NaN        NaN   98.285  729324.0      NaN
            2017-02-07      NaN        NaN   98.295  728673.0      NaN
            2017-02-08      NaN        NaN   98.325  728520.0      NaN
            2017-02-09      NaN        NaN   98.280  741840.0      NaN
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN       NaN   98.130
            2017-02-02      NaN        NaN      NaN       NaN   98.135
            2017-02-03      NaN        NaN      NaN       NaN   98.120
            2017-02-06      NaN        NaN      NaN       NaN   98.180
            2017-02-07      NaN        NaN      NaN       NaN   98.190
            2017-02-08      NaN        NaN      NaN       NaN   98.225
            2017-02-09      NaN        NaN      NaN       NaN   98.175

Thanks!

Answers

1

It is not clear what the input format is.

I assume OPEN_INT looks like this:

import datetime 
import pandas as pd 


open_int = pd.DataFrame(
    [ 
     (datetime.date(2017, 2, 1), 1008044.0), 
     (datetime.date(2017, 2, 2), 1009994.0), 
     (datetime.date(2017, 2, 3), 1019181.0), 
     (datetime.date(2017, 2, 6), 1023863.0), 
     (datetime.date(2017, 2, 7), 1024609.0), 
     (datetime.date(2017, 2, 8), 1046258.0), 
    ], 
    columns=['DATE', 'OPEN_INT'] 
) 
open_int['TICKER'] = 'EDH8 COMDTY' 
open_int.set_index(['TICKER', 'DATE'], inplace=True) 

print(open_int) 
#                          OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0

And PX_LAST looks like this:

px_last = pd.DataFrame(
    [ 
     (datetime.date(2017, 2, 1), 98.365), 
     (datetime.date(2017, 2, 2), 98.370), 
     (datetime.date(2017, 2, 3), 98.360), 
     (datetime.date(2017, 2, 6), 98.405), 
     (datetime.date(2017, 2, 7), 98.410), 
     (datetime.date(2017, 2, 8), 98.435), 
     (datetime.date(2017, 2, 9), 98.395), 

    ], 
    columns=['DATE', 'PX_LAST'] 
) 
px_last['TICKER'] = 'EDH8 COMDTY' 
px_last.set_index(['TICKER', 'DATE'], inplace=True) 

print(px_last) 
#                         PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395

Then you concat them along the columns and get what you want:

df = pd.concat([open_int, px_last], axis=1) 
print(df) 
#                          OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395
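
The edit in the question shows that a single axis=1 concat across several tickers spreads each ticker into its own pair of columns. One possible way around that, sketched here under assumptions, is to concat in two steps: join the fields of one ticker side by side (axis=1), then stack the per-ticker frames on top of each other (the default axis=0). The helper `make_frame` and the hard-coded values are illustrative stand-ins for whatever `bbg_historicaldata` returns; they are not part of the original code.

```python
import pandas as pd


def make_frame(ticker, field, dates, values):
    """Hypothetical helper: one-column frame indexed by (TICKER, DATE)."""
    index = pd.MultiIndex.from_product(
        [[ticker], pd.to_datetime(dates)], names=['TICKER', 'DATE']
    )
    return pd.DataFrame({field: values}, index=index)


# Sample values copied from the output shown in the question.
dates = ['2017-02-01', '2017-02-02']
sample = [
    ('EDH8 COMDTY', [98.365, 98.370], [1008044.0, 1009994.0]),
    ('EDM8 COMDTY', [98.245, 98.250], [726739.0, 715081.0]),
]

per_ticker = []
for ticker, px, oi in sample:
    fields = [
        make_frame(ticker, 'PX_LAST', dates, px),
        make_frame(ticker, 'OPEN_INT', dates, oi),
    ]
    # Step 1: fields of the same ticker share an index, so join them as columns.
    per_ticker.append(pd.concat(fields, axis=1))

# Step 2: tickers have the same columns, so stack them as rows.
df = pd.concat(per_ticker).sort_index()
print(df)
```

With this ordering each (TICKER, DATE) pair appears once and every field gets exactly one column, so NaN only shows up where a field is genuinely missing for a date.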
+0

Hi - thanks for your reply. Unfortunately, this leads to another problem. Edited above – keynesiancross

1

You need to concatenate along the other axis:

pd.concat([df, temp], axis=1) 

By default, pandas concatenates along the rows (axis=0) and aligns on the columns, which produces the result you are seeing.
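
A small sketch of the difference, using two tiny frames built by hand (the values come from the question; the string dates and variable names are assumptions for illustration). It also shows one way to repair a frame that has already been stacked row-wise: group on both index levels and take the first non-null value per column.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('EDH8 COMDTY', '2017-02-01'), ('EDH8 COMDTY', '2017-02-02')],
    names=['TICKER', 'DATE'],
)
px_last = pd.DataFrame({'PX_LAST': [98.365, 98.370]}, index=index)
open_int = pd.DataFrame({'OPEN_INT': [1008044.0, 1009994.0]}, index=index)

# Default axis=0: rows are appended, columns are unioned and padded with NaN.
stacked = pd.concat([px_last, open_int])

# axis=1: rows are aligned on the index, each frame contributes its columns.
side_by_side = pd.concat([px_last, open_int], axis=1)

# Repair for an already-stacked frame: first() skips NaN within each group.
collapsed = stacked.groupby(level=['TICKER', 'DATE']).first()

print(stacked)
print(side_by_side)
print(collapsed)
```

The groupby route is only safe when each (TICKER, DATE) pair holds at most one real value per column; otherwise `first()` silently discards the later ones.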

+0

Hi - thanks for your reply. Unfortunately, this leads to another problem. Edited above – keynesiancross