2015-08-31 223 views
1

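Python pandas: merge or concat DataFrames

I have a series of CSV files that I load into DataFrames and store in a list (dataframesArray). The list and the DataFrames look like this: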

dataframesArray [    
    BBG.XAMS.UL.S_pnl_pos_cost 
     date         
     2015-03-23     0.000000 
     2015-03-24     0.000000 
     2015-03-25     -0.674717 
     2015-03-26     69.140999 
     2015-03-27     -70.128728,    
    BBG.XAMS.UNA.S_pnl_pos_cost 
     date         
     2015-03-23     -0.674929 
     2015-03-24     -15.138444 
     2015-03-25     90.830662 
     2015-03-26     21.446129 
     2015-03-27     -2.554376,    
    BBG.XAMS.UL.S_pnl_pos_cost 
     date         
     2014-10-20     -15.220730 
     2014-10-21     3031.610010 
     2014-10-22     1976.815412 
     2014-10-23    -2974.037294 
     2014-10-24     796.775000, 
    BBG.XAMS.UNA.S_pnl_pos_cost 
     date         
     2014-10-20     -4.140378 
     2014-10-21     618.064066 
     2014-10-22     -71.104800 
     2014-10-23     828.063647 
     2014-10-24      0.000000] 

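The data covers two products (BBG.XAMS.UL.S_pnl_pos_cost and BBG.XAMS.UNA.S_pnl_pos_cost) by date, and there will be more products in the future. I want to concat or merge (I am not sure which) the list of DataFrames into a single DataFrame (called result) so that it looks like this: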

            BBG.XAMS.UL.S_pnl_pos_cost  BBG.XAMS.UNA.S_pnl_pos_cost
date
2014-10-20     -15.220730      -4.140378 
2014-10-21    3031.610010     618.064066 
2014-10-22    1976.815412     -71.104800 
2014-10-23    -2974.037294     828.063647 
2014-10-24     796.775000      0.000000 
2015-03-23     0.000000     -0.674929 
2015-03-24     0.000000     -15.138444 
2015-03-25     -0.674717     90.830662 
2015-03-26     69.140999     21.446129 
2015-03-27     -70.128728     -2.554376 

I tried to do this with the following:

result = pd.concat(dataframesArray,axis=1) 

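where the axis is the date. The data appears to be merged by date, but I am missing the data for the week starting 2015-03-23. My current concat result DataFrame looks like this: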

BBG.XAMS.UL.S_pnl_pos_cost BBG.XAMS.UNA.S_pnl_pos_cost 
date                 
2014-10-20     -15.220730     -4.140378 
2014-10-21     3031.610010     618.064066 
2014-10-22     1976.815412     -71.104800 
2014-10-23    -2974.037294     828.063647 
2014-10-24     796.775000      0.000000 
2015-03-23       NaN       NaN 
2015-03-24       NaN       NaN 
2015-03-25       NaN       NaN 
2015-03-26       NaN       NaN 
2015-03-27       NaN       NaN 

My current code is:

stockPricesDf = pd.read_csv(f, engine='c', header=0, index_col=0, parse_dates=True, infer_datetime_format=True, usecols=(0, 3))
stockPricesDf.rename(columns={'adjusted_last_acc': row}, inplace=True)
dataframesArray.append(stockPricesDf)
result = pd.concat(dataframesArray, axis=1)

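I loop through several directories to pick up the product data stored in the CSV files.

Could someone please let me know what I am doing wrong and how to fix it?

Many thanks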

+0

Try using axis=0. If every DataFrame has the same column names, this should concatenate them column by column. – Maximus
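A minimal sketch of what axis=0 does when the frames carry different column names, as they do in the question. The two small frames below are assumptions built from a couple of the question's values, and the `groupby(level=0).first()` step is an addition that is not part of the comment:

```python
import pandas as pd

UL = "BBG.XAMS.UL.S_pnl_pos_cost"
UNA = "BBG.XAMS.UNA.S_pnl_pos_cost"
dates = pd.DatetimeIndex(["2015-03-24", "2015-03-25"], name="date")

# Two one-column frames sharing the same dates (values taken from the question).
ul = pd.DataFrame({UL: [0.0, -0.674717]}, index=dates)
una = pd.DataFrame({UNA: [-15.138444, 90.830662]}, index=dates)

# axis=0 stacks the rows: each date now appears twice, and every row is
# filled only in the column of the frame it came from.
stacked = pd.concat([ul, una], axis=0)

# Assumed follow-up step: keep the first non-null value per date and column.
collapsed = stacked.groupby(level=0).first()
print(collapsed)
```

So axis=0 on its own leaves one half-empty row per date and product. The rows still have to be collapsed by date afterwards.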

+0

Possible duplicate of [Pandas join/merge/concat two dataframes](http://stackoverflow.com/questions/11637384/pandas-join-merge-concat-two-dataframes) –

Answers

2

Try this:

result = pd.concat(dataframesArray, axis=1) # like you did 
result = result.groupby(result.columns, axis=1).sum() 

As you can see, the first step produces this (with sample numbers):

    UL  UNA  UL  UNA 
2015-03-23 2.169534 0.294107  NaN  NaN 
2015-03-24 -0.077550 -0.758760  NaN  NaN 
2015-03-25 0.159659 -3.167541  NaN  NaN 
2015-03-26 0.895535 0.944644  NaN  NaN 
2015-03-27 -0.385408 -0.005069  NaN  NaN 
2015-10-20  NaN  NaN 1.855446 -0.229635 
2015-10-21  NaN  NaN -0.400450 -0.237323 
2015-10-22  NaN  NaN 1.103165 0.718134 
2015-10-23  NaN  NaN -0.157415 1.119828 
2015-10-24  NaN  NaN -0.016321 -0.371061 

The second step groups the columns that share a name into a single column:

    UL  UNA 
2015-03-23 2.169534 0.294107 
2015-03-24 -0.077550 -0.758760 
2015-03-25 0.159659 -3.167541 
2015-03-26 0.895535 0.944644 
2015-03-27 -0.385408 -0.005069 
2015-10-20 1.855446 -0.229635 
2015-10-21 -0.400450 -0.237323 
2015-10-22 1.103165 0.718134 
2015-10-23 -0.157415 1.119828 
2015-10-24 -0.016321 -0.371061 
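The two steps can be reproduced end to end with a small sketch. The `frame` helper and the two-rows-per-week subset of the question's data are assumptions made for illustration. Step 2 groups on a transposed frame instead of calling `groupby(..., axis=1)`, since newer pandas releases deprecate the `axis=1` form; the intent is the same.

```python
import pandas as pd

UL = "BBG.XAMS.UL.S_pnl_pos_cost"
UNA = "BBG.XAMS.UNA.S_pnl_pos_cost"


def frame(name, dates, values):
    """Hypothetical helper: one-column DataFrame indexed by date."""
    return pd.DataFrame({name: values}, index=pd.DatetimeIndex(dates, name="date"))


march = ["2015-03-23", "2015-03-24"]
october = ["2014-10-20", "2014-10-21"]
dataframesArray = [
    frame(UL, march, [0.0, 0.0]),
    frame(UNA, march, [-0.674929, -15.138444]),
    frame(UL, october, [-15.220730, 3031.610010]),
    frame(UNA, october, [-4.140378, 618.064066]),
]

# Step 1: align on the date index. Repeated column names are kept as
# separate columns, so the result has four columns with NaN blocks.
wide = pd.concat(dataframesArray, axis=1)

# Step 2: collapse columns that share a name. Transpose, group on the
# (now duplicated) row labels, sum while skipping NaN, transpose back.
result = wide.T.groupby(level=0).sum().T.sort_index()
print(result)
```

Note that `sum()` returns 0 for a date that has no value in any frame of a given product. Passing `min_count=1` to `sum()` should keep such cells as NaN instead.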
+0

Thanks Ian, that clicked. – Stacey