2017-04-10 59 views

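Summing a large number of DataFrames

I have a large number of pandas DataFrames with exactly the same keys and column names. Their data looks like this: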

z1.iloc[0]
val1  [1, 5, 3, 4] 
val2  47 
Name: 2017-01-01 01:00:00, dtype: object 

z2.iloc[0]
val1  [11, 5, 53, 5] 
val2  4 
Name: 2017-01-01 01:00:00, dtype: object 

z3.iloc[0]
val1  [1, 25, 3, 4] 
val2  7 
Name: 2017-01-01 01:00:00, dtype: object 

I tried the following:

summedDf = z1 + z2 + z3 

This concatenates the lists in val1 instead of adding them element-wise:

summedDf.iloc[0]
val1  [1, 5, 3, 4, 11, 5, 53, 5, 1, 25, 3, 4] 
val2  58 
Name: 2017-01-01 01:00:00, dtype: object 

What I want instead is the following:

summedDf.iloc[0]
val1  [13, 35, 59, 13] 
val2  58 
Name: 2017-01-01 01:00:00, dtype: object 

Also, how do I extend this addition to about 500 DataFrames?

Edit: val1 and val2 are different column names. val1 stores a list and val2 stores a single value for each index.

I think you could concatenate them into a single `df` and then use df.sum along an axis. – Divakar
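A minimal sketch of what this comment suggests, shown only for the scalar column val2. The sample values in the second row are made up for illustration, and it assumes every frame shares the same index. The list-valued val1 column is left out here because it would first have to be converted to arrays before it could be summed element-wise.

```python
import pandas as pd

# Illustrative data: three frames with an identical DatetimeIndex.
idx = pd.to_datetime(["2017-01-01 01:00:00", "2017-01-01 02:00:00"])
frames = [
    pd.DataFrame({"val2": [47, 1]}, index=idx),
    pd.DataFrame({"val2": [4, 2]}, index=idx),
    pd.DataFrame({"val2": [7, 3]}, index=idx),
]

# Stack all frames vertically, then sum the rows that share an index label.
stacked = pd.concat(frames)
summed = stacked.groupby(level=0).sum()
print(summed)
```

The same two lines work for any number of frames, as long as they all fit in memory at once.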

Are these lists stored in a column? Or is *val2* repeated for each *val1* item? Please show the full frame rather than a slice. – Parfait

Answer

This is probably not the most efficient approach, but it should get you started:

import pandas as pd 
import numpy as np 


# gen test data 
df1 = pd.DataFrame({'val1':[[1,2,3],[4,5,6]], 'val2': [1,2]}) 
df1 

which gives:

val1  val2 
0 [1, 2, 3] 1 
1 [4, 5, 6] 2 

A second DataFrame, with every value doubled:

def check(x):
    # Double each element of a list, or the value itself if it is a scalar
    if isinstance(x, list):
        output = [i * 2 for i in x]
    else:
        output = x * 2
    return output

df2 = df1.applymap(check)
df2

which gives:

val1  val2 
0 [2, 4, 6] 2 
1 [8, 10, 12] 4 

Adding the DataFrames:

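def add_cols(left, right, col):
    # List-like cells: convert to NumPy arrays so that `+` adds
    # element-wise instead of concatenating the lists.
    if isinstance(left[col].iloc[0], (list, np.ndarray)):
        left_arrays = left[col].apply(lambda x: np.asarray(x))
        right_arrays = right[col].apply(lambda x: np.asarray(x))
        return left_arrays + right_arrays
    return left[col] + right[col]


def add_dfs(left, right):
    # Build a new frame so that neither input is modified in place
    return pd.DataFrame({c: add_cols(left, right, c) for c in left.columns})


# you can use a generator to read dataframes on the fly
# instead of loading all into a list
dfs = [df1, df2]


for e, df in enumerate(dfs):
    if e == 0:
        df_sum = df.copy()
    else:
        # add each new frame to the running total
        df_sum = add_dfs(df_sum, df)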

This gives the desired output:

val1   val2
0 [3, 6, 9]  3
1 [12, 15, 18] 6
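
To extend this to the roughly 500 frames mentioned in the question, one option is to keep a running total in NumPy and convert back to a DataFrame at the end. This is a minimal sketch, assuming every frame has the same index in the same order and all lists in val1 have the same length. `sum_frames` and its `list_cols` parameter are illustrative names, not part of pandas.

```python
import numpy as np
import pandas as pd


def sum_frames(frames, list_cols=("val1",)):
    """Sum an iterable of DataFrames that share index and columns.

    Columns named in `list_cols` hold equal-length lists and are added
    element-wise; every other column is added as plain numbers.
    """
    totals = None
    index = None
    for df in frames:
        # Turn each column into an array: 2-D for list columns, 1-D otherwise.
        arrays = {
            col: np.array(df[col].tolist()) if col in list_cols else df[col].to_numpy()
            for col in df.columns
        }
        if totals is None:
            totals, index = arrays, df.index
        else:
            for col in totals:
                totals[col] = totals[col] + arrays[col]
    if totals is None:
        raise ValueError("sum_frames needs at least one DataFrame")
    # Convert list columns back to one Python list per row.
    data = {
        col: arr.tolist() if col in list_cols else arr
        for col, arr in totals.items()
    }
    return pd.DataFrame(data, index=index)


# Usage with the values from the question
idx = pd.to_datetime(["2017-01-01 01:00:00"])
z1 = pd.DataFrame({"val1": [[1, 5, 3, 4]], "val2": [47]}, index=idx)
z2 = pd.DataFrame({"val1": [[11, 5, 53, 5]], "val2": [4]}, index=idx)
z3 = pd.DataFrame({"val1": [[1, 25, 3, 4]], "val2": [7]}, index=idx)

# A generator works too, so the frames need not all be in memory at once.
summed = sum_frames(z for z in (z1, z2, z3))
print(summed.iloc[0])
```

Because only the running total and the current frame are held at any time, the frames can be read from disk one by one.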