2017-05-06 47 views
2

Pandas mean() for MultiIndex

I have a DataFrame:

CU     Parameters             1      2      3
379-H  Output Energy, (Wh/h)  0.045  0.055  0.042
349-J  Output Energy, (Wh/h)  0.001  0.003  0.000
625-H  Output Energy, (Wh/h)  2.695  1.224  1.272
626-F  Output Energy, (Wh/h)  1.381  1.494  1.300
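
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes CU and Parameters form a two-level MultiIndex and that the column labels are the integers 1, 2 and 3, since the question does not show how the frame was constructed.

```python
import pandas as pd

# Assumption: 'CU' and 'Parameters' are the two index levels.
index = pd.MultiIndex.from_tuples(
    [
        ("379-H", "Output Energy, (Wh/h)"),
        ("349-J", "Output Energy, (Wh/h)"),
        ("625-H", "Output Energy, (Wh/h)"),
        ("626-F", "Output Energy, (Wh/h)"),
    ],
    names=["CU", "Parameters"],
)

# Assumption: the column labels are the integers 1, 2, 3.
df = pd.DataFrame(
    [
        [0.045, 0.055, 0.042],
        [0.001, 0.003, 0.000],
        [2.695, 1.224, 1.272],
        [1.381, 1.494, 1.300],
    ],
    index=index,
    columns=[1, 2, 3],
)
print(df)
```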

I want to create two separate DataFrames by grouping on level 0 of the index (CU) and taking the mean of the column values:

df1: (379-H and 625-H)

Parameters    1  2  3 
Output Energy, (Wh/h) 1.37 0.63 0.657 

df2: (the rest)

Parameters     1  2  3 
Output Energy, (Wh/h)  0.69 0.74 0.65 

I can get the mean over all rows by grouping on level 1 using:

df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all').groupby(level=1).mean() 

but how do I group these by level 0?

SOLUTION:

lightsonly = ["379-H", "625-H"] 
df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all') 
mask = df.index.get_level_values(0).isin(lightsonly) 
df1 = df[mask].groupby(level=1).mean() 
df2 = df[~mask].groupby(level=1).mean() 
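
As a check, the solution above can be run end to end on the sample data. The construction of df here is an assumption (same values as the table in the question, with CU and Parameters as index levels); the names lightsonly, mask, df1 and df2 come from the snippet itself.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.000],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.300]],
    index=index, columns=[1, 2, 3],
)

lightsonly = ["379-H", "625-H"]
# Coerce to numeric and drop rows that are entirely NaN, as in the question.
df = df.apply(pd.to_numeric, errors="coerce").dropna(how="all")
# Boolean array: True where the CU level is one of the selected units.
mask = df.index.get_level_values(0).isin(lightsonly)
df1 = df[mask].groupby(level=1).mean()   # selected units, one row per Parameters value
df2 = df[~mask].groupby(level=1).mean()  # all remaining units
print(df1)
print(df2)
```

On this data the means come out as 1.37, 0.6395, 0.657 for df1 and 0.691, 0.7485, 0.65 for df2.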

Answers

2

Use get_level_values + isin to get a True/False index, then take the mean and rename with a dict:

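d = {True: '379-H and 625-H', False: 'the rest'} 
# Replace the MultiIndex with a boolean index (True for the two selected units)
df.index = df.index.get_level_values(0).isin(['379-H', '625-H']) 
# df.mean(level=0) was removed in pandas 2.0; groupby(level=0).mean() is the equivalent
df = df.groupby(level=0).mean().rename(d) 
print (df) 
                     1       2      3
the rest         0.691  0.7485  0.650
379-H and 625-H  1.370  0.6395  0.657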

For separate DataFrames it is also possible to use boolean indexing (starting again from the original df):

mask= df.index.get_level_values(0).isin(['379-H', '625-H']) 

df1 = df[mask].mean().rename('379-H and 625-H').to_frame().T 
print (df1) 
        1  2  3 
379-H and 625-H 1.37 0.6395 0.657 

df2 = df[~mask].mean().rename('the rest').to_frame().T 
print (df2) 
       1  2  3 
the rest 0.691 0.7485 0.65 

Another NumPy solution using the DataFrame constructor:

a1 = df[mask].values.mean(axis=0) 
#alternatively 
#a1 = df.values[mask].mean(axis=0) 
df1 = pd.DataFrame(a1.reshape(-1, len(a1)), index=['379-H and 625-H'], columns=df.columns) 
print (df1) 
        1  2  3 
379-H and 625-H 1.37 0.6395 0.657 

Edited the question with a solution using the above. – wazzahenry

Glad boolean indexing could help. I think 'df1 = df[mask].groupby(level=1).mean()' is the same as 'df1 = df[mask].mean(level=1)'. Have a nice day! – jezrael
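
A note on the comment above: DataFrame.mean(level=...) was deprecated in pandas 1.3 and removed in 2.0, so on current versions the groupby spelling is the one that runs. It also behaves differently from the plain df[mask].mean() in the answer once the frame holds more than one Parameters value. A small sketch to show this; the second parameter "Input Energy, (Wh/h)" and its values are invented for illustration and are not in the question.

```python
import pandas as pd

# Hypothetical data: two units from the question plus an invented second parameter.
index = pd.MultiIndex.from_product(
    [["379-H", "625-H"], ["Output Energy, (Wh/h)", "Input Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.100, 0.200, 0.300],   # invented values
     [2.695, 1.224, 1.272],
     [0.300, 0.400, 0.500]],  # invented values
    index=index, columns=[1, 2, 3],
)

mask = df.index.get_level_values("CU").isin(["379-H", "625-H"])

# One row per Parameters value: each parameter is averaged across the selected units.
per_parameter = df[mask].groupby(level="Parameters").mean()

# A single Series: every selected row is averaged together, mixing parameters.
overall = df[mask].mean()

print(per_parameter)
print(overall)
```

With only one Parameters value, as in the question, both forms give the same numbers; they differ only in shape and labels.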

2

Consider the DataFrame df, where CU and Parameters are assumed to be in the index:

        1  2  3 
CU Parameters         
379-H Output Energy, (Wh/h) 0.045 0.055 0.042 
349-J Output Energy, (Wh/h) 0.001 0.003 0.000 
625-H Output Energy, (Wh/h) 2.695 1.224 1.272 
626-F Output Energy, (Wh/h) 1.381 1.494 1.300 

Then we can group by the truth value of whether the first-level value is in the list ['379-H', '625-H'].

m = {True: 'Main', False: 'Rest'} 
l = ['379-H', '625-H'] 
g = df.index.get_level_values('CU').isin(l) 
df.groupby(g).mean().rename(index=m) 

      1  2  3 
Rest 0.691 0.7485 0.650 
Main 1.370 0.6395 0.657 
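
If the Parameters level should survive in the result, as in the output requested in the question, one option is to group by the Main/Rest label and by Parameters together, then split the result. This is a sketch under assumptions: the frame is rebuilt from the question's values, and the names in_main, labels and out are illustrative rather than taken from the answer.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.000],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.300]],
    index=index, columns=[1, 2, 3],
)

# Label each row 'Main' or 'Rest' depending on its CU value.
in_main = df.index.get_level_values("CU").isin(["379-H", "625-H"])
labels = pd.Index(np.where(in_main, "Main", "Rest"), name="group")

# Group by the label and by the Parameters level in one pass.
out = df.groupby([labels, df.index.get_level_values("Parameters")]).mean()

# Split into the two frames, each indexed by Parameters.
df1 = out.xs("Main", level="group")
df2 = out.xs("Rest", level="group")
print(df1)
print(df2)
```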
1
#Use a lambda function to change index to 2 groups and then groupby using the modified index. 
df.groupby(by=lambda x:'379-H,625-H' if x[0] in ['379-H','625-H'] else 'Others').mean() 
Out[22]: 
       1  2  3 
379-H,625-H 1.370 0.6395 0.657 
Others  0.691 0.7485 0.650