2017-05-29 98 views
2

I want to divide each element of a DataFrame by the sum of its row, but the code below always goes wrong.

I'm new to pandas, thanks!

df = pd.DataFrame(np.random.rand(12).reshape(3,4),columns=list('abcd')) 
df_row_sum = df.apply(lambda x: x.mean(),axis=1) 
df/df_row_sum 

Answers

6

I think you need DataFrame.div, dividing by the sum (or maybe the mean) of each row (axis=1):

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=12).reshape(3,4), columns=list('abcd'))
print(df)
   a  b  c  d
0  2  2  6  1
1  3  9  6  1
2  0  1  9  0

print(df.sum(axis=1))
0    11
1    19
2    10
dtype: int64

print(df.div(df.sum(axis=1), axis=0))
          a         b         c         d
0  0.181818  0.181818  0.545455  0.090909
1  0.157895  0.473684  0.315789  0.052632
2  0.000000  0.100000  0.900000  0.000000

print(df.mean(axis=1))
0    2.75
1    4.75
2    2.50
dtype: float64

print(df.div(df.mean(axis=1), axis=0))
          a         b         c         d
0  0.727273  0.727273  2.181818  0.363636
1  0.631579  1.894737  1.263158  0.210526
2  0.000000  0.400000  3.600000  0.000000
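
As for why the `df/df_row_sum` in the question goes wrong: as far as I understand it, the `/` operator between a DataFrame and a Series lines the Series index up with the DataFrame's column labels. The row sums are indexed 0, 1, 2, which never match 'a'..'d', so every cell ends up NaN. A minimal sketch of the difference, reusing the seed above (the names `row_sum`, `broken` and `fixed` are only for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=12).reshape(3, 4), columns=list('abcd'))

row_sum = df.sum(axis=1)   # Series indexed by the row labels 0, 1, 2

# Plain `/` matches the Series index against the column labels 'a'..'d'.
# No label matches, so the result contains only NaN.
broken = df / row_sum

# axis=0 makes div match the Series index against the row labels instead.
fixed = df.div(row_sum, axis=0)

print(broken.isna().all().all())   # whether every cell is NaN
print(fixed.sum(axis=1))           # each row of the normalised frame should sum to 1
```

Note that a row whose sum is 0 would still produce NaN or inf with `div`; that case does not occur in this sample data.
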
0

Using @jezrael's setup

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=12).reshape(3,4), columns=list('abcd'))
print(df)

   a  b  c  d
0  2  2  6  1
1  3  9  6  1
2  0  1  9  0

Use numpy and build a new DataFrame from the result

v = df.values 
pd.DataFrame(
    v/v.sum(1, keepdims=True), 
    df.index, df.columns 
) 

          a         b         c         d
0  0.181818  0.181818  0.545455  0.090909
1  0.157895  0.473684  0.315789  0.052632
2  0.000000  0.100000  0.900000  0.000000
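
The `keepdims=True` part is what makes the division line up by row: it keeps the row sums as a (3, 1) column, which NumPy then broadcasts across the (3, 4) array. A small sketch to check the shapes and confirm that this gives the same numbers as `df.div` (the names `sums`, `via_numpy` and `via_div` are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=12).reshape(3, 4), columns=list('abcd'))

v = df.values
sums = v.sum(1, keepdims=True)   # one sum per row, kept as a column vector
print(v.shape, sums.shape)

# Without keepdims the sums would have shape (3,), which cannot be
# broadcast against a (3, 4) array row by row.
via_numpy = pd.DataFrame(v / sums, df.index, df.columns)
via_div = df.div(df.sum(axis=1), axis=0)

print(np.allclose(via_numpy, via_div))   # whether both approaches agree
```
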