2014-01-17 89 views
4

Pandas: get the name of the column with the minimum value

I have a pandas DataFrame as follows:

import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
                              'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
                              'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan]})
incomplete_df
   event1  event2  event3
0       1     NaN     NaN
1       2       1     NaN
2     NaN     NaN     NaN
3       5       3     NaN
4       6       4       6
5     NaN       7       4
6     NaN     NaN       9
7      11      12     NaN
8     NaN     NaN       3
9      15      17     NaN

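I want to append a reason column that holds a fixed piece of text plus the name of the column containing each row's minimum value. In other words, the desired output is: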

   event1  event2  event3  reason
0       1     NaN     NaN  'Reason is event1'
1       2       1     NaN  'Reason is event2'
2     NaN     NaN     NaN  'Reason is None'
3       5       3     NaN  'Reason is event2'
4       6       4       6  'Reason is event2'
5     NaN       7       4  'Reason is event3'
6     NaN     NaN       9  'Reason is event3'
7      11      12     NaN  'Reason is event1'
8     NaN     NaN       3  'Reason is event3'
9      15      17     NaN  'Reason is event1'

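I can do incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaNs and, more importantly, it returns the minimum value rather than the name of the corresponding column.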

Edit:

Having learned about the idxmin() function from EMS's answer, I timed the following two solutions:

timeit.repeat("incomplete_df.apply(lambda x: x.idxmin(), axis=1)", "from __main__ import incomplete_df", number=1000) 
[0.35261858807214175, 0.32040155511039536, 0.3186818508661702] 

timeit.repeat("incomplete_df.T.idxmin()", "from __main__ import incomplete_df", number=1000) 
[0.17752145781657447, 0.1628651645393262, 0.15563708275042387] 

The transpose approach appears to be about twice as fast.
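For reference, here is a minimal sketch checking that the two timed expressions return the same labels, together with a third spelling, idxmin(axis=1), which avoids the explicit transpose. This is not a benchmark and makes no claim about speed. As an assumption of the sketch, the all-NaN row is dropped first, because idxmin has no label to return for it and its behaviour on such rows differs between pandas versions.

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# Assumption: rows with no values at all are excluded before calling idxmin.
df = incomplete_df.dropna(how='all')

via_apply = df.apply(lambda row: row.idxmin(), axis=1)  # row-wise apply
via_transpose = df.T.idxmin()                           # transpose, then column-wise idxmin
via_axis = df.idxmin(axis=1)                            # row-wise idxmin, no transpose

# All three should agree label for label.
same = via_apply.tolist() == via_transpose.tolist() == via_axis.tolist()
print(same)
```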

Answers

6
incomplete_df['reason'] = "Reason is " + incomplete_df.T.idxmin() 
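Note that the question's desired output has 'Reason is None' for the row where every event is NaN, which plain string concatenation with idxmin() does not produce. Below is a sketch of one way to get that label, assuming it is acceptable to compute idxmin only on rows that have at least one value and fill in the rest afterwards. The names event_cols, has_value and labels are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

event_cols = ['event1', 'event2', 'event3']  # illustrative name

# Rows that contain at least one non-NaN event value.
has_value = incomplete_df[event_cols].notna().any(axis=1)

# Column name of the row minimum, computed only where a minimum exists.
labels = 'Reason is ' + incomplete_df.loc[has_value, event_cols].idxmin(axis=1)

# Re-align to the full index; rows without any value get the fallback text.
incomplete_df['reason'] = labels.reindex(incomplete_df.index).fillna('Reason is None')
print(incomplete_df)
```

When several columns tie for the minimum (row 4 here has 6 in both event1 and event3, but 4 in event2 wins), idxmin returns the first column holding the minimum.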
+0

Awesome! A quick sub-question: is there a way to make 'min()' exclude NaNs? Something like R's 'na.rm = True' parameter. – Rhubarb

+1

'min(skipna=True)' is the default – Jeff
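A small sketch of what that default means for the pandas min() method (as opposed to Python's built-in min used in the question), using three rows taken from the question's data. The frame name df is illustrative.

```python
import numpy as np
import pandas as pd

# Rows 4, 5 and 2 of the question's data: complete, partly NaN, all NaN.
df = pd.DataFrame({'event1': [6, np.nan, np.nan],
                   'event2': [4, 7, np.nan],
                   'event3': [6, 4, np.nan]})

row_min_default = df.min(axis=1)               # skipna=True: NaNs are ignored
row_min_strict = df.min(axis=1, skipna=False)  # any NaN in the row gives NaN

print(row_min_default)
print(row_min_strict)
```

With the default, only the all-NaN row yields NaN; with skipna=False, the partly filled row yields NaN as well.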

+0

Thanks for 'idxmin()'. I ended up implementing this: 'incomplete_df['Reason'] = incomplete_df.apply(lambda x: x.idxmin(), axis=1)' – Rhubarb