Pandas: get the column with the minimum value in each row
I have the following pandas DataFrame:
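import numpy as np
import pandas as pd

# np.nan is the canonical spelling; the np.NAN alias was removed in NumPy 2.0
incomplete_df = pd.DataFrame({'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
                              'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
                              'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan]})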
incomplete_df
event1 event2 event3
0 1 NaN NaN
1 2 1 NaN
2 NaN NaN NaN
3 5 3 NaN
4 6 4 6
5 NaN 7 4
6 NaN NaN 9
7 11 12 NaN
8 NaN NaN 3
9 15 17 NaN
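I would like to append a reason column that contains a fixed text plus the name of the column holding the row's minimum value. In other words, the desired output is: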
event1 event2 event3 reason
0 1 NaN NaN 'Reason is event1'
1 2 1 NaN 'Reason is event2'
2 NaN NaN NaN 'Reason is None'
3 5 3 NaN 'Reason is event2'
4 6 4 6 'Reason is event2'
5 NaN 7 4 'Reason is event3'
6 NaN NaN 9 'Reason is event3'
7 11 12 NaN 'Reason is event1'
8 NaN NaN 3 'Reason is event3'
9 15 17 NaN 'Reason is event1'
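I can do incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaN and, more importantly, it returns the minimum value rather than the name of the corresponding column.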
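To make the NaN problem concrete, here is a small comparison of Python's built-in min() (what the lambda above calls) against pandas' own DataFrame.min(axis=1). It rebuilds the question's frame so it runs on its own; the column names builtin_min and pandas_min are just illustrative.

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# Python's built-in min() compares with <, and every comparison against NaN
# is False, so a leading NaN "wins" and the result depends on NaN position.
builtin_min = incomplete_df.apply(lambda x: min(x), axis=1)

# pandas' min() skips NaN by default (skipna=True).
pandas_min = incomplete_df.min(axis=1)

print(pd.DataFrame({'builtin_min': builtin_min, 'pandas_min': pandas_min}))
```

Row 5 ([NaN, 7, 4]) shows the difference: the built-in version yields NaN because the first element is NaN, while the pandas version yields 4. Neither returns a column name, which is what idxmin() addresses.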
Edit:
Having found the idxmin() function in EMS's answer, I timed the following two solutions:
timeit.repeat("incomplete_df.apply(lambda x: x.idxmin(), axis=1)", "from __main__ import incomplete_df", number=1000)
[0.35261858807214175, 0.32040155511039536, 0.3186818508661702]
timeit.repeat("incomplete_df.T.idxmin()", "from __main__ import incomplete_df", number=1000)
[0.17752145781657447, 0.1628651645393262, 0.15563708275042387]
On these runs, the transpose approach is roughly twice as fast.
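DataFrame.idxmin also accepts axis=1, which gives the same row-wise result without an explicit transpose. The sketch below times all three variants with timeit on callables; it prints timings rather than asserting any particular ranking, since that depends on the pandas version and the machine. One assumption worth flagging: it drops the all-NaN row first, because recent pandas versions warn about (and newer ones may reject) idxmin on a row with no non-NaN value.

```python
import timeit

import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# Keep only rows with at least one value, so idxmin is well defined everywhere.
rows = incomplete_df.dropna(how='all')

candidates = {
    'apply': lambda: rows.apply(lambda x: x.idxmin(), axis=1),
    'transpose': lambda: rows.T.idxmin(),
    'axis=1': lambda: rows.idxmin(axis=1),
}

# Each variant returns a Series mapping row label -> column name of the minimum.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    times = timeit.repeat(fn, number=200, repeat=3)
    print(f'{name:>9}: best of 3 = {min(times):.4f} s per 200 calls')
```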
Great! A quick follow-up question: is there a way to make 'min()' exclude NaNs, similar to R's 'na.rm = TRUE' argument? – Rhubarb
'min(skipna=True)' is the default – Jeff
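A quick way to see the skipna switch in action is to compute the row-wise minimum both ways. With the default skipna=True, NaNs are ignored (analogous to R's na.rm = TRUE); with skipna=False, any NaN in a row makes that row's minimum NaN (analogous to R's default na.rm = FALSE). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# Default behaviour: NaNs are skipped; an all-NaN row still gives NaN.
row_min = incomplete_df.min(axis=1)

# Strict behaviour: any NaN in the row propagates to the result.
row_min_strict = incomplete_df.min(axis=1, skipna=False)

print(pd.DataFrame({'skipna=True': row_min, 'skipna=False': row_min_strict}))
```

Only row 4 has no missing values, so it is the only row with a non-NaN result under skipna=False.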
Thanks for 'idxmin()'. I ended up implementing this: 'incomplete_df['Reason'] = incomplete_df.apply(lambda x: x.idxmin(), axis=1)' – Rhubarb
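The one-liner above stores only the column name, whereas the question asked for text such as 'Reason is event1' and 'Reason is None' for the all-NaN row. The following is a minimal sketch of one way to get exactly that. It is a swapped-in technique, not the approach from the thread: it masks out all-NaN rows before calling idxmin(axis=1), since recent pandas versions warn about (and newer ones may reject) idxmin on such rows, then fills those rows with the literal text 'None'. The helper names has_value and min_col are illustrative.

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# True for rows that contain at least one non-NaN value.
has_value = incomplete_df.notna().any(axis=1)

# Column name of the row-wise minimum, computed only where it is defined.
min_col = incomplete_df[has_value].idxmin(axis=1)

# Re-align to the full index; the all-NaN rows become the text 'None'.
min_col = min_col.reindex(incomplete_df.index).fillna('None')

incomplete_df['reason'] = 'Reason is ' + min_col
print(incomplete_df)
```

Masking first keeps idxmin away from rows where "the column of the minimum" has no answer, so the placeholder text is chosen explicitly instead of depending on how a given pandas version treats all-NaN rows.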