2013-02-06 53 views
8

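Get column name where value is something in pandas dataframe

For each timestamp, I am trying to find the name of the column in a dataframe whose value matches the value of a time series at that same timestamp.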

Here is my dataframe:

>>> df 
                            col5         col4         col3         col2         col1
1979-01-01 00:00:00  1181.220328   912.154923   648.848635   390.986156   138.185861
1979-01-01 06:00:00  1190.724461   920.767974   657.099560   399.395338   147.761352
1979-01-01 12:00:00  1193.414510   918.121482   648.558837   384.632475   126.254342
1979-01-01 18:00:00  1171.670276   897.585930   629.201469   366.652033   109.545607
1979-01-02 00:00:00  1168.892579   900.375126   638.377583   382.584568   132.998706

>>> df.to_dict() 
{'col4': {<Timestamp: 1979-01-01 06:00:00>: 920.76797370744271, <Timestamp: 1979-01-01 00:00:00>: 912.15492332839756, <Timestamp: 1979-01-01 18:00:00>: 897.58592995700656, <Timestamp: 1979-01-01 12:00:00>: 918.1214819496729}, 'col5': {<Timestamp: 1979-01-01 06:00:00>: 1190.7244605667831, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 1171.6702763228691, <Timestamp: 1979-01-01 12:00:00>: 1193.4145103184442}, 'col2': {<Timestamp: 1979-01-01 06:00:00>: 399.39533771666561, <Timestamp: 1979-01-01 00:00:00>: 390.98615646597591, <Timestamp: 1979-01-01 18:00:00>: 366.65203285812231, <Timestamp: 1979-01-01 12:00:00>: 384.63247469269874}, 'col3': {<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 648.84863460462293, <Timestamp: 1979-01-01 18:00:00>: 629.20146872682449, <Timestamp: 1979-01-01 12:00:00>: 648.55883747413225}, 'col1': {<Timestamp: 1979-01-01 06:00:00>: 147.7613518219286, <Timestamp: 1979-01-01 00:00:00>: 138.18586102094068, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}} 

And here is the time series whose values I want to match at each timestamp:

>>> ts 
1979-01-01 00:00:00 1181.220328 
1979-01-01 06:00:00 657.099560 
1979-01-01 12:00:00 126.254342 
1979-01-01 18:00:00 109.545607 
Freq: 6H 

>>> ts.to_dict() 
{<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377} 

The result would then be:

>>> df_result 
                           value Column
1979-01-01 00:00:00  1181.220328   col5
1979-01-01 06:00:00   657.099560   col3
1979-01-01 12:00:00   126.254342   col1
1979-01-01 18:00:00   109.545607   col1

I hope my question is clear. Does anyone have an idea how to get df_result?

Thanks,

Greg

Answers

4

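Here is one, perhaps inelegant, way to do it:

df_result = pd.DataFrame(ts, columns=['value'])

Define a function that grabs the name of the column containing the value (from ts):

def get_col_name(row):
    # Compare the df row at this timestamp against the target value
    b = (df.loc[row.name] == row['value'])
    # argmax gives the position of the first True; map it to the column name
    return b.index[b.argmax()]

For each row, this tests which elements equal the value and extracts the column name of the True entry.

Then apply it (row-wise):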

In [3]: df_result.apply(get_col_name, axis=1) 
Out[3]: 
1979-01-01 00:00:00 col5 
1979-01-01 06:00:00 col3 
1979-01-01 12:00:00 col1 
1979-01-01 18:00:00 col1 

That is, use df_result['Column'] = df_result.apply(get_col_name, axis=1).
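One caveat with the exact `==` test: if no column matches, `argmax` on an all-False row still returns position 0, so the first column would be reported by mistake. The sketch below is a hedged variant, not part of the original answer. The helper name `get_col_name_safe`, the shortened data, and the deliberately non-matching value `500.0` are all assumptions made for illustration, and `np.isclose` is swapped in for `==` to tolerate small floating-point differences.

```python
import numpy as np
import pandas as pd

# Two timestamps and three columns taken from the question's printed values.
index = pd.to_datetime(["1979-01-01 00:00", "1979-01-01 06:00"])
df = pd.DataFrame(
    {
        "col5": [1181.220328, 1190.724461],
        "col3": [648.848635, 657.099560],
        "col1": [138.185861, 147.761352],
    },
    index=index,
)

# Hypothetical targets: the first differs only by float noise,
# the second (500.0) matches no column at all.
df_result = pd.DataFrame({"value": [1181.2203280000001, 500.0]}, index=index)


def get_col_name_safe(row):
    """Return the first column whose value is close to row['value'], else None."""
    # Boolean array over df's columns for this timestamp.
    matches = np.isclose(df.loc[row.name].to_numpy(), row["value"])
    if not matches.any():
        return None  # no column matched, so do not fall back to column 0
    return df.columns[matches.argmax()]


df_result["Column"] = df_result.apply(get_col_name_safe, axis=1)
print(df_result)
```

The tolerance used here is whatever `np.isclose` applies by default; whether that is appropriate depends on how the time series was derived from the dataframe.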

Note: there is quite a lot going on in get_col_name, so it may be worth some further explanation:

In [4]: row = df_result.iloc[0] # an example row to pass to get_col_name

In [5]: row 
Out[5]: 
value 1181.220328 
Name: 1979-01-01 00:00:00 

In [6]: row.name # used to look up the matching row of df
Out[6]: <Timestamp: 1979-01-01 00:00:00> 

In [7]: df.loc[row.name]
Out[7]: 
col5 1181.220328 
col4  912.154923 
col3  648.848635 
col2  390.986156 
col1  138.185861 
Name: 1979-01-01 00:00:00 

In [8]: b = (df.loc[row.name] == row['value'])
     # checks whether each element equals row['value'] = 1181.220328

In [9]: b 
Out[9]: 
col5  True 
col4 False 
col3 False 
col2 False 
col1 False 
Name: 1979-01-01 00:00:00 

In [10]: b.argmax() # position of the first True value
Out[10]: 0 

In [11]: b.index[b.argmax()] # the index value (column name) 
Out[11]: 'col5' 

There may well be a more efficient way to do this...
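On the efficiency point, one possible vectorized alternative (a sketch, not the answerer's method) is to compare the whole frame against the series in one step and take `idxmax` along the columns. The data below is copied from the question as printed; the variable names `aligned` and `mask` are mine.

```python
import pandas as pd

# Values copied from the question, rounded as printed there.
df = pd.DataFrame(
    {
        "col5": [1181.220328, 1190.724461, 1193.414510, 1171.670276, 1168.892579],
        "col4": [912.154923, 920.767974, 918.121482, 897.585930, 900.375126],
        "col3": [648.848635, 657.099560, 648.558837, 629.201469, 638.377583],
        "col2": [390.986156, 399.395338, 384.632475, 366.652033, 382.584568],
        "col1": [138.185861, 147.761352, 126.254342, 109.545607, 132.998706],
    },
    index=pd.to_datetime(
        [
            "1979-01-01 00:00", "1979-01-01 06:00", "1979-01-01 12:00",
            "1979-01-01 18:00", "1979-01-02 00:00",
        ]
    ),
)
ts = pd.Series(
    [1181.220328, 657.099560, 126.254342, 109.545607],
    index=df.index[:4],
)

# Keep only the rows of df that have a timestamp in ts.
aligned = df.loc[ts.index]

# Compare every cell with the ts value for its row (axis=0 aligns on the index).
mask = aligned.eq(ts, axis=0)

# idxmax along the columns returns the label of the first True in each row.
df_result = pd.DataFrame({"value": ts, "Column": mask.idxmax(axis=1)})
print(df_result)
```

As with the row-wise version, a row with no match would still report the first column, so checking `mask.any(axis=1)` before trusting the result seems prudent.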

+0

Thanks @Andy, it works. – leroygr

3

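Following on from Andy's detailed answer, the solution for selecting the name of the column holding the highest value in each row can be simplified to a one-liner:

df['column'] = df.apply(lambda x: df.columns[x.argmax()], axis=1)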
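For the row-maximum case, pandas also offers `DataFrame.idxmax(axis=1)`, which should give the same labels without a Python-level lambda. The small comparison below uses made-up numbers (not the question's data) purely for illustration; the names `via_apply` and `via_idxmax` are mine.

```python
import pandas as pd

# Made-up illustrative data, not taken from the question.
df = pd.DataFrame(
    {
        "a": [1.0, 5.0, 2.0],
        "b": [3.0, 2.0, 2.5],
        "c": [0.5, 4.0, 9.0],
    }
)

# The one-liner from the answer: position of the row maximum, mapped to a label.
via_apply = df.apply(lambda x: df.columns[x.argmax()], axis=1)

# Built-in equivalent: label of the row maximum directly.
via_idxmax = df.idxmax(axis=1)

print(via_idxmax)
```

Computing the labels before adding them as a new column keeps the string column out of the comparison if the line is run a second time.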