2016-06-10 138 views
1

我有一个多指标与ID和日期,形式为:从Multiindex中获取密钥?

MultiIndex(levels=[[196003, 196005, 196007, 196009, 196012, 196103, 196105, 196107, 196109, 196112, 196203, 196205, 196207, 196209, 196212, 196303, 196305, 196307, 196309, 196312, 196403, 196405, 196407, 196409, 196412, 201705, 201707, 201709, 201712, 201803, 201805, 201807, 201809, 201812], ['1959-07-01', '1959-07-02', '1959-07-06', '1959-07-07', '1959-07-08', '1959-07-09', '1959-07-10', '1959-07-13', '1959-07-14', '1959-07-15', '1959-07-16', '1959-07-17', '1959-07-20', '1959-07-21', '1959-07-22', '1959-07-23', ...]]) 

两个ID &日期必须唯一指定一行。

我想要做的是提取索引的第一级。

当我做df.index[0],我得到了形式的元组(196003, '1959-07-01')

我要的是形式[196003, 196005, ...]的键级别0

我设法用得到它的一个系列:

list(df[~df['ID'].duplicated()]['ID'].sort_values().reset_index()['ID'])

但我觉得这是一个混乱的缓慢的解决方案。

什么是熊猫的方式?

回答

1

我认为你可以使用get_level_valuesunique

import pandas as pd 

df = pd.DataFrame({'ID':[1,1,3], 
        'Dates':['2015-01-01','2015-01-01','2015-02-01'], 
        'C':[7,8,9]}) 
df['Dates'] = pd.to_datetime(df.Dates) 
df.set_index(['ID', 'Dates'], inplace=True) 
print (df) 
       C 
ID Dates   
1 2015-01-01 7 
    2015-01-01 8 
3 2015-02-01 9 

print (df.index.get_level_values('ID').unique().tolist()) 
[1, 3] 

#another a bit slowier solution 
print (df.index.get_level_values('ID').drop_duplicates().tolist()) 
[1, 3] 

时序

In [134]: %timeit (orig(df1)) 
1000 loops, best of 3: 1.54 ms per loop 

In [138]: %timeit (df.index.get_level_values('ID').unique().tolist()) 
10000 loops, best of 3: 131 µs per loop 

In [139]: %timeit (df.index.get_level_values('ID').drop_duplicates().tolist()) 
10000 loops, best of 3: 182 µs per loop 

代码时序

len(df) = 3k

import pandas as pd 

df = pd.DataFrame({'ID':[1,1,3], 
        'Dates':['2015-01-01','2015-01-01','2015-02-01'], 
        'C':[7,8,9]}) 
df = pd.concat([df]*1000).reset_index(drop=True) 
df['Dates'] = pd.to_datetime(df.Dates) 
df.set_index(['ID', 'Dates'], inplace=True) 
print (df) 


df1 = df.copy() 
df1.reset_index('ID', inplace=True) 

def orig(df): 

    return list(df[~df['ID'].duplicated()]['ID'].sort_values().reset_index()['ID']) 

print (df.index.get_level_values('ID').unique().tolist()) 

print (orig(df1)) 

print (df.index.get_level_values('ID').drop_duplicates().tolist()) 
+0

哇!什么是加速! – cjm2671

+0

谢谢,这太棒了! – cjm2671

+0

很高兴能帮到你! – jezrael