2015-10-20

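Retrieving data via a MultiIndex

I have a DataFrame with a MultiIndex. I need to work with various subsets of the data depending on schema and/or script (the index levels are schema and script). The DataFrame looks like this: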

      tx_id step step_id   start_time              
schema_10 cmc_v2_file  19-3 10  279 2015-09-04 00:46:30 
      cmc_v2_file  2-7 10  423 2015-09-04 00:46:22 
      cmc_v2_file  29-1 10  20 2015-09-04 00:46:34 
      cmc_v2_file  35-1  4  63 2015-09-04 00:46:51 
      cmc_v2_file  31-2 10  79 2015-09-04 00:46:54 
      cmc_v2_file  5-8 10  536 2015-09-04 00:46:57 
      cmc_v2_file  5-9 10  610 2015-09-04 00:47:13 
      cmc_v2_file  39-1 10  178 2015-09-04 00:47:12 
      cmc_v2_file  41-1 10  211 2015-09-04 00:47:22 
      cmc_v2_file  21-4 10  678 2015-09-04 00:47:28 
      cmc_v2_file  23-4 10  698 2015-09-04 00:47:31 
      cmc_v2_file  31-5 10  399 2015-09-04 00:47:45 
      cmc_v2_file  35-4  3  453 2015-09-04 00:47:54 
      cmc_v2_file  29-5  4  461 2015-09-04 00:47:54 
      cmc_v2_file  29-5  8  465 2015-09-04 00:47:55 
      cmc_v2_file  42-3  1  467 2015-09-04 00:47:57 
      cmc_v2_file  22-5  8  866 2015-09-04 00:47:53 
      cmc_v2_file  16-6  8  893 2015-09-04 00:47:51 
      cmc_v2_file  17-6  4  938 2015-09-04 00:47:54 
      cmc_v2_file  17-6  8  942 2015-09-04 00:47:55 
      cmc_v2_file  6-2 10  707 2015-09-04 00:47:50 
      cmc_v2_file  4-11 10  730 2015-09-04 00:47:54 
      cmc_v2_file  6-3  2  745 2015-09-04 00:47:53 
      cmc_v2_file  5-11  1  762 2015-09-04 00:47:55 
      cmc_v2_file  4-12  1  763 2015-09-04 00:47:56 
      cmc_v2_file  5-12 10  782 2015-09-04 00:48:16 
      cmc_v2_file  31-6  4  471 2015-09-04 00:47:55 
      cmc_v2_file  38-3  4  520 2015-09-04 00:47:51 
      cmc_v2_file  39-3  4  551 2015-09-04 00:47:55 
      cmc_v2_file  31-7 10  570 2015-09-04 00:48:20 
...       ... ...  ...     ... 
schema_9 hcs-vbu  1332-132 14 197542 2015-09-04 00:29:46 
      hcs-vbu  515-143  5 196309 2015-09-04 00:29:01 
      hcs-vbu  552-126 13 196333 2015-09-04 00:29:19 
      hcs-vbu  559-116 12 197068 2015-09-04 00:29:33 
      hcs-vbu  566-115 13 197201 2015-09-04 00:29:47 
      hcs-vbu  523-152  3 197443 2015-09-04 00:29:33 
      hcs-vbu  790-136  2 200774 2015-09-04 00:28:46 
      hcs-vbu  790-136  4 200776 2015-09-04 00:28:56 
      hcs-vbu  790-136 12 200784 2015-09-04 00:29:13 
      hcs-vbu  206-148  5 198213 2015-09-04 00:29:04 

To get the data for a specific script, I do this:

df.loc(axis=0)[:,[script]] 

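When I print out the whole DataFrame, it looks correct. The problem is that I am also writing unit tests for all of this, and as part of one test I want to verify that the data contains only one script: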

scripts = df.index.levels[df.index.names.index('script')] 

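However, instead of the single-element list I expected, I get a list of 6, which is the number of scripts in the original unfiltered data. Is there another way to retrieve the script index values after filtering the DataFrame with .loc?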

Answer


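Your second statement, df.index.levels, returns all the levels of the index. Indexing it with the position of 'script' then gives you every value stored in the second level of the MultiIndex (the one named 'script'). The levels are not recomputed when you filter with .loc, which is why you still see all 6 scripts.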
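A minimal sketch of that behaviour, assuming a small made-up frame (the names `other_script` and the `step` values are hypothetical, not taken from the question's data). It only illustrates that the filtered index keeps the full level:

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
index = pd.MultiIndex.from_tuples(
    [
        ("schema_10", "cmc_v2_file"),
        ("schema_10", "other_script"),
        ("schema_9", "hcs-vbu"),
    ],
    names=["schema", "script"],
)
df = pd.DataFrame({"step": [10, 4, 14]}, index=index).sort_index()

# Same filter as in the question: keep a single script.
filtered = df.loc(axis=0)[:, ["cmc_v2_file"]]

# The rows are filtered, but the 'script' level still holds
# every script that existed in the unfiltered frame.
script_pos = filtered.index.names.index("script")
level_values = list(filtered.index.levels[script_pos])
print(len(filtered), level_values)
```

Here `filtered` has one row, yet `level_values` still contains all three scripts.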

I think what you want is something like this: for the index level named 'script', give me the rows with one specific value.

import pandas as pd

## here we set the specific value we want to filter on
## (it must be a string, so it needs quotes)

specific_script_value = 'cmc_v2_file'

## and then we filter on the second level of the index.
## pd.IndexSlice helps slice across several levels

idx = pd.IndexSlice
df.loc[idx[:, specific_script_value], :]
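For the unit-test check itself, one possible approach is to look at the values that are actually present in the filtered index rather than at `levels`. The sketch below uses made-up data (the `other_script` name and `step` values are hypothetical) and shows two options that should both work: `get_level_values(...).unique()` and `remove_unused_levels()`.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
index = pd.MultiIndex.from_tuples(
    [
        ("schema_10", "cmc_v2_file"),
        ("schema_10", "other_script"),
        ("schema_9", "hcs-vbu"),
    ],
    names=["schema", "script"],
)
df = pd.DataFrame({"step": [10, 4, 14]}, index=index).sort_index()

idx = pd.IndexSlice
filtered = df.loc[idx[:, "cmc_v2_file"], :]

# Option 1: read the labels that actually occur in the filtered rows.
present = list(filtered.index.get_level_values("script").unique())

# Option 2: drop level values that no row uses, then inspect the levels.
trimmed = filtered.index.remove_unused_levels()
trimmed_scripts = list(trimmed.levels[trimmed.names.index("script")])

print(present, trimmed_scripts)
```

Option 1 is usually enough for an assertion such as "only one script remains"; option 2 is handy if other code keeps relying on `index.levels`.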