2015-04-07 107 views
1

Pandas HDFStore select from nested columns

I have the following DataFrame, stored in an HDFStore object as a frame_table called data:

       shipmentid  qty
catid                1  2  3  4  5
0               0    0  0  0  0  0
1               1    0  0  0  2  0
2               2    2  0  0  0  0
3               3    0  4  0  0  0
0               0    0  0  0  0  0

I want to run store.select('data','shipmentid==2'), but I get an error saying that "shipmentid" is not defined:

ValueError: The passed where expression: shipmentid==2 
      contains an invalid variable reference 
      all of the variable refrences must be a reference to 
      an axis (e.g. 'index' or 'columns'), or a data_column 
      The currently defined references are: columns,index 

What is the correct way to write this select statement?

Edit: added a code sample

import pandas as pd 
from pandas import * 
import random 

def createFrame(): 
    data = { 
      ('shipmentid',''):{1:1,2:2,3:3}, 
      ('qty',1):{1:5,2:5,3:5}, 
      ('qty',2):{1:6,2:6,3:6}, 
      ('qty',3):{1:7,2:7,3:7} 
      } 
    frame = pd.DataFrame(data) 

    return frame 

def createStore(): 
    store = pd.HDFStore('sample.h5',format='table') 
    return store  

frame = createFrame() 
print(frame) 
print('\n') 
print(frame.info()) 

store = createStore() 
store.put('data',frame,format='t') 
print('\n') 
print(store) 

results = store.select('data','shipmentid == 2') 

store.close() 

Answer

3

I bet you are creating your store with something like this:

In [207]: 

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.randn(8,2), columns=['shipmentid', 'qty'])
store = pd.HDFStore('borrar') 
store.put('data', data, format='t') 

If you then try a select, you do indeed get the error you describe:

In [208]: 

store.select('data', 'shipmentid>0') 

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-211-5d0c4082cdcf> in <module>() 
----> 1 store.select('data', 'shipmentid>0') 

... 

ValueError: The passed where expression: shipmentid>0 
      contains an invalid variable reference 
      all of the variable refrences must be a reference to 

Instead, you can create it like this:

In [209]: 

data = pd.DataFrame(np.random.randn(8,2), columns=['shipmentid', 'qty']) 
data.to_hdf('borrar2', 'data', append=True, mode='w', data_columns=['shipmentid', 'qty'])

In [210]:

pd.read_hdf('borrar2', 'data', where='shipmentid>0')
Out[210]:
   shipmentid       qty
1    0.778225 -1.008529
5    0.264075 -0.651268
7    0.908880  0.153306

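(Honestly, I don't know why it works one way and not the other. My guess is that in the first case you cannot specify the data columns. But these things can drive you crazy...)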
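As a minimal sketch of what seems to make the difference: the where expression can only refer to columns that were declared as data columns when the table was written, and store.put() also takes a data_columns argument when format='table' is used. The helper name, the random seed and the temporary file are assumptions made for illustration, and the HDF5 round trip is skipped when PyTables is not installed.

```python
import importlib.util
import os
import tempfile

import numpy as np
import pandas as pd


def select_positive_shipments(path):
    """Write a flat frame with put() and query it on a declared data column.

    Hypothetical helper for illustration; not part of pandas.
    """
    rng = np.random.default_rng(0)
    data = pd.DataFrame(rng.standard_normal((8, 2)),
                        columns=["shipmentid", "qty"])
    with pd.HDFStore(path, mode="w") as store:
        # Declaring 'shipmentid' as a data column is what allows it to
        # appear in a where expression later on.
        store.put("data", data, format="table", data_columns=["shipmentid"])
        selected = store.select("data", "shipmentid > 0")
    return data, selected


if importlib.util.find_spec("tables") is None:
    # pandas needs PyTables for HDF5 support.
    print("PyTables is not installed; skipping the HDF5 round trip")
    result = None
else:
    with tempfile.TemporaryDirectory() as tmp:
        original, result = select_positive_shipments(
            os.path.join(tmp, "borrar.h5"))
    print(result)
```

If that holds on your pandas version, the first attempt fails only because no data columns were declared, not because put() was used.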

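Edit: now that the code update has been posted, I see that the DataFrame has a MultiIndex on its columns. The equivalent updated code would be something like this: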

In [273]: 

import pandas as pd 
from pandas import * 
import random 

def createFrame(): 
    data = { 
      ('shipmentid',''):{1:1,2:2,3:3}, 
      ('qty',1):{1:5,2:5,3:5}, 
      ('qty',2):{1:6,2:6,3:6}, 
      ('qty',3):{1:7,2:7,3:7} 
      } 
    frame = pd.DataFrame(data) 

    return frame 

frame = createFrame() 
print(frame) 
print('\n') 
print(frame.info()) 

frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table') 
pd.read_hdf('sample.h5','data', 'shipmentid == 2') 

But I get an error (I guess you will get the same one):

  qty       shipmentid
    1  2  3
1   5  6  7          1
2   5  6  7          2
3   5  6  7          3


<class 'pandas.core.frame.DataFrame'> 
Int64Index: 3 entries, 1 to 3 
Data columns (total 4 columns): 
(qty, 1)   3 non-null int64 
(qty, 2)   3 non-null int64 
(qty, 3)   3 non-null int64 
(shipmentid,) 3 non-null int64 
dtypes: int64(4) 
memory usage: 120.0 bytes 
None 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-273-e10e811fc7c0> in <module>() 
    23 print(frame.info()) 
    24 
---> 25 frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table') 
    26 pd.read_hdf('sample.h5','data', 'shipmentid == 2') 
..... 
stack trace 
..... 
ValueError: cannot use a multi-index on axis [1] with data_columns ['shipmentid'] 

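I have looked around a bit and I cannot offer a solution for this. My impression, from looking at the code on GitHub, is that the data_columns option cannot be used together with a MultiIndex. The only solution I can think of is to write to the HDFStore (as in your code), then read the complete DataFrame back without any condition and do the search as a post-processing step. That is: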

new_frame = store.get('data') 
print(new_frame[new_frame['shipmentid'] == 2])



<class 'pandas.io.pytables.HDFStore'> 
File path: sample.h5 
/data   frame_table (typ->appendable,nrows->3,ncols->4,indexers->[index]) 
  qty       shipmentid
    1  2  3
2   5  6  7          2
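Another possible workaround, sketched here under assumptions rather than taken from the pandas documentation: if you can live with flat column names, join the nested labels into plain strings before writing, so that data_columns has single-level columns to work with. The flatten_columns helper and the 'qty_1'-style names are made up for illustration, and the HDF5 part is skipped when PyTables is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def create_frame():
    """Same nested-column frame as in the question."""
    data = {
        ("shipmentid", ""): {1: 1, 2: 2, 3: 3},
        ("qty", 1): {1: 5, 2: 5, 3: 5},
        ("qty", 2): {1: 6, 2: 6, 3: 6},
        ("qty", 3): {1: 7, 2: 7, 3: 7},
    }
    return pd.DataFrame(data)


def flatten_columns(frame):
    """Return a copy whose nested column labels are joined with '_'.

    Hypothetical helper: ('qty', 1) becomes 'qty_1' and
    ('shipmentid', '') becomes 'shipmentid'.
    """
    flat = frame.copy()
    flat.columns = [
        "_".join(str(part) for part in col if part != "")
        for col in frame.columns
    ]
    return flat


flat = flatten_columns(create_frame())
print(list(flat.columns))

if importlib.util.find_spec("tables") is None:
    # pandas needs PyTables for HDF5 support.
    print("PyTables is not installed; skipping the HDF5 round trip")
    selected = None
else:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.h5")
        # With single-level columns, 'shipmentid' can be a data column.
        flat.to_hdf(path, key="data", mode="w", format="table",
                    data_columns=["shipmentid"])
        selected = pd.read_hdf(path, "data", where="shipmentid == 2")
    print(selected)
```

The trade-off is that the nested structure is lost on disk. If you need it back after reading, you would have to rebuild the MultiIndex from the flat names yourself.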
+0

The problem seems to stem from using nested columns. See the full example code I just added. – TraxusIV

+0

Updated the answer, though it may no longer be an answer. Hope it helps anyway – lrnzcig