2016-04-27 65 views

H2O python rbind error

I have a data frame with 2000 rows. I am trying to split it into two parts and then combine the parts back together.

t1 = test[:10, :] 
t2 = test[20:, :] 
temp = t1.rbind(t2) 
temp.show() 

Then I get this error:

--------------------------------------------------------------------------- 
EnvironmentError       Traceback (most recent call last) 
<ipython-input-37-8daeb3375743> in <module>() 
     2 t2 = test[20:, :] 
     3 temp = t1.rbind(t2) 
----> 4 temp.show() 
     5 print len(temp) 
     6 print len(test) 

/usr/local/lib/python2.7/dist-packages/h2o/frame.pyc in show(self, use_pandas) 
    383  print("This H2OFrame has been removed.") 
    384  return 
--> 385  if not self._ex._cache.is_valid(): self._frame()._ex._cache.fill() 
    386  if H2ODisplay._in_ipy(): 
    387  import IPython.display 

/usr/local/lib/python2.7/dist-packages/h2o/frame.pyc in _frame(self, fill_cache) 
    423 
    424 def _frame(self, fill_cache=False): 
--> 425  self._ex._eager_frame() 
    426  if fill_cache: 
    427  self._ex._cache.fill() 

/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in _eager_frame(self) 
    67  if not self._cache.is_empty(): return self 
    68  if self._cache._id is not None: return self # Data already computed under ID, but not cached locally 
---> 69  return self._eval_driver(True) 
    70 
    71 def _eager_scalar(self): # returns a scalar (or a list of scalars) 

/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in _eval_driver(self, top) 
    81 def _eval_driver(self, top): 
    82  exec_str = self._do_it(top) 
---> 83  res = ExprNode.rapids(exec_str) 
    84  if 'scalar' in res: 
    85  if isinstance(res['scalar'], list): self._cache._data = [float(x) for x in res['scalar']] 

/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in rapids(expr) 
    163  The JSON response (as a python dictionary) of the Rapids execution 
    164  """ 
--> 165  return H2OConnection.post_json("Rapids", ast=expr,session_id=H2OConnection.session_id(), _rest_version=99) 
    166 
    167 class ASTId: 

/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in post_json(url_suffix, file_upload_info, **kwargs) 
    515  if __H2OCONN__ is None: 
    516  raise ValueError("No h2o connection. Did you run `h2o.init()` ?") 
--> 517  return __H2OCONN__._rest_json(url_suffix, "POST", file_upload_info, **kwargs) 
    518 
    519 def _rest_json(self, url_suffix, method, file_upload_info, **kwargs): 

/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in _rest_json(self, url_suffix, method, file_upload_info, **kwargs) 
    518 
    519 def _rest_json(self, url_suffix, method, file_upload_info, **kwargs): 
--> 520  raw_txt = self._do_raw_rest(url_suffix, method, file_upload_info, **kwargs) 
    521  return self._process_tables(raw_txt.json()) 
    522 

/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in _do_raw_rest(self, url_suffix, method, file_upload_info, **kwargs) 
    592  raise EnvironmentError(("h2o-py got an unexpected HTTP status code:\n {} {} (method = {}; url = {}). \n"+ \ 
    593        "detailed error messages: {}") 
--> 594        .format(http_result.status_code,http_result.reason,method,url,detailed_error_msgs)) 
    595 
    596 

EnvironmentError: h2o-py got an unexpected HTTP status code: 
500 Server Error (method = POST; url = http://localhost:54321/99/Rapids). 
detailed error messages: [] 

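If I count the rows (len(temp)), it works fine. If I change the slice indices slightly, show() also works. For example, with the following slices the data frame is displayed: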

t1 = test[:10, :] 
t2 = test[:5, :] 

Am I missing something here? Thanks.

Answers


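It is not clear what happened without more information (the logs would probably say why it failed).

What version are you using? I tried your code with the iris dataset on the bleeding-edge build, and it all worked as expected.

By the way, rbind is generally going to be expensive, especially since what you are semantically after is a subset: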

test[list(range(10)) + list(range(20, test.nrow)), :]

should also give you the desired subset (with the caveat that you build the complete list of row indices in Python and pass it over REST to H2O).
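
The index-list idea can be sketched in plain Python. The helper name `rows_excluding` is made up for illustration and is not part of h2o; it only builds the list that would go into the row slot of `test[...]`, so it runs without an H2O cluster.

```python
def rows_excluding(nrow, start, stop):
    """Return all row indices in [0, nrow) except those in [start, stop)."""
    if not 0 <= start <= stop <= nrow:
        raise ValueError("expected 0 <= start <= stop <= nrow")
    # list(...) keeps this working on Python 3, where two range
    # objects cannot be concatenated with +.
    return list(range(start)) + list(range(stop, nrow))


# For the 2000-row frame in the question, dropping rows 10..19:
keep = rows_excluding(2000, 10, 20)
print(len(keep))    # 1990
print(keep[8:12])   # [8, 9, 20, 21]

# With an H2OFrame named `test` (as in the question), the subset
# would then be taken as: test[keep, :]
```

The whole list is still sent to the server in one request, so the caveat above about building the indices in Python applies here too.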


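Hi, thanks for your answer. The version is 3.8.1.4, and your suggestion works. My original goal was to implement a k-fold function. I am new to this, so I wonder if you know how to do it efficiently. Thanks. – hamuchiwa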


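You don't need to implement your own k-fold function; H2O already does cross-validation through the 'nfolds' parameter. Take a look at this notebook for an example: https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/H2O_tutorial_eeg_eyestate.ipynb –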


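I understand the nfolds argument. What I want to do is set both lambda_search = True and nfolds = 3 in a GLM model, and it does not seem to let me do that. To avoid a manual grid search over lambda, I decided to implement a k-fold function. Does that sound like the right approach? Thanks a lot. – hamuchiwa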
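
For reference, a hand-rolled k-fold split along the lines discussed in this thread could start from row-index lists rather than rbind. The sketch below is plain Python bookkeeping only, not H2O's own cross-validation; the function name `kfold_row_indices` and the `seed` default are assumptions made for illustration.

```python
import random


def kfold_row_indices(nrow, k, seed=0):
    """Yield (train_rows, holdout_rows) index lists for each of k folds."""
    if not 2 <= k <= nrow:
        raise ValueError("need 2 <= k <= nrow")
    rows = list(range(nrow))
    # Shuffle once with a fixed seed so the folds are random but reproducible.
    random.Random(seed).shuffle(rows)
    for fold in range(k):
        # Every k-th shuffled row, starting at offset `fold`, is held out.
        holdout = sorted(rows[fold::k])
        holdout_set = set(holdout)
        train = [r for r in range(nrow) if r not in holdout_set]
        yield train, holdout


folds = list(kfold_row_indices(2000, 3))
for train, holdout in folds:
    print(len(train), len(holdout))
```

With an H2OFrame named `test`, each fold's frames would then be taken as `test[train, :]` and `test[holdout, :]`, using the same row-list subsetting shown in the answer. Every list is sent over REST, so this stays cheap only while the row count is modest.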