2016-12-14 29 views

Error calling plotly's init_notebook_mode with Jupyter (Apache Toree PySpark)

I am running Jupyter (v4.2.1) with the Apache Toree - PySpark kernel. When I try to call plotly's init_notebook_mode function, I get the following error:

import numpy as np 
import pandas as pd 

import plotly.plotly as py 
import plotly.graph_objs as go 
from plotly import tools 
from plotly.offline import iplot, init_notebook_mode 
init_notebook_mode() 

Error:

Name: org.apache.toree.interpreter.broker.BrokerException 
Message: Traceback (most recent call last): 
    File "/tmp/kernel-PySpark-6415c581-01c4-4c90-b4d9-81773c2bc03f/pyspark_runner.py", line 134, in <module> 
    eval(compiled_code) 
    File "<string>", line 7, in <module> 
    File "/usr/local/lib/python3.4/dist-packages/plotly/offline/offline.py", line 151, in init_notebook_mode 
    display(HTML(script_inject)) 
    File "/usr/local/lib/python3.4/dist-packages/IPython/core/display.py", line 158, in display 
    format = InteractiveShell.instance().display_formatter.format 
    File "/usr/local/lib/python3.4/dist-packages/traitlets/config/configurable.py", line 412, in instance 
    inst = cls(*args, **kwargs) 
    File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 499, in __init__ 
    self.init_io() 
    File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 658, in init_io 
    io.stdout = io.IOStream(sys.stdout) 
    File "/usr/local/lib/python3.4/dist-packages/IPython/utils/io.py", line 34, in __init__ 
    raise ValueError("fallback required, but not specified") 
ValueError: fallback required, but not specified 

StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140) 
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140) 
scala.Option.foreach(Option.scala:236) 
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:139) 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
java.lang.reflect.Method.invoke(Method.java:498) 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) 
py4j.Gateway.invoke(Gateway.java:259) 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
py4j.commands.CallCommand.execute(CallCommand.java:79) 
py4j.GatewayConnection.run(GatewayConnection.java:209) 
java.lang.Thread.run(Thread.java:745) 

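I could not find any information about this on the web. I traced the failure to io.py in IPython utils, where the stream being passed in must have both a write and a flush attribute. For some reason, the stream passed in this case, sys.stdout, has only the write attribute and not flush.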
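To confirm this diagnosis from inside the kernel, a small check along the following lines may help. It is a minimal sketch that paraphrases the behaviour described above (both `write` and `flush` required, otherwise a `ValueError` unless a fallback is given); it is not IPython's actual source, and `missing_stream_attrs`, `check_stream` and `WriteOnlyStream` are hypothetical names introduced for illustration.

```python
import io

# Attributes the question says the stream must provide.
REQUIRED_ATTRS = ("write", "flush")


def missing_stream_attrs(stream):
    """Return the names of required attributes that the stream lacks."""
    return [name for name in REQUIRED_ATTRS if not hasattr(stream, name)]


def check_stream(stream, fallback=None):
    """Mimic the described check: use the fallback if given, otherwise raise."""
    if missing_stream_attrs(stream):
        if fallback is None:
            # Same message as in the traceback from the question.
            raise ValueError("fallback required, but not specified")
        return fallback
    return stream


class WriteOnlyStream:
    """Stand-in for a stdout replacement that implements only write()."""

    def write(self, text):
        return len(text)


if __name__ == "__main__":
    import sys

    # Inside the Toree PySpark kernel this would list what sys.stdout lacks.
    print(missing_stream_attrs(sys.stdout))
```

Running `missing_stream_attrs(sys.stdout)` in a notebook cell shows whether the kernel's replacement stdout is the object that lacks `flush`.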


Does [this](https://github.com/ipython/ipython/issues/9300) link help? It describes a bug where the 'IOStream' object has no 'flush' attribute, which seems to be the root cause here as well.

Answer


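I believe this happens because plotly's notebook mode assumes it is running inside an IPython Jupyter kernel, which handles the communication with the notebook; you can see in the stack trace that it tries to call into the IPython package.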

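Toree, however, is a different Jupyter kernel with its own protocol handling for talking to the notebook server. Even when you use Toree to run a PySpark interpreter, you get a "plain" PySpark (just as when you launch it from a shell), and Toree drives that interpreter's input and output.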

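So the IPython machinery is never set up, and calling init_notebook_mode() in that environment fails, just as it would in a PySpark launched directly from a shell, which knows nothing about notebooks.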
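One way to avoid the hard failure is to check for an active IPython shell before touching plotly's notebook mode. The following is a minimal sketch, assuming that `IPython.get_ipython()` returning `None` is a good enough signal that no IPython kernel is driving the session; `in_ipython_kernel` and `safe_init_notebook_mode` are hypothetical helper names, not part of plotly or Toree.

```python
def in_ipython_kernel():
    """Return True only if an IPython interactive shell is active."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed at all.
        return False
    # get_ipython() returns None when no IPython shell has been started.
    return get_ipython() is not None


def safe_init_notebook_mode():
    """Call plotly's init_notebook_mode() only under an IPython kernel.

    Returns True if notebook mode was initialised, False if it was skipped.
    """
    if not in_ipython_kernel():
        return False
    # Imported lazily so the guard also works where plotly is unavailable.
    from plotly.offline import init_notebook_mode
    init_notebook_mode()
    return True
```

Under Toree's PySpark interpreter this guard would be expected to return False and skip the call rather than raise; it does not make plots appear, it only avoids the exception.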

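As far as I know, there is currently no way to get plot output from a PySpark session through Toree; we ran into the same problem recently. Instead of running Python through Toree, you need to run an IPython kernel, import the PySpark libraries into it, and connect to the Spark cluster. See https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook for a Dockerized example.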
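As a rough illustration of that setup, the sketch below is meant to be run in a regular IPython (Python 3) kernel with both pyspark and plotly installed. The master URL `local[2]`, the app name, and the helpers `make_trace` and `plot_squares` are assumptions made for this example; substitute your own cluster's master URL.

```python
def make_trace(xs, ys):
    """Build a plain-dict scatter trace that plotly's iplot() accepts."""
    return {"type": "scatter", "mode": "lines+markers",
            "x": list(xs), "y": list(ys)}


def plot_squares(master="local[2]", n=10):
    """Compute squares on Spark and plot them inline in the notebook.

    The master URL and app name are placeholders for illustration.
    """
    # Imported here so the module loads even where pyspark/plotly are absent.
    from pyspark import SparkConf, SparkContext
    from plotly.offline import init_notebook_mode, iplot

    conf = SparkConf().setMaster(master).setAppName("plotly-demo")
    sc = SparkContext.getOrCreate(conf)

    xs = list(range(n))
    ys = sc.parallelize(xs).map(lambda x: x * x).collect()

    # Works here because an IPython kernel provides the display machinery.
    init_notebook_mode()
    iplot([make_trace(xs, ys)])
```

Calling `plot_squares()` in a notebook cell should render the chart inline, because the IPython kernel, rather than Toree, is handling the display messages.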