
Trying to save a PySpark DataFrame, I get a Py4JNetworkError - Ubuntu

I'm working with PySpark installed on an Ubuntu 16.04 machine. I have a long piece of code whose result is a DataFrame, and I need to save that DataFrame as a CSV file. Everything works fine until the last line of the code, where I hit the error below:

final_df.write.format('csv').save('final_test1') 

Could you please advise what I should do?

ERROR:root:Exception while sending command. 
    Traceback (most recent call last): 
     File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1035, in send_command 
     raise Py4JNetworkError("Answer from Java side is empty") 
    py4j.protocol.Py4JNetworkError: Answer from Java side is empty 

    During handling of the above exception, another exception occurred: 

    Traceback (most recent call last): 
     File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 883, in send_command 
     response = connection.send_command(command) 
     File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1040, in send_command 
     "Error while receiving", e, proto.ERROR_ON_RECEIVE) 
    py4j.protocol.Py4JNetworkError: Error while receiving 
    Traceback (most recent call last): 
     File "/usr/lib/python3.5/socketserver.py", line 313, in _handle_request_noblock 
     self.process_request(request, client_address) 
     File "/usr/lib/python3.5/socketserver.py", line 341, in process_request 
     self.finish_request(request, client_address) 
     File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request 
     self.RequestHandlerClass(request, client_address, self) 
     File "/usr/lib/python3.5/socketserver.py", line 681, in __init__ 
     self.handle() 
     File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/pyspark/accumulators.py", line 235, in handle 
     num_updates = read_int(self.rfile) 
     File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/pyspark/serializers.py", line 577, in read_int 
     raise EOFError 
    EOFError 

--------------------------------------------------------------------------- 
Py4JError         Traceback (most recent call last) 
<ipython-input-22-f56812202624> in <module>() 
     1 final_df.cache() 
----> 2 final_df.write.format('csv').save('final_test1') 

~/spark-2.1.1-bin-hadoop2.7/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options) 
    548    self._jwrite.save() 
    549   else: 
--> 550    self._jwrite.save(path) 
    551 
    552  @since(1.4) 

~/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args) 
    1131   answer = self.gateway_client.send_command(command) 
    1132   return_value = get_return_value(
-> 1133    answer, self.gateway_client, self.target_id, self.name) 
    1134 
    1135   for temp_arg in temp_args: 

~/spark-2.1.1-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw) 
    61  def deco(*a, **kw): 
    62   try: 
---> 63    return f(*a, **kw) 
    64   except py4j.protocol.Py4JJavaError as e: 
    65    s = e.java_exception.toString() 

~/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 
    325    raise Py4JError(
    326     "An error occurred while calling {0}{1}{2}". 
--> 327     format(target_id, ".", name)) 
    328  else: 
    329   type = answer[1] 

Py4JError: An error occurred while calling o3911.save 

Answer


Maybe you should try this:

final_df.write.csv('final_test1.csv') 

It works now, both with the previous command and with this one. Is it normal for some of the CSVs to be blank? It always produces hundreds of CSV files, and some of them are blank. – Learner


Yes, that's normal, since each partition is written out as a separate file (and empty partitions produce blank files). But if you want a single output file, you can use 'coalesce' on your dataframe before writing (though it's not advisable, since it makes your processing sequential). Try 'final_df.coalesce(1).write.csv('final_test1.csv')', as sketched below. – Prem
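For reference, here is a minimal sketch of the single-file variant suggested in the comment above. It assumes the 'final_df' DataFrame from the question; the 'header' and 'mode' options are illustrative additions, not part of the original answer.

# Merge all partitions into one so Spark emits a single part file,
# then write CSV with a header row, overwriting any earlier output.
final_df.coalesce(1) \
    .write \
    .option('header', 'true') \
    .mode('overwrite') \
    .csv('final_test1.csv')

Note that even with 'coalesce(1)', Spark still writes 'final_test1.csv' as a directory containing a single part-*.csv file rather than a bare file.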
