Where are the output files of a Python subprocess in Hadoop streaming?

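I am using Hadoop streaming to run a Python script that uses subprocess to launch a C++ executable (a bioinformatics tool called blast). When run on the command line, blast writes a result file. When run on Hadoop, however, I cannot find blast's output file. Where does the output file end up?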

My code (map.py) is as follows:

import os
import sys
from os.path import join
from subprocess import Popen, PIPE

# path used on hadoop
tool = './blastx'
reference_path = 'Reference.fa'

# assumption: current_path is not defined in the snippet as posted;
# the task's working directory is used here
current_path = os.getcwd()

# input format example

# >LW1   (contig name)
# ATCGATCGATCG (sequence)

# sample file: https://goo.gl/XTauAx

(name, seq) = (None, None)

for line in sys.stdin:

    # when the ">" sign is detected, assign the contig name
    if line[0] == '>':
        name = line.strip()[1:]

    # otherwise, assign the sequence
    else:
        seq = line.strip()

        if name and seq:

            # assign the path of the output file
            output_file = join(current_path, 'tmp_output', name)

            # blast command example (export out file to a given path)
            command = 'echo -e \">%s\\n%s\" | %s -db %s -out %s -evalue 1e-10 -num_threads 16' % (name, seq, tool, reference_path, output_file)

            # execute command with python subprocess
            cmd = Popen(command, stdin=PIPE, stdout=PIPE, shell=True)

            # retrieve the standard output of command
            cmd_out, cmd_err = cmd.communicate()

            print('%s\t%s' % (name, output_file))

The command used to invoke blast is:

command = 'echo -e \">%s\\n%s\" | %s -db %s -out %s -evalue 1e-10 -num_threads 16' % (name, seq, tool, reference_path, output_file)
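As an alternative to piping through echo in a shell, the same blast call could be sketched with an argument list and the FASTA record written to the process's stdin. This is only a minimal sketch, not the original poster's code: the helper names fasta_record, build_blast_args and run_blast are hypothetical, and it assumes blastx reads the query from stdin when no query file is given, which the piped command above already relies on.

```python
from subprocess import Popen, PIPE


def fasta_record(name, seq):
    """Format one contig as a two-line FASTA record."""
    return '>%s\n%s\n' % (name, seq)


def build_blast_args(tool, reference_path, output_file):
    """Build the argument list with the same flags as the command above."""
    return [tool, '-db', reference_path, '-out', output_file,
            '-evalue', '1e-10', '-num_threads', '16']


def run_blast(name, seq, tool, reference_path, output_file):
    """Run blast with the record on stdin; return (exit code, stderr text)."""
    cmd = Popen(build_blast_args(tool, reference_path, output_file),
                stdin=PIPE, stdout=PIPE, stderr=PIPE)
    _, err = cmd.communicate(fasta_record(name, seq).encode('ascii'))
    return cmd.returncode, err.decode('utf-8', 'replace')
```

Returning the exit code and stderr text makes it possible to notice when blast itself failed, rather than only that no output file appeared.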

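Normally the output file is written to the path in output_file, but I cannot find it on either the local file system or HDFS. It looks as if the files are created in a temporary directory and disappear after execution. How can I retrieve them?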
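One way to see where a task actually writes its files is to log the host name and working directory from inside the mapper. The sketch below is an illustration under assumptions: describe_location and log_location are hypothetical helper names, and the message is written to stderr because Hadoop streaming treats the mapper's stdout as its key/value output, while stderr goes to the task logs.

```python
import os
import socket
import sys


def describe_location(output_file):
    """Return a one-line description of where output_file lives on this node."""
    return 'host=%s cwd=%s output=%s exists=%s' % (
        socket.gethostname(), os.getcwd(),
        os.path.abspath(output_file), os.path.exists(output_file))


def log_location(output_file):
    """Write the description to stderr so it does not mix with mapper output."""
    sys.stderr.write(describe_location(output_file) + '\n')
```

Calling log_location(output_file) right after blast finishes would record, per task, which node and directory held the file.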

Source

2016-03-02 user2583253

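I found blast's output files. It looks like they stay on the node where blast was executed. After I put them back into HDFS, I could access them under the directory /user/yarn. What I did was add the following code to map.py: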

command = 'hadoop fs -put %s' % output_file 
cmd = Popen(command, stdin=PIPE, stdout=PIPE, shell=True)

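I also changed the output path [updated 3/3] to

output_file = name

instead of

output_file = join(current_path, 'tmp_output', name)

However, putting the files under user yarn's directory is not a good idea, because normal users do not have permission to edit files there. I suggest putting the files into /tmp/blast_tmp instead, by changing the command to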

command = 'hadoop fs -put %s /tmp/blast_tmp' % output_file

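Before that, the directory /tmp/blast_tmp should be created with

% hadoop fs -mkdir /tmp/blast_tmp

and its permissions changed with

% hadoop fs -chmod 777 /tmp/blast_tmp

This way, both the yarn user and you can access the directory.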
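The upload step described above could be wrapped in a small helper that waits for hadoop fs -put to finish and reports a failure. This is a sketch under assumptions rather than the code from the answer: put_command and upload are hypothetical names, and the optional hdfs_dir argument stands for a destination such as /tmp/blast_tmp.

```python
from subprocess import Popen, PIPE


def put_command(local_path, hdfs_dir=None):
    """Build the 'hadoop fs -put' argument list; hdfs_dir is optional."""
    args = ['hadoop', 'fs', '-put', local_path]
    if hdfs_dir:
        args.append(hdfs_dir)
    return args


def upload(local_path, hdfs_dir=None):
    """Run the upload, wait for it to finish, and raise if it fails."""
    cmd = Popen(put_command(local_path, hdfs_dir), stdout=PIPE, stderr=PIPE)
    _, err = cmd.communicate()
    if cmd.returncode != 0:
        raise RuntimeError('hadoop fs -put failed: %s'
                           % err.decode('utf-8', 'replace'))
```

In map.py this might be called as upload(output_file, '/tmp/blast_tmp') after blast has finished. Waiting on communicate() means the mapper does not exit while an upload is still in progress.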

Source

2016-03-02 09:48:12 user2583253
