Calling a Perl script from Python with variable input and output files as arguments

I have a Perl script that can be run from the console as follows:

perl perlscript.pl -i input.txt -o output.txt --append

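I want to run this script from my Python code. I found that subprocess.Popen can be used to launch perl and pass it my arguments. However, I also want to pass a variable (obtained by splitting a text file) in place of input.txt. I tried the following, but it does not work and raises an obvious TypeError on line 8: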

import re, shlex, subprocess, StringIO 
f=open('fulltext.txt','rb') 
text= f.read() 
l = re.split('\n\n',str(text)) 
intxt = StringIO.StringIO() 
for i in range(len(l)): 
    intxt.write(l[i]) 
    command_line='perl cnv_ltrfinder2gff.pl -i '+intxt+' -o output.gff --append' 
    args=shlex.split(command_line) 
    p = subprocess.Popen(args)

Is there another workaround?

Edit: here is a sample of the file fulltext.txt. Entries are separated by a blank line.

Predict protein Domains 0.021 second 
>Sequence: seq1 Len:13143 [1] seq1 Len:13143 Location : 9 - 13124 Len: 13116 Strand:+ Score : 6 [LTR region similarity:0.959] Status : 11110110000 5'-LTR : 9 - 501 Len: 493 3'-LTR : 12633 - 13124 Len: 492 5'-TG : TG , TG 3'-CA : CA , CA TSR  : NOT FOUND Sharpness: 1,1 Strand + : PBS : [14/20] 524 - 543 (LysTTT) PPT : [12/15] 12553 - 12567 

Predict protein Domains 0.019 second 
>Sequence: seq5 Len:11539 [1] seq5 Len:11539 Location : 7 - 11535 Len: 11529 Strand:+ Score : 6 [LTR region similarity:0.984] Status : 11110110000 5'-LTR : 7 - 506 Len: 500 3'-LTR : 11036 - 11535 Len: 500 5'-TG : TG , TG 3'-CA : CA , CA TSR  : NOT FOUND Sharpness: 1,1 Strand + : PBS : [15/22] 515 - 536 (LysTTT) PPT : [11/15] 11020 - 11034

I want to split the entries apart and pass each entry block to the Perl script. All files are in the same directory.

Source

2015-04-23 Rimjhim Roy Choudhury

Can the Perl script read its input from stdin instead of a file? – choroba

No, it can't. The script I am using is: [link](https://github.com/jestill/dawgpaws/blob/fb0a40506be1ed8afce0049b6cfe3e4b52cd58dc/scripts/cnv_ltrfinder2gff.pl) –

@RimjhimRoyChoudhury: the code says it can accept input from standard input (it even warns about it: [*"Expecting input from STDIN"*](https://github.com/jestill/dawgpaws/blob/fb0a40506be1ed8afce0049b6cfe3e4b52cd58dc/scripts/cnv_ltrfinder2gff.pl#L294)). Try omitting the 'infile' option, or in bash pass an empty '' filename, '-', or '/dev/stdin'. – jfs

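You might be interested in the os module and string formatting.

Edit

I think I understand what you want now. Correct me if I'm wrong, but I think:

You want to split your fulltext.txt into blocks.
Each block contains a seq(number).
You want to run the Perl script once per block, with seq(number) as the input file.

If running the Perl script once per block is what you want, you can use the code below.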

import os

in_file = 'fulltext.txt'
seq = []

# Collect the sequence name (e.g. "seq1") from every header line
with open(in_file, 'r') as handle:
    for line in handle:
        if line.startswith(">"):
            seq.append(line.rstrip().split(" ")[1])

# Run the perl script once per sequence; assumes <name>.txt exists
for x in seq:
    command = "perl cnv_ltrfinder2gff.pl -i %s.txt -o output.txt --append" % x
    os.system(command)
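
The code above assumes that files such as seq1.txt already exist. If the script could only read from a file path, one possible workaround (not part of this answer, only a sketch) is to write each block to a temporary file and pass that path to -i. The helper name `run_on_block` and its `base_args` parameter are hypothetical.

```python
import os
import subprocess
import tempfile


def run_on_block(block, base_args):
    """Write one block to a temporary file and run base_args + ['-i', path].

    base_args is a hypothetical parameter, e.g.
    ['perl', 'cnv_ltrfinder2gff.pl', '-o', 'output.gff', '--append'].
    Returns the exit status of the child process.
    """
    # delete=False so the file is closed before the child process opens it
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(block)
        path = tmp.name
    try:
        return subprocess.call(base_args + ['-i', path])
    finally:
        os.remove(path)  # clean up the temporary file
```

Passing the arguments as a list avoids building a shell command string, so a temporary path containing spaces needs no quoting.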

Source

2015-04-23 14:00:08 zazga

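It gives me an error: 'sh: 1: cannot open StringIO.StringIO: No such file' –

Are all the files located in the same directory? What is in fulltext.txt, and what do you want to do with it? – zazga

Yes, all the files are in the same directory. I have provided a sample input for the script. –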

The docs for the --infile option:

Path of the input file. If an input file is not provided, the program will expect input from STDIN.

You can omit --infile and pass the input through a pipe (standard input) instead:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('fulltext.txt') as file:  # read input data
    blocks = file.read().split('\n\n')

# run a separate perl process for each block
args = 'perl cnv_ltrfinder2gff.pl -o output.gff --append'.split()
for block in blocks:
    p = Popen(args, stdin=PIPE, universal_newlines=True)
    p.communicate(block)
    if p.returncode != 0:
        print('non-zero exit status: %s on block: %r' % (p.returncode, block))
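
Splitting on '\n\n' produces empty blocks when the file ends with blank lines or when entries are separated by more than one blank line, and each empty block would start a perl process with no input. A minimal sketch of a more forgiving splitter follows; `split_blocks` is a hypothetical helper name, not part of the original answer.

```python
import re


def split_blocks(text):
    """Split text into entries separated by one or more blank lines.

    Blank lines may contain spaces or tabs; empty blocks are dropped.
    """
    # Normalize Windows line endings inside the blocks
    text = text.replace('\r\n', '\n')
    # A separator is a newline, any run of whitespace, then another newline
    parts = re.split(r'\n\s*\n', text)
    return [p.strip() for p in parts if p.strip()]
```

The result could replace `file.read().split('\n\n')` in the snippet above, for example `blocks = split_blocks(file.read())`.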

You can run multiple perl processes concurrently:

from multiprocessing.dummy import Pool  # use threads
from subprocess import Popen, PIPE

def run(item):
    i, block = item  # unpack here; tuple parameters are a syntax error in Python 3
    filename = 'out%03d.gff' % i
    args = ['perl', 'cnv_ltrfinder2gff.pl', '-o', filename]
    p = Popen(args, stdin=PIPE, universal_newlines=True, close_fds=True)
    p.communicate(block)
    return p.returncode, filename

# blocks is the list read from fulltext.txt in the previous snippet
exit_statuses, filenames = zip(*Pool().map(run, enumerate(blocks, start=1)))

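This runs multiple child processes in parallel (as many as there are CPUs in the system). You can specify a different number of worker threads by passing it to Pool().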
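
To illustrate passing a worker count to Pool(), here is a sketch that wraps the same approach in a function. The names `run_all`, `base_args`, and `workers` are hypothetical; the output-file naming follows the snippet above.

```python
from multiprocessing.dummy import Pool  # thread-based pool
from subprocess import Popen, PIPE


def run_all(blocks, base_args, workers=4):
    """Feed each block to its own child process via stdin.

    base_args is a hypothetical parameter, e.g. ['perl', 'cnv_ltrfinder2gff.pl'];
    '-o outNNN.gff' is appended for each block. Returns a list of
    (exit_status, filename) pairs in input order.
    """
    def run(item):
        i, block = item
        filename = 'out%03d.gff' % i
        p = Popen(base_args + ['-o', filename], stdin=PIPE,
                  universal_newlines=True)
        p.communicate(block)
        return p.returncode, filename

    pool = Pool(workers)  # at most `workers` children run at the same time
    try:
        return pool.map(run, enumerate(blocks, start=1))
    finally:
        pool.close()
        pool.join()
```

Threads are enough here because each worker only waits on its child process; the actual work happens in the perl processes.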

Source

2015-04-26 00:19:10 jfs
