MissingOutputException in snakemake

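I am planning to move my bioinformatics pipeline to snakemake, because my current pipeline is a collection of multiple scripts that is becoming harder and harder to follow. Based on the tutorials and documentation, snakemake seems a very clear and interesting option for pipeline management. However, I am not familiar with Python, as I mainly work with bash and R, so snakemake seems a bit hard to learn: I am now facing the following problem.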

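I have two files, sampleA_L001_R1_001.fastq.gz and sampleA_L001_R2_001.fastq.gz, which are placed in the same directory, sampleA. I want to merge these files using the cat command. This is actually a test run: in the real situation I will have eight separate FASTQ files for each sample, which should be merged in a similar manner. A very simple job, but something is wrong with my code.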

snakemake --latency-wait 20 --snakefile /home/users/me/bin/snakefile.txt 

rule mergeFastq: 
    input: 
     reads1='sampleA/sampleA_L001_R1_001.fastq.gz', 
     reads2='sampleA/sampleA_L001_R2_001.fastq.gz' 
    output: 
     reads1='sampleA/sampleA_R1.fastq.gz', 
     reads2='sampleA/sampleA_R2.fastq.gz' 
    message: 
     'Merging FASTQ files...' 
    shell: 
     'cat {input.reads1} > {output.reads1}' 
     'cat {input.reads2} > {output.reads2}' 

------------------------------------------------------------- 

Provided cores: 1 
Rules claiming more threads will be scaled down. 
Job counts: 
    count jobs 
    1 mergeFastq 
    1 

Job 0: Merging FASTQ files... 

Waiting at most 20 seconds for missing files. 
Error in job mergeFastq while creating output files sampleA_R1.fastq.gz, sampleA_R2.fastq.gz. 
MissingOutputException in line 5 of /home/users/me/bin/snakefile.txt: 
Missing files after 20 seconds: 
sampleA_R1.fastq.gz 
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait. 
Removing output files of failed job mergeFastq since they might be corrupted: sampleA_R2.fastq.gz 
Will exit after finishing currently running jobs. 
Exiting because a job execution failed. Look above for error message.

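As you can see, I have already tried the --latency-wait option without any success. Do you have any idea what might be the source of my problem? The file paths are correct, and the files themselves are intact. I have also run into similar problems with wildcards, so there must be something in the snakemake basics that I do not understand.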


2017-06-09 Jokhe

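As it stands, 'input.reads1' represents a single file, so your 'cat' command simply makes a copy of it. Is this what you want to replace with a list of 8 files (possibly by using wildcards)? Then you will have to add a top-level "all" rule that takes the output of "mergeFastq" as its input. (Maybe this is what causes the error you report in the comment on @rioualen's answer.) – bli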

You are right: I want to replace these single files with lists of 8 files. I added the "all" rule and now my script works correctly. Thanks for your advice! – Jokhe
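For readers following the comment above, here is a minimal sketch in plain Python (which can be placed at the top of a Snakefile) of how the per-lane input lists and the concrete targets for an "all" rule might be built. The four-lane layout (L001 to L004), the `SAMPLES` and `LANES` lists, and the helper names `lane_files` and `merged_targets` are assumptions made for illustration; the thread does not say how the eight files are actually named.

```python
# Hypothetical layout: four lanes and two read directions per sample,
# which gives eight FASTQ files per sample. Adjust to the real naming.
SAMPLES = ["sampleA"]
LANES = ["L001", "L002", "L003", "L004"]


def lane_files(sample, read):
    """Return the per-lane FASTQ paths for one sample and one read direction."""
    return [f"{sample}/{sample}_{lane}_{read}_001.fastq.gz" for lane in LANES]


def merged_targets(samples):
    """Return the concrete merged files that a top-level 'all' rule would request."""
    return [f"{s}/{s}_{read}.fastq.gz" for s in samples for read in ("R1", "R2")]


print(lane_files("sampleA", "R1"))
print(merged_targets(SAMPLES))
```

In a Snakefile, a list such as `merged_targets(SAMPLES)` could serve as the input of the "all" rule, so that the target consists of concrete file names while the merging rule itself uses a `{sample}` wildcard. The per-lane list could be supplied through an input function, for example `lambda wildcards: lane_files(wildcards.sample, "R1")`.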

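The problem is in the shell statement: the two strings are concatenated into a single command, which creates a file named "sampleA/sampleA_R1.fastq.gzcat". That is why snakemake does not find the expected output. For example, you can use the following syntax: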

rule mergeFastq: 
    input: 
     reads1='sampleA/sampleA_L001_R1_001.fastq.gz', 
     reads2='sampleA/sampleA_L001_R2_001.fastq.gz' 
    output: 
     reads1='sampleA/sampleA_R1.fastq.gz', 
     reads2='sampleA/sampleA_R2.fastq.gz' 
    message: 
     'Merging FASTQ files...' 
    shell:""" 
     cat {input.reads1} > {output.reads1} 
     cat {input.reads2} > {output.reads2} 
    """

The latency-wait option is not needed.
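The concatenation comes from ordinary Python behaviour: adjacent string literals are joined with no separator. The short sketch below reproduces this outside snakemake. The variable names `broken` and `fixed` are only illustrative, and the `{input...}` placeholders are left unformatted, since snakemake fills them in later.

```python
# The two adjacent string literals from the question's shell directive.
# Python joins them into one string, with nothing in between.
broken = (
    'cat {input.reads1} > {output.reads1}'
    'cat {input.reads2} > {output.reads2}'
)

# A single triple-quoted string keeps the newline between the two commands.
fixed = """
cat {input.reads1} > {output.reads1}
cat {input.reads2} > {output.reads2}
"""

print(broken)
print(fixed.strip().splitlines())
```

The joined string contains `{output.reads1}cat`, so after substitution the first redirect points at a file whose name ends in `.fastq.gzcat`. This appears consistent with the log, where only sampleA_R1.fastq.gz is reported missing.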


2017-06-09 15:01:46 rioualen

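Thanks for your help, especially for the explanation of the shell concatenation. I tried your syntax, but it still produces an error for me. The output is: 'WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.' – Jokhe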

@Jokhe Regarding the new error, you should post a separate question with your updated snakefile. – bli
