Running a script in multiple directories with multiple output files in Perl (problem comparing hash key values)


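I have a script that looks like the one below. I want to use it to search the directory I am currently in, open every subdirectory, open all files in them that match a certain regex (fastq files, where each record spans four lines), do some processing on those files, and write some results to a file in each directory. (Note: the real script does much more than this, but I think I have a structural problem related to iterating over folders, because the simplified version works when I run it in a single folder. So I am posting the simplified version here.)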

#!/usr/bin/perl
#Created by C. Pells, M. R. Snyder, and N. T. Marshall 2017 

#Script trims and merges high throughput sequencing reads from fastq files for a specific primer set 

use Cwd; 
use warnings; 

my $StartTime= localtime; 

my $MasterDir = getcwd; #obtains a full path to the current directory 


opendir (DIR, $MasterDir) || die "cannot open $MasterDir: $!"; 
my @objects = readdir (DIR); 
closedir (DIR); 
foreach (@objects){ 
    print $_,"\n"; 
} 

my @Dirs =(); 
foreach my $O (0..$#objects){ 
    my $CurrDir = ""; 
    if ((length ($objects[$O]) < 7) && ($objects[$O] !~ /^\.\.?$/) && (-d "$MasterDir/$objects[$O]")){ #keeps directories whose names are < 7 characters (all samples are 6 or less); skips "." and ".." by name, since readdir does not guarantee they come first 
     $CurrDir = $MasterDir."/".$objects[$O]; #appends directory name to full path 
     push (@Dirs, $CurrDir); 
    } 
} 

foreach (@Dirs){ 
    print $_,"\n";#checks that all directories were read in 
} 


foreach my $S (0..$#Dirs){ 
    my @files =(); 
    opendir (DIR, $Dirs[$S]) || die "cannot open $Dirs[$S]: $!"; 
    @files = readdir DIR; #reads in all files in a directory 
    closedir DIR; 
    my @AbsFiles =(); 
    foreach my $F (0..$#files){ 
     my $AbsFileName = $Dirs[$S]."/".$files[$F]; #appends file name to full path 
     push (@AbsFiles, $AbsFileName); 
    } 

    foreach my $AF (0..$#AbsFiles){ 
     if ($AbsFiles[$AF] =~ /_R2_001\.fastq$/m){ #finds reverse fastq file 
      my @readbuffer=(); 
      #read in reverse fastq 
      my %RSeqHash; 
      my $c = 0; 
      print "Reading, reversing, complimenting, and trimming reverse fastq file $AbsFiles[$AF]\n"; 
      open (INPUT1, $AbsFiles[$AF]) || die "Can't open file: $!\n"; 
      while (<INPUT1>){ 
       chomp ($_); 
       push(@readbuffer, $_); 
       if (@readbuffer == 4) { 
        my $rsn = substr($readbuffer[0], 0, 45); #trims reverse seq name 
        $c++ % 10000 == 0 and print "$rsn\n"; 
        $RSeqHash{$rsn} = $readbuffer[1]; 
       @readbuffer =(); 
       } 
      } 
     } 
    } 
    foreach my $AFx (0..$#AbsFiles){ 
     if ($AbsFiles[$AFx] =~ /_R1_001\.fastq$/m){ #finds forward fastq file 
      print "Reading forward fastq file $AbsFiles[$AFx]\n"; 
      open (INPUT2, $AbsFiles[$AFx]) || die "Can't open file: $!\n"; 
      my $OutMergeName = $Dirs[$S]."/"."Merged.fasta"; 
      open (OUT, ">", $OutMergeName) || die "Can't open $OutMergeName: $!\n"; 
      my $cc=0; 
      my @readbuffer =(); 
      while (<INPUT2>){ 
       chomp ($_); 
       push(@readbuffer, $_); 
       if (@readbuffer == 4) { 
        my $fsn = substr($readbuffer[0], 0, 45); #trims forward seq name 
        #$cc++ % 10000 == 0 and print "$fsn\n$readbuffer[1]\n"; 
        if (exists($RSeqHash{$fsn})){ #checks to see if forward seq name is present in reverse seq hash 
         print "$fsn was found in Reverse Seq Hash\n"; 
         print OUT "$fsn\n$readbuffer[1]\n"; 
        } 
        else { 
         $cc++ % 10000 == 0 and print "$fsn not found in Reverse Seq Hash\n"; 
        } 
       @readbuffer =(); 
       } 
      } 
      close INPUT1; 
      close INPUT2; 
      close OUT; 
     } 
    } 
} 
my $EndTime= localtime; 
print "Script began at\t$StartTime\nCompleted at\t$EndTime\n";

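Again, I know the script works when it does not iterate over folders. But with this version I just get empty output files. From the print statements I inserted into the script, I have determined that Perl cannot find the variable $fsn from INPUT2 as a key in the hash. I don't understand why, because every file is there, and it works when I don't iterate over folders, so I know the keys match. So either it is something simple I am missing, or I have hit some kind of Perl memory limitation. Any help is appreciated!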
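A quick check that can narrow this kind of failure down, sketched with a hypothetical helper `describe_lookup` (not part of the original script): print the looked-up name in brackets together with the number of keys the hash holds at the moment of the lookup. The brackets expose stray whitespace or a trailing `\r`; a key count of 0 means the code is reading a different, empty hash from the one it filled, rather than failing to match names.

```perl
use strict;
use warnings;

# Hypothetical debugging helper: shows the looked-up name in brackets
# (so trailing spaces or "\r" become visible), whether it exists,
# and how many keys the hash holds at lookup time.
sub describe_lookup {
    my ($hash_ref, $name) = @_;
    my $n_keys = scalar keys %$hash_ref;
    my $status = exists $hash_ref->{$name} ? 'found' : 'missing';
    return "[$name] $status; hash has $n_keys keys";
}

my %filled = ('@READ1 1:N:0' => 'ACGT');   # made-up read name and sequence
print describe_lookup(\%filled, '@READ1 1:N:0'), "\n";  # [@READ1 1:N:0] found; hash has 1 keys
print describe_lookup({}, '@READ1 1:N:0'), "\n";        # [@READ1 1:N:0] missing; hash has 0 keys
```

In the script this would be called as `describe_lookup(\%RSeqHash, $fsn)` inside the R1 loop.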

Source

2017-08-07 Matthew Snyder

'push my @AbsDirs, ...;' makes no sense, because 'my @AbsDirs' creates a new variable. It should simply be 'push @AbsDirs, ...;' – ikegami
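A minimal demonstration of ikegami's point, using hypothetical sub names (the posted script already uses the correct `push (@Dirs, $CurrDir)` form, so this refers to an earlier version of the question). `push my @AbsDirs, ...` declares a brand-new lexical array inside the loop body on every iteration, so the pushed values never reach the array declared outside the loop.

```perl
use strict;
use warnings;

# Buggy pattern: "my" inside the loop creates a fresh @AbsDirs on every
# iteration, so each push goes into a new array that is discarded at the
# end of that iteration.
sub collect_buggy {
    my @names = @_;
    my @AbsDirs;                   # outer array, never filled
    for my $name (@names) {
        push my @AbsDirs, $name;   # pushes into a new inner @AbsDirs
    }
    return scalar @AbsDirs;
}

# Fixed pattern: declare once, then push into that same array.
sub collect_fixed {
    my @names = @_;
    my @AbsDirs;
    for my $name (@names) {
        push @AbsDirs, $name;
    }
    return scalar @AbsDirs;
}

print collect_buggy(qw(S1 S2)), "\n";  # 0
print collect_fixed(qw(S1 S2)), "\n";  # 2
```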

'$AbsDirs[$a].$files[$b]' should be '"$AbsDirs[$a]/$files[$b]"' – ikegami
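The posted script already adds the separator with `."/".`, but the difference ikegami points out is easy to see with made-up paths. The directory and file names below are hypothetical, and `$i`/`$j` replace `$a`/`$b` because those two names are reserved for `sort`. `File::Spec->catfile` is a core-module alternative, not something the comment suggested.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical example values standing in for one directory and one file name.
my @AbsDirs = ('/data/run1/S1');
my @files   = ('S1_R1_001.fastq');
my ($i, $j) = (0, 0);

my $glued    = $AbsDirs[$i] . $files[$j];                      # no separator between the parts
my $joined   = "$AbsDirs[$i]/$files[$j]";                      # ikegami's interpolated form
my $portable = File::Spec->catfile($AbsDirs[$i], $files[$j]);  # core-module alternative

print "$glued\n$joined\n$portable\n";
```

On a Unix-like system the first line prints `/data/run1/S1S1_R1_001.fastq` (a path that does not exist), while the other two print `/data/run1/S1/S1_R1_001.fastq`.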

Tip: don't use global variables. Replace 'open INPUT1, ...' with 'open my $INPUT1, ...' – ikegami
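A sketch of ikegami's suggestion, assuming the same four-line FASTQ layout and 45-character name trim as the script. `read_fastq_names` is a hypothetical helper, not part of the original script. `open my $in, '<', $path` gives a lexical filehandle that is scoped like any other `my` variable, instead of a global bareword handle such as INPUT1 that every part of the program shares.

```perl
use strict;
use warnings;

# Hypothetical helper: reads a FASTQ file four lines at a time through a
# lexical filehandle and three-argument open, returning a hash reference
# of (trimmed header => sequence). The 45-character trim mirrors the script.
sub read_fastq_names {
    my ($path) = @_;
    my %seqs;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    while (my $header = <$in>) {
        my $seq  = <$in>;
        my $plus = <$in>;                # the "+" separator line, read and discarded
        my $qual = <$in>;
        last unless defined $qual;       # stop at a truncated final record
        chomp($header, $seq);
        $seqs{ substr($header, 0, 45) } = $seq;
    }
    close $in;                           # also closed automatically when $in goes out of scope
    return \%seqs;
}
```

Because `open` also accepts a reference to a scalar as the "file", the helper can be tried on in-memory text with `read_fastq_names(\$text)` before pointing it at real files.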

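It turned out my problem was where I declared the hash. For some reason, even though I only declared it after finding the first input file, the script fails unless I declare the hash before the foreach loop that goes through every item in @AbsFiles looking for the first input file. That placement is fine, because it means the hash is cleared for each new directory. But I don't understand why it failed, since it should only declare (or clear) the hash once the input file name is found. I suppose I don't need to know why it didn't work before, but some help understanding it would be nice.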
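A minimal sketch of why the original placement failed, assuming the script runs without `use strict` (as the posted one does). Where a `my` variable is visible is decided at compile time by the enclosing block, not at run time by when the file is found. `my %RSeqHash` inside the `if` block creates a lexical hash that exists only until that block's closing brace; the later `exists($RSeqHash{$fsn})` in the R1 loop sits outside that block, so Perl silently treats it as the unrelated, empty package variable `%main::RSeqHash`. With `use strict` at the top, the same code fails to compile with 'Global symbol "%RSeqHash" requires explicit package name'. The sub names below are hypothetical and only mimic the script's structure.

```perl
use warnings;   # deliberately no "use strict", matching the posted script

# Mimics the posted structure: %RSeqHash is declared with "my" inside
# the if-block that reads the reverse file, then looked up afterwards.
sub lookup_after_block {
    my ($name) = @_;
    if ($name ne '') {
        my %RSeqHash;                 # lexical: visible only until the closing brace
        $RSeqHash{$name} = 'ACGT';
    }
    # Here %RSeqHash is the package variable %main::RSeqHash,
    # a different hash that nothing ever filled.
    return exists $RSeqHash{$name} ? 1 : 0;
}

# Fixed structure: one declaration in the enclosing scope, so the block
# that fills the hash and the code that reads it share the same variable.
sub lookup_with_outer_decl {
    my ($name) = @_;
    my %RSeqHash;
    if ($name ne '') {
        $RSeqHash{$name} = 'ACGT';
    }
    return exists $RSeqHash{$name} ? 1 : 0;
}

print lookup_after_block('@READ1'), "\n";      # 0
print lookup_with_outer_decl('@READ1'), "\n";  # 1
```

In the real script, putting `my %RSeqHash;` at the top of the `foreach my $S` loop body gives each directory a fresh, empty hash that is still visible to both the R2 and the R1 loops.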

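I have to give credit to another user for helping me see this. They tried to answer my question without solving it, and then, in a comment on that answer, gave me a hint about where I was declaring my hash. That answer has since been deleted, so I cannot credit that user for pointing me in this direction. I would love to know what they understood about Perl that made it clear to them this was the problem. I'm sorry that I was busy with data analysis and meetings and couldn't reply to that comment sooner.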


2017-08-28 23:09:41
