2016-07-05 136 views

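Why is my Perl script printing blank lines in its output?

The details of what the script is doing are not important, but I have put comments at the parts that seem important to me. I only care about why I am getting blank lines in my output.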

When I run the command

./script.pl temp temp.txt tempF `wc -l temp | awk '{print $1}'` 

where the file temp contains

1 27800000 120700000 4 
1 27800000 124300000 4 
1 154800000 247249719 3 
0000 71800000 9 
0000 87200000 2 
3 54400000 74200000 15 
4 76500000 155100000 20 
4 76500000 182600000 3 
4 76500000 88200000 77 
4 88200000 124000000 2 
5 58900000 180857866 8 
5 58900000 76400000 2 
5 58900000 97300000 4 
5 76400000 143100000 14 
5 97300000 147200000 6 
6 7000000 29900000 2 
6 63500000 70000000 73 
6 63500000 92100000 4 
6 70000000 113900000 70 
6 70000000 139100000 57 
6 92100000 113900000 3 

I get output of the form

hs1 27800000 124300000 4 


hs0000 87200000 2 
hs3 54400000 74200000 15 

hs4 76500000 182600000 3 
hs4 76500000 88200000 77 
hs4 88200000 124000000 2 

hs5 58900000 76400000 2 
hs5 58900000 97300000 4 
hs5 76400000 143100000 14 
hs5 97300000 147200000 6 


hs6 63500000 92100000 4 

hs6 70000000 139100000 57 
hs6 92100000 113900000 3 

on standard output (about 8 lines are also printed to the temp.txt file, but the formatting of those is correct).

The script is below

#!/usr/bin/perl 

# ARGV[0] is the name of the file which data will be read from(may have overlaps) 
# ARGV[1] is the name of the file which will be produced that will have no overlaps 
# ARGV[2] is the name of the folder which will hold all the data 
# ARGV[3] is the number of lines that ARGV[0] will contain 

use warnings; 

my $file = "./$ARGV[0]"; 
my @lines = do { 
    open my $fh, '<', $file or die "Can't open $file -- $!"; 
    <$fh>; 
}; 

my $file2 = "./$ARGV[2]/$ARGV[1]"; 
open(my $files, ">", "$file2") or die "Can't open > $file2: $!"; 

my $i = 0; 
while ($i < $ARGV[3] - 1) { 

    my @ref_fields = split('\s+', $lines[$i]); 

    print $files 
     "$ref_fields[0]", "\t", 
     $ref_fields[1], "\t", 
     $ref_fields[2], "\t", 
     $ref_fields[3], "\n"; 

    for my $j ($i + 1 .. $ARGV[3] - 1) { 

     $i = $j; 

     # @curr_fields is initialized here 

     my @curr_fields = split /\s+/, $lines[$j]; 

     if ($ref_fields[0] eq $curr_fields[0] && $ref_fields[2] > $curr_fields[1]) { 

      if (defined($curr_fields[0]) && $curr_fields[0] !~ /\s+/) { 

       chomp $curr_fields[3]; 

       # the line below is the one that is printing to standard output 
       print 
        $curr_fields[0], "\t", 
        $curr_fields[1], "\t", 
        $curr_fields[2], "\t", 
        $curr_fields[3], "\n"; 
      } 
     } 
     else { 
      last; 
     } 
    } 

    print "\n"; 
} 

Edit: I noticed a bug when running the script posted in the answer. When I run the command

./script.pl temp1 temp10.txt folder 

where temp1 contains

12 58100000 96200000 0.04348 
3 74200000 87200000 0.04348 
5 130600000 168500000 0.04348 
6 61000000 114600000 0.04348 
6 75900000 114600000 0.04348 
6 88000000 114600000 0.04348 
6 88000000 139000000 0.04348 
6 93100000 161000000 0.04348 
6 105500000 139000000 0.04348 
6 130300000 139000000 0.04348 
7 59900000 77500000 0.04348 
7 98000000 132600000 0.04348 
X 67800000 76000000 0.08696 
Y 28800000 59373566 0.04348 

I get

6 75900000 114600000 0.04348 
6 88000000 114600000 0.04348 
6 88000000 139000000 0.04348 
6 93100000 161000000 0.04348 
6 105500000 139000000 0.04348 

and temp10.txt contains

12 58100000 96200000 0.04348 
3 74200000 87200000 0.04348 
5 130600000 168500000 0.04348 
6 61000000 114600000 0.04348 
6 130300000 139000000 0.04348 
7 59900000 77500000 0.04348 
7 98000000 132600000 0.04348 
X 67800000 76000000 0.08696 

The line

Y 28800000 59373566 0.04348 

is in neither the output nor temp10.txt. It seems to have disappeared, but it should be printed to one of these.

Answer

Because you have the line

print "\n"; 

in your code, it seems pretty obvious that blank lines are printed.

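I can't help much more, because you say "the details of what the script is doing are not important" and have kept from us what it is meant to do.

But as written, it prints lines from the input file to standard output for as long as the first column matches the first column of the reference line (the line just written to the output file) and the second field is less than the third field of that reference line. Any time a line doesn't qualify, 'last' leaves the inner loop and you print a blank line.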
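That control flow can be reproduced in a few lines. This is only a sketch: the stdout_for helper, its $always flag and the four rows are made up for illustration and are not part of your script. It walks the rows the way your loops do, collects what would go to standard output, and adds the separator either on every outer pass (your current behaviour) or only after a group that actually printed something, which is one possible way to drop the stray blank lines if they are unwanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: walks the rows the
# same way the question's loops do and returns what would go to STDOUT.
# With $always true the separator is added on every outer pass (the
# behaviour in the question); otherwise only after a group that printed rows.
sub stdout_for {
    my ($rows, $always) = @_;
    my $out = '';
    my $i   = 0;

    while ($i < $#$rows) {
        my $ref     = $rows->[$i];
        my $printed = 0;

        for my $j ($i + 1 .. $#$rows) {
            $i = $j;
            my $cur = $rows->[$j];

            # 'last' only leaves this inner loop; the code after it still runs
            last unless $cur->[0] eq $ref->[0] && $cur->[1] < $ref->[2];

            $out .= join("\t", @$cur) . "\n";
            $printed = 1;
        }

        $out .= "\n" if $always || $printed;
    }
    return $out;
}

# Made-up rows: chromosome, start, end, count
my @rows = map { [ split ] } (
    '1 100 300 4',
    '1 200 400 4',
    '2 100 200 9',
    '3 100 200 2',
);

print stdout_for(\@rows, 1);
```

With the flag set, the row for chromosome 2 produces a second blank line even though nothing overlapped it. With the flag cleared, only the group that printed a row is followed by a separator.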



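You may prefer this refactoring of your code, which behaves the same but which I think is more readable. It also has the advantage of splitting each line of the input file only once, and it doesn't need the fourth parameter because the number of lines is simply the size of the @lines array. Blank lines are removed from the file as they are read, so there is no longer any need for your check on the definedness of the first field.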

#!/usr/bin/perl 

# ARGV[0] is the name of the file which data will be read from (may have overlaps) 
# ARGV[1] is the name of the file which will be produced that will have no overlaps 
# ARGV[2] is the name of the folder which will hold all the circos data file (mitelmanAll, mitelmanProstate, etc.) 

use strict; 
use warnings 'all'; 

use File::Path 'make_path'; 
use File::Spec::Functions 'catfile'; 

my ($file, $newfile, $dir) = @ARGV; 
$newfile = catfile($dir, $newfile); 

my @lines = do { 
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!}; 
    map { [ split ] } grep /\S/, <$fh>; 
}; 

make_path($dir); 
open my $out_fh, '>', $newfile or die qq{Unable to open "$newfile" for output: $!}; 

for (my $i = 0; $i < $#lines;) { 

    my $ref_fields = $lines[$i]; 

    print $out_fh join("\t", @$ref_fields[0..3]), "\n"; 

    for my $j ($i + 1 .. $#lines) { 

     $i = $j; 

     my $curr_fields = $lines[$j]; 

     # Compare the first field as a string: names such as X and Y are not numeric
     last unless $curr_fields->[0] eq $ref_fields->[0];
     last unless $curr_fields->[1] < $ref_fields->[2]; 

     print join("\t", @$curr_fields[0..3]), "\n"; 
    } 
} 
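On the missing final line mentioned in the edit to the question: with a loop bounded by $i < $#lines, a last line that does not overlap its predecessor is only ever seen as $curr_fields and never as $ref_fields, so it is written nowhere. Below is a sketch of one way to restructure the loop so that every line ends up in one of the two outputs. The split_overlaps name and its pair of return values are assumptions for illustration, and sending the final line to the file is my reading of what you intend.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns two array refs, the rows kept as references
# (written to the file in the script above) and the rows that overlap a
# reference (printed to STDOUT in the script above).
sub split_overlaps {
    my @lines = @_;
    my (@kept, @overlaps);

    my $i = 0;
    while ($i <= $#lines) {          # <= so the final row can be a reference
        my $ref = $lines[$i];
        push @kept, $ref;

        # Advance past every following row that overlaps the reference
        my $j = $i + 1;
        while ($j <= $#lines
            && $lines[$j][0] eq $ref->[0]
            && $lines[$j][1] < $ref->[2]) {
            push @overlaps, $lines[$j];
            $j++;
        }
        $i = $j;                     # first non-overlapping row, or past the end
    }
    return (\@kept, \@overlaps);
}

# Last four rows of temp1 from the question
my @lines = map { [ split ] } (
    '7 59900000 77500000 0.04348',
    '7 98000000 132600000 0.04348',
    'X 67800000 76000000 0.08696',
    'Y 28800000 59373566 0.04348',
);

my ($kept, $overlaps) = split_overlaps(@lines);
print join("\t", @$_), "\n" for @$kept;
```

For these four rows nothing overlaps, so all of them, including the Y line, come back in the kept list.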

Haha, yes, that's it, I'm a fool – Jacob

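How is it that a blank line gets printed when I hit a line that doesn't meet the condition? If the condition isn't met, the 'last' statement should be executed – Jacob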

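@C.Monster: Yes, so 'last' exits the 'for' loop, and after that there is a 'print "\n"' before the end of the 'while'. Take a look at my suggested rewrite. It does exactly the same as your own code, but I can read it more easily! – Borodin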