2009-06-09 37 views
1

I have a hash of lists that is not getting populated. Why does a Perl "hash of lists" behave this way?

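I checked that the block at the end, the one that adds to the hash, is in fact being called on input. If the key does not exist, it should add a singleton list; otherwise it should push onto the back of the list stored (by reference) under that key.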

I understand that the goto is ugly, but I have tried commenting it out and it made no difference.

The problem is that when printhits is called, nothing is printed, as if there were no values in the hash. I also tried each (%genomehits), no dice.

Thanks!

#!/usr/bin/perl 
use strict; 
use warnings; 

my $len = 11; # resolution of the peaks 

#$ARGV[0] is input file 
#$ARGV[1] is call number 
# optional -s = spread number from call 
# optional -o specify output file name 
my $usage = "see arguments"; 
my $input = shift @ARGV or die $usage; 
my $call = shift @ARGV or die $usage; 
my $therest = join(" ",@ARGV) . " "; 
print "the rest".$therest."\n"; 
my $spread = 1; 
my $output = $input . ".out"; 
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;} 
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;} 

# initialize master hash 
my %genomehits =(); 

foreach (split ';', $input) { 
    my $mygenename = "err_naming"; 
    if ($_ =~ /^(.+)-/) {$mygenename = $1;} 

    open (INPUT, $_); 
    my @wiggle = <INPUT>; 

    &singlegene(\%genomehits, \@wiggle, $mygenename); 

    close (INPUT); 
} 

&printhits; 

#print %genomehits; 
sub printhits { 
    foreach my $key (%genomehits) { 
     print "key: $key , values: "; 
    foreach (@{$genomehits{$key}}) { 
     print $_ . ";"; 
    } 
    print "\n"; 
    } 
} 

sub singlegene { 
# let %hash be the mapping hash 
# let @mygene be the gene to currently process 
# let $mygenename be the name of the gene to currently process 

    my (%hash) = %{$_[0]}; 
    my (@mygene) = @{$_[1]}; 
    my $mygenename = $_[2]; 

    my $chromosome; 
    my $leftbound = -2; 
    my $rightbound = -2; 

    foreach (@mygene) { 
     #print "Doing line ". $_ . "\n"; 

     if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;} 

     if ($_ =~ "Step") { 
      if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;} 
      if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")}; 
      $leftbound = -2; 
      $rightbound = -2; 
      next; 
     } 

     my @line = split /\t/, $_; 
     my $pos = $line[0]; 
     my $val = $line[-1]; 

     # above threshold for a call 
     if ($val >= $call) { 
      # start of range 
      if ($rightbound != ($pos - 1)) { 
       $leftbound = $pos; 
       $rightbound = $pos; 
      } 
      # middle of range, increment rightbound 
      else { 
       $rightbound = $pos; 
      } 

      if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;} 
     } 
     # else reinitialize: not a call 
     else { 
      FORTHELASTONE: 
      # typical case, in an ocean of OFFs 
      if ($rightbound != ($pos-1)) { 
       $leftbound = $pos; 
      } 
      else { 
      # register the range 
       my $range = $rightbound - $leftbound; 
       for ($spread) { 
        $leftbound -= $len; 
        $rightbound += $len; 
       } 
       #print $range . "\n"; 

       foreach ($leftbound .. $rightbound) { 
        my $key = "$chromosome:$_"; 
        if (not defined $hash{$key}) { 
         $hash{$key} = [$mygenename]; 
        } 
        else { push @{$hash{$key}}, $mygenename; } 
       } 
      } 
     } 

    } 

}

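I strongly recommend that you shift your arguments off the argument stack. Technically it is faster to dereference them directly, but it is hard to read and may actually be the cause of your confusion. – 2009-06-10 02:12:54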
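To illustrate the comment above, here is a minimal sketch of unpacking @_ into named variables at the top of a sub. The sub count_lines and its sample data are hypothetical stand-ins for singlegene and its input; only the three-argument shape is taken from the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for singlegene: same three arguments, named up front.
sub count_lines {
    # List assignment from @_; "my $hits = shift;" three times works as well.
    my ($hits, $lines, $genename) = @_;

    # $hits is still a reference, so this write reaches the caller's hash.
    $hits->{$genename} = scalar @$lines;
    return;
}

my %genomehits;
my @wiggle = ("1\t0.5\n", "2\t0.9\n");
count_lines(\%genomehits, \@wiggle, "geneA");
print "geneA: $genomehits{geneA} lines\n";
```

With the arguments named once, it is easier to see which variables are references and which are copies.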

Answers

4

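You pass a reference to %genomehits into the function singlegene, and then copy it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash, which goes away at the end of the function.

To fix it, use the reference directly with arrow notation. For example: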

my $hash = $_[0]; 
... 
$hash->{$key} = yadda yadda; 
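The difference can be seen in a small self-contained sketch. The sub names add_to_copy and add_via_ref and the key "added" are made up for illustration; the first sub uses the same copying assignment as the question, the second keeps the reference.

```perl
use strict;
use warnings;

# Same pattern as in the question: dereference and copy into a local hash.
sub add_to_copy {
    my (%hash) = %{ $_[0] };   # %hash is a new hash, independent of the caller's
    $hash{added} = 1;          # modifies only the local copy
    return;
}

# The fix: keep the reference and write through it.
sub add_via_ref {
    my $hash = $_[0];
    $hash->{added} = 1;        # modifies the caller's hash
    return;
}

my (%by_copy, %by_ref);
add_to_copy(\%by_copy);
add_via_ref(\%by_ref);

print "copy: ", scalar(keys %by_copy), " key(s)\n";   # prints "copy: 0 key(s)"
print "ref:  ", scalar(keys %by_ref),  " key(s)\n";   # prints "ref:  1 key(s)"
```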

Thanks a lot; I had used references like that before, but perl told me it was deprecated semantics. Never listen to the compiler! – Overflown 2009-06-09 15:49:25

2

I think it is this line:

my (%hash) = %{$_[0]}; 

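You pass in a reference, but this statement makes a copy of your hash. Everything you add in singlegene is lost when you return.

Leave it as a hash reference and it should work.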
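As a rough sketch of what the registration step could look like with a hash reference: register_range is a hypothetical helper, not part of the original script, and the chromosome names and positions are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: tags every position in $left..$right with $genename.
sub register_range {
    my ($hits, $chromosome, $left, $right, $genename) = @_;
    foreach my $pos ($left .. $right) {
        # push autovivifies the array reference when the key is new,
        # so no separate "not defined" branch is needed.
        push @{ $hits->{"$chromosome:$pos"} }, $genename;
    }
    return;
}

my %genomehits;
register_range(\%genomehits, "chr1", 10, 12, "geneA");
register_range(\%genomehits, "chr1", 12, 13, "geneB");

foreach my $key (sort keys %genomehits) {
    print "key: $key , values: ", join(";", @{ $genomehits{$key} }), "\n";
}
```

Note that the printing loop iterates over keys %genomehits. Iterating over %genomehits directly, as printhits does in the question, would visit the values as well as the keys.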

PS - Data::Dumper is your friend when large data structures do not behave as expected. I would sprinkle a few of these in your code...

use Data::Dumper; print Dumper \%genomehits;
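A slightly fuller sketch of the same idea, assuming a small hand-built hash of lists as sample data. $Data::Dumper::Sortkeys and $Data::Dumper::Indent are standard Data::Dumper settings; the contents of %genomehits here are invented.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # sort hash keys so successive dumps are comparable
$Data::Dumper::Indent   = 1;   # more compact indentation

# Invented sample data in the shape the script is trying to build.
my %genomehits = (
    "chr1:100" => ["geneA", "geneB"],
    "chr1:101" => ["geneA"],
);

# Pass a reference so the hash is dumped as one structure
# rather than as a flat list of keys and values.
my $dump = Dumper(\%genomehits);
print $dump;
```

Dumping the hash right after the call to singlegene, and again inside the sub, would show at a glance that only the local copy is being filled.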