Finding the longest match between two files from a pattern


I'm having trouble working with two files in this program. I am trying to access the contents of the files $Q and $s.

print "Input the K value \n"; 
$k = <>; 
chomp $k; 

print "Input T\n"; 
$t = <>; 
chomp $t; 

%Qkmer =();      
$i = 1; 

$query=' '; 
while ($line=<IN>) { 
chomp($line); 
if ($line=~ m/^>/) { 
next; 
} 
$query=$query.$line; 
$line=~ s/(^|\n)[\n\s]*/$1/g; 

while (length($line) >= $k) { 
    $line =~ m/(.{$k})/; 
    if (! defined $Qkmer{$1}) {#every key not deined as the first match 
    $Qkmer{$1} = $i; 
    } 
    $i++; 
    $line = substr($line, 1, length($line) -1); 
} 
} 

open(MYDATA, '<', "data.txt"); 

while ($line=<MYDATA>) { \ 
    chomp($line); 
    %Skmer =();   # This initializes the hash called Skmer. 
    $j = 1; 

    if ($line=~ m/^>/) { #if the line starts with > 
    next; #start on next line #separated characters 
    } 
    $line=~ s/^\s+|\s+$//g ; #remove all spaces from file 
    while (length($line) >= $k) { 
    $line =~ m/(.{$k})/;#match any k characters and only k characters in dna 
    $Skmer{$1} = $j; #set the key position to $j and increase for each new key 
    $j++; 
    $line = substr($line, 1, length($line) -1); #this removes the first character in the current string 
    } 

    ###(56)###for($Skmerkey(keys %Skmer)){ 
    $i=$Skmer{$Skmerkey}; 
    if(defined $Qkmer($Skmerkey)){ 
     $j=$Qkmer($Skmerkey); 
     } 
     $S1=$line; 
     $S2=$query; 
     @arrayS1= split(//, $S1); 
     @array2= split(//, $S2); 

     $l=0; 
     while($arrayS1[$i-$l] eq $arrayS2[$j-$l]){ 
     $l++; 
     } 
     $start=$i-$l; 
     $m=0; 
     while ($arrayS1[$i+$k+$m] eq $arrayS2[$j+$k+$m]) { 
     $m++; 
     } 
     $length=$l+$k+$m; 
     $match= substr($S1, $start, $length); 

     if($length>$t){ 
     $longest=length($match); 
     print "Longest: $match of length $longest \n"; 
     } 
    } 

}###(83)###

The input files contain only strings of letters. For example:

File 1:

ahhtsagnchjgstffhjyfcsghnvzfhg

File 2:

ggujvfbgfgkjfcijjjffcvvafcsghnvzfhgvugxckugcbhfcgh 
ghnvzfhgvugxckHhfgjgcfujvftjbvdtkhvddgjcdgjxdjkfrh 
ajdbvciyqdanvkjghnvzfhgvugxc

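Starting from each word of length $k in file 1 that also occurs in file 2, I check to the left and right of the matching word in file 2 for further matching characters. The final output is the longest match between file 1 and file 2 based on $k.

With this code I get a syntax error, and I don't see why, because it looks correct to me: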

syntax error at testk.pl line 56, near "$Skmerkey(" 
syntax error at testk.pl line 83, near "}"

Thanks.

Source

2016-11-08 Alina Orozco

'$k' is used but never defined. Add 'use warnings; use strict;' – Mike

The hash '%kmer' is always empty!!! – Mike

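Yes, I pasted the code straight from my editor, so I had to add the spaces by hand to fit everything into one code block. In my actual program, the commented-out sections are not commented out. I used hashes to mark the sections where I'm specifically having trouble and which may need to be implemented another way. –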

use strict;   # <--- Always use this 
use warnings;  # <--- and this 
use Data::Dumper; 

my $k=3; 

open(my $IN, '<', "File2") or die "Cannot open File2: $!"; # use a lexical $IN instead of the bareword IN 
my $line=0; # line number 
my %kmer; # hash of arrays of all $k-letter "words" line/position 
my @Q;  # rows of Q-file 
while(<$IN>) { 
    chomp; 
    next if /^>/; 
    s/^\s+|\s+$//g; 
    next if !$_; 
    my $pos=0; 
    push @Q, $_; # store source row 
    for(/(?=(.{$k}))/g) { # Capture $k letters. floating window with step 1 symbol 
    push @{$kmer{$_}}, [$line,$pos]; # store row number and position of "word" 
    $pos++; 
    } 
    $line++; 
} 

open($IN, '<', "File1") or die "Cannot open File1: $!"; 
$line=0; 
while(<$IN>) { # Read S-file 
    chomp; 
    next if /^>/; 
    s/^\s+|\s+$//g; 
    next if !$_; 
    my $pos=0; 
    my $len=length($_); # length of row of S-file 
    my $s=$_;   # Current row of S-file 
    my @ignore=();  # array for store information about match tails 
    for(/(?=(.{$k}))/g) { 
    next if ! $kmer{$_}; # "word" not found try to next 
    for(@{$kmer{$_}}) { # $kmer{word} contains array of lines/positions in Q 
     my($qline, $qpos)=@{$_}; 
#  print "test $qline:$qpos "; 
     if(grep {$_->[0]==$qline && $_->[1]==$qpos } @ignore) { 
     # this line/position already tested and included in found matching 
#  print "Ignore match tail $qline:$qpos\n"; 
     next; 
     } 
     my $j=$k; # $k letters same, test after this point 
     my $qlen=length($Q[$qline]); 
     $j++ while($pos+$j<$len && $qpos+$j<$qlen && 
        substr($s,$pos+$j,1) eq substr($Q[$qline],$qpos+$j,1)); 
     print "MATCH FOUND: S-file line $line pos $pos, Q-file line $qline pos $qpos: ", 
      substr($s,$pos,$j),"\n"; 
     push @ignore, [$qline, $qpos, $j]; # store positions and length of match 
    } 
    } continue { # Continue block works on all loops, include after "next" 
    $pos++; 
    @ignore=grep { # recalculate/filter position and length of all match tails 
        ++$_->[1]; # increment position 
        (--$_->[2]) # decrement length 
        >= $k  # and filter out lengths < $k 
       } @ignore; 
# print Dumper(\@ignore); 
    } 
    $line++; 
}

Source

2016-11-08 20:52:31 Mike
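
On the syntax errors themselves: Perl expects the loop variable outside the parentheses, as in `for my $Skmerkey (keys %Skmer) { ... }`, and hash elements are looked up with braces, as in `$Qkmer{$Skmerkey}`, not parentheses. The error reported at line 83 is most likely a knock-on effect of the one at line 56. The following is only a minimal sketch of the seed-and-extend idea described in the question, assuming each file has already been read into a single string. The helper names `index_kmers` and `longest_seeded_match` are made up for illustration and do not appear in the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: map every k-mer of $text to the list of
# 0-based positions where it starts.
sub index_kmers {
    my ($text, $k) = @_;
    my %index;
    for my $pos (0 .. length($text) - $k) {
        push @{ $index{ substr($text, $pos, $k) } }, $pos;
    }
    return \%index;
}

# Hypothetical helper: longest substring shared by $s and $q that
# contains at least one common k-mer, found by extending each seed
# to the left and to the right.
sub longest_seeded_match {
    my ($s, $q, $k) = @_;
    my $index = index_kmers($q, $k);
    my $best  = '';
    for my $i (0 .. length($s) - $k) {
        my $kmer = substr($s, $i, $k);
        # Loop variable outside the parentheses, braces for hash lookup.
        for my $j (@{ $index->{$kmer} || [] }) {
            my ($left, $right) = (0, 0);
            # Extend left while both strings still agree.
            $left++ while $i - $left > 0 && $j - $left > 0
                && substr($s, $i - $left - 1, 1) eq substr($q, $j - $left - 1, 1);
            # Extend right past the end of the seed.
            $right++ while $i + $k + $right < length($s)
                && $j + $k + $right < length($q)
                && substr($s, $i + $k + $right, 1) eq substr($q, $j + $k + $right, 1);
            my $match = substr($s, $i - $left, $left + $k + $right);
            $best = $match if length($match) > length($best);
        }
    }
    return $best;
}

# Usage with file 1 and the first line of file 2 from the question.
my $s = 'ahhtsagnchjgstffhjyfcsghnvzfhg';
my $q = 'ggujvfbgfgkjfcijjjffcvvafcsghnvzfhgvugxckugcbhfcgh';
my $m = longest_seeded_match($s, $q, 3);
print "$m (", length($m), ")\n";   # prints "fcsghnvzfhg (11)"
```

Bounds are checked before every character comparison, so the extension loops stop at the ends of either string instead of comparing undefined values, which is what the `while` loops in the question's code would run into.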
