2012-08-02

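I'm a complete Perl amateur. I have a question about a find-and-replace I'm trying to apply to change the reference names in my SAM files so that I can run them through FindPeaks. The files are too large (5 to 17 gigs) to open in a text editor and do the matching without a programming language. How do I get Perl to treat a string with several | characters in it as a single string?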

Basically, I want Perl to match an entire string such as "gi|149288852|ref|NC_000067.5|NC_000067" and replace the whole thing with "chr1".

So far, however, I've only managed to make it produce either "chr1|chr1|chr1|chr1|chr1" or "gi|chr1|ref|NC_000067.5|NC_000067".

Can anyone help?

Edit:

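I've tried a few different things, but what I'm really trying to do is modify a program my supervisor got from someone else so that it does this correctly. Here it is: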

#!/usr/bin/perl 

use strict; 
use warnings; 

my %Chr = (

    "gi|149288852|ref|NC_000067.5|NC_000067" => "chr1", 
    "gi|149288869|ref|NC_000076.5|NC_000076" => "chr10", 
    "gi|149288871|ref|NC_000077.5|NC_000077" => "chr11", 
    "gi|149292731|ref|NC_000078.5|NC_000078" => "chr12", 
    "gi|149292733|ref|NC_000079.5|NC_000079" => "chr13", 
    "gi|149292735|ref|NC_000080.5|NC_000080" => "chr14", 
    "gi|149301884|ref|NC_000081.5|NC_000081" => "chr15", 
    "gi|149304713|ref|NC_000082.5|NC_000082" => "chr16", 
    "gi|149313536|ref|NC_000083.5|NC_000083" => "chr17", 
    "gi|149321426|ref|NC_000084.5|NC_000084" => "chr18", 
    "gi|149323268|ref|NC_000085.5|NC_000085" => "chr19", 
    "gi|149338249|ref|NC_000068.6|NC_000068" => "chr2", 
    "gi|149352351|ref|NC_000069.5|NC_000069" => "chr3", 
    "gi|149354223|ref|NC_000070.5|NC_000070" => "chr4", 
    "gi|149354224|ref|NC_000071.5|NC_000071" => "chr5", 
    "gi|149361431|ref|NC_000072.5|NC_000072" => "chr6", 
    "gi|149361432|ref|NC_000073.5|NC_000073" => "chr7", 
    "gi|149361523|ref|NC_000074.5|NC_000074" => "chr8", 
    "gi|149361524|ref|NC_000075.5|NC_000075" => "chr9", 
    "gi|149361525|ref|NC_000086.6|NC_000086" => "chrX", 
    "gi|149361526|ref|NC_000087.6|NC_000087" => "chrY", 
    ); 

my $usage = "\n\n\tUsage: convert.pl <SAM file>\n\nThis script converts NCBI ref#s to chr #s\n\n"; 

die $usage unless (@ARGV == 1); 

my $file = $ARGV[0]; 

open (IN, "$file") or die "Can't open file: $file\n"; 

while (<IN>){ 

    if (/\S+\s+\d+\s+(gi\S+)/){ 

    my $tag = $1; 
    if (exists $Chr{$tag}){ 
     my $line = $_; 
     $line =~ s/'$tag'/$Chr{$tag}/; 
     print $line; 
    } 
    else { 
     die "\n\n\nHash value doesn't exist for $tag $_\n\n"; 
    } 
    } 
    else { 

    print $_; 
    } 
} 

It comes out as: "gi|chr1|ref|NC_000067.5|NC_000067"

I also tried this:

perl -pi -w -e 's/gi|149288852|ref|NC_000067.5|NC_000067/chr1/g;' *.sam 

The idea was to do the replacements one at a time, but that comes out as "chr1|chr1|chr1|chr1|chr1".
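Both outputs have the same cause: in a Perl regex, | means "or". Below is a minimal sketch that reproduces both outputs on a made-up SAM-style line. The sample line and variable names are hypothetical, not taken from the real files.

- The unescaped one-liner with /g replaces each |-separated piece separately.
- The script's s/'$tag'/.../ replaces only the leftmost piece that matches. The quotes turn the first and last alternatives into 'gi and NC_000067', which never match.
- Escaping | and . with backslashes makes the whole name match as one literal string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical SAM-like line: read name, flag, RNAME, POS (not real data).
my $sample = "read1\t0\tgi|149288852|ref|NC_000067.5|NC_000067\t100";
my $tag    = "gi|149288852|ref|NC_000067.5|NC_000067";

# 1. Unescaped pattern with /g: "|" is alternation, so gi, 149288852, ref,
#    NC_000067.5 and NC_000067 are each replaced on their own.
(my $one_liner = $sample) =~ s/gi|149288852|ref|NC_000067.5|NC_000067/chr1/g;

# 2. The script's version: the quotes become literal characters in the first
#    and last alternatives ('gi and NC_000067'), and without /g only the
#    leftmost match (149288852) is replaced.
(my $quoted = $sample) =~ s/'$tag'/chr1/;

# 3. Escaping "|" and "." turns the whole reference name into one literal.
(my $escaped = $sample) =~ s/gi\|149288852\|ref\|NC_000067\.5\|NC_000067/chr1/g;

print "$one_liner\n$quoted\n$escaped\n";
```

The third form is the one the one-liner needs. Each reference name would still need its own escaped substitution, or a loop over the hash.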

Sounds like a greedy/non-greedy matching problem? – John 2012-08-02 19:45:10

Can you include the regex you're using? – Dancrumb 2012-08-02 19:45:38

The fat comma auto-quotes its left operand, so you don't need "gi|149361526|ref|NC_000087.6|NC_000087" => "chrY"; you should just have gi|149361526|ref|NC_000087.6|NC_000087 => "chrY" – 2012-08-02 19:56:40
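Here is a small sketch of how => quoting behaves; the hash names are made up for illustration. The fat comma auto-quotes its left operand only when that operand is a plain word made of word characters (letters, digits, underscore). So NC_000087 => ... works unquoted, but a key containing | and . still needs its quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A plain word left of "=>" is auto-quoted: the key is the string "NC_000087".
my %simple = (NC_000087 => "chrY");

# A key containing "|" or "." is not a plain word. Written bare, it would be
# parsed as an expression (barewords joined by bitwise-or), and under
# "use strict" the bareword "gi" makes it fail to compile. Keep the quotes.
my %Chr = ("gi|149361526|ref|NC_000087.6|NC_000087" => "chrY");

print "$_ => $simple{$_}\n" for keys %simple;
print "$_ => $Chr{$_}\n"    for keys %Chr;
```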

Answer

One problem is this line:

$line =~ s/'$tag'/$Chr{$tag}/; 

$tag still contains regex metacharacters (| and .).

Use this instead:

$line =~ s/\Q$tag/$Chr{$tag}/; 
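Here is a minimal sketch of what that fix does, using a hypothetical sample line. \Q applies quotemeta to the interpolated $tag, up to \E or the end of the pattern, so | and . are matched literally. The fix also drops the single quotes, which would otherwise be matched as literal ' characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tag = "gi|149288852|ref|NC_000067.5|NC_000067";

# quotemeta backslash-escapes every non-word character; "\Q$tag" inside a
# pattern performs the same escaping on the interpolated value.
my $escaped = quotemeta($tag);
print "$escaped\n";    # gi\|149288852\|ref\|NC_000067\.5\|NC_000067

# Hypothetical SAM-style line; the whole reference name is replaced at once.
my $line = "read1\t0\t$tag\t100";
(my $fixed = $line) =~ s/\Q$tag\E/chr1/;
print "$fixed\n";
```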
Thanks! It works perfectly now! – 2012-08-02 20:47:35
