2013-01-15 82 views
-1

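Perl: find the difference between two names

I'm sure there must be a simpler way to tell these two names apart. I have a script that, when two different samples match very closely, puts the match percentage in a table.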

But I want to fine-tune this: the percentage should only be colored when the samples are completely different ones.

For example: if SD0098a matches SD0098[b-z], no different color is needed, but if SD0098a matches SD0097[a-z], the cell should get an alert color.
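The rule in this example comes down to comparing the part of the name before the trailing letter. A minimal sketch, assuming every sample name starts with letters, then digits, then one lowercase letter (as in `SD0098a_SD8r9345843_07-APR-13`); the helper names `sample_group` and `same_group` are hypothetical and not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: return the group part of a sample name,
# e.g. 'SD0098' for 'SD0098a_SD8r9345843_07-APR-13'.
# Assumes names look like <letters><digits><one lowercase letter>...
# Names that do not fit the pattern are returned unchanged.
sub sample_group {
    my ($name) = @_;
    return $name =~ /\A([A-Za-z]+\d+)[a-z]/ ? $1 : $name;
}

# True when both samples belong to the same group.
sub same_group {
    my ($x, $y) = @_;
    return sample_group($x) eq sample_group($y);
}

print same_group('SD0098a', 'SD0098b') ? "same\n" : "different\n";
print same_group('SD0098a', 'SD0097a') ? "same\n" : "different\n";
```

With this, SD0098a and SD0098b count as the same group, while SD0098a and SD0097a do not. If the real naming scheme differs, only the regular expression would need to change.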

I have code like this:

    use POSIX qw(floor); # assumed import: floor() is called further down
    my @input = get_data(); 
    @input = @input[28..37] if $ARGV[0] eq 'titchy'; 
    @input = @input[0..49] if $ARGV[0] eq 'small'; 
    my $t0 = time; 
    my %map; 
    my $tv = 'A'; 
    $map{$_}=$tv++ foreach qw(N NN A C G T AC AG AT CA CG CT GA GC GT TA TC TG); 
    my %inv_map = reverse %map; 
    my $t_store = {}; 
    my $res; 

    my @index; 
    my @columns = ({ 'key' => 'code', 'label' => 'Sample name', 'class'=>q({'enc':'[[r:enc]]'}) }); 

    my %col_info; 
    foreach(@input){ 
        my ($k,@v) = split m{:}mxs; 
        push @index, $k; 

        $t_store->{$k} = { 
            'code' => $k, 
            'flags' => (join q(), map { $_ =~ /[ACGT]/ ? q(`) : q() } @v), 
            "i_$k" => q(-), 
            'enc' => join q(), map { $map{$_}||'@' } @v, 
        }; 

        $col_info{$k} = { 'key' => "i_$k", 'label' => $k, 'rotate' => 1, 
            'header_class' => q({'enc':').$t_store->{$k}{'enc'}.q('}), 
            'format' => [ [ 'r', 'exact', q(-) ], [ 'p0' ] ], 
            'class' => "[[r:c_$k]]", 
        }; 
    } 

    @index = sort @index; 

    push @columns, map { $col_info{$_} } @index; 

    my @t_index = @index; 

    my $N = 0; 
    # get the first element of the array sample a. 
    while(my $a = shift @t_index) { 
        foreach my $b (@t_index) { 
            $N++; 
            my $count_flag = $t_store->{$a}{'flags'} & $t_store->{$b}{'flags'}; 
            my $mismatches = ($t_store->{$a}{'enc'}^$t_store->{$b}{'enc'}) | $count_flag; 

            my $n_match = $mismatches =~ tr{`}{`}; 
            my $n_count = $count_flag =~ tr{`}{`}; 
            my $n_total = length $t_store->{$a}{'flags'}; 

            #$t_store->{$b}{"o_$a"} = $t_store->{$a}{"o_$b"} = $n_total ? $n_count/$n_total : 0; 
            #$t_store->{$b}{"m_$a"} = $t_store->{$a}{"m_$b"} = $n_count ? 1-$n_match/$n_count : 0; 
            my $x = $t_store->{$b}{"i_$a"} = $t_store->{$a}{"i_$b"} = $n_count ? $n_match/$n_count : q(-); 
            # This is where the check should go : 
            $t_store->{$b}{"c_$a"} = $t_store->{$a}{"c_$b"} = "{'m':$n_match,'i':$n_count,'n':$n_total} gt". 
                ($x eq '-' || $x < 0.7 ? '' : ' id_'.floor($x*20)); 
        } 
    } 

    sub get_data { 
     return qw(
    SD0098a_SD8r9345843_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:NN:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:NN:TC:TA:G:TC:AG:NN:T:GC 
    SD0098b_SD8r9345844_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:NN:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:TC:TA:G:TC:AG:NN:T:GC 
    SD0098c_SD8r9345845_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:C:TA:G:TC:AG:NN:T:NN 
    SD0097a_SD8r9345842_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:C:CT:TC:GA:C:C:T:C:G:C:AG:C:NN:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC 
    SD0097b_SD8r9345841_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:NN:CT:TC:GA:C:C:T:C:G:C:AG:C:GA:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC 
    ); 
    } 

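So, based on this example, I want to give the resulting percentage an alert color if any of the samples SD0098[a-c] matches any of SD0097[ab] by more than 90%. Thanks for any hints and suggestions.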
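One way to express this condition is a small function that returns an extra CSS class only when the two samples come from different groups and the match ratio is above the threshold. This is a sketch under assumptions: the helper names `group_of` and `alert_class`, the class name `alert`, and the name pattern (letters followed by digits) are all made up for illustration and do not come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: 'SD0098a_SD8r9345843_07-APR-13' -> 'SD0098'.
# Assumes the group is the leading run of letters followed by digits.
sub group_of {
    my ($name) = @_;
    return $name =~ /\A([A-Za-z]+\d+)/ ? $1 : $name;
}

# Return ' alert' (an assumed class name) when samples from different
# groups match above the threshold, otherwise an empty string.
sub alert_class {
    my ($name_a, $name_b, $ratio, $threshold) = @_;
    $threshold = 0.9 unless defined $threshold;
    return q() if $ratio eq q(-);    # no comparable positions
    return q() if group_of($name_a) eq group_of($name_b);
    return $ratio > $threshold ? ' alert' : q();
}

for my $case (['SD0098a', 'SD0098b', 0.98],
              ['SD0098a', 'SD0097a', 0.97],
              ['SD0098a', 'SD0097b', 0.50]) {
    my ($x, $y, $r) = @{$case};
    printf "%s vs %s (%.0f%%): '%s'\n", $x, $y, $r * 100, alert_class($x, $y, $r);
}
```

Of the three pairs, only SD0098a vs SD0097a gets the class: the first pair is in the same group and the third is below the threshold. In the loop from the question, the result of `alert_class($a, $b, $x)` could be appended to the class string at the line marked "This is where the check should go".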

The output would be a table like this. It is web page output (this is only a mock-up, not the exact output):

Sample   SD0098a  SD0098b  SD0098c  SD0097a  SD0097b
SD0098a  -        98%      99%      97%      97%
SD0098b  99%      -        95%      97%      99%
SD0098c  97%      97%      -        100%     100%
SD0097a  97%      97%      100%     -        100%
SD0097b  97%      99%      100%     100%     -
+0

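Please post only the relevant code, stripped down to be as short as possible. Your script produces no output. What do you want to output, and how: to the terminal, or to a web page somewhere else? – choroba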

+0

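This is the shortest I could get it; I left out other dependencies such as JS and CSS. Yes, it is a web page. The output will be a table with the samples as the header row and the first column, and the corresponding percentage for each pair of samples. – user1958532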

Answers

0

I think String::Similarity can help you. If I understand you correctly, you need to do approximate string matching.

+0

Or Text::LevenshteinXS – reinierpost
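To illustrate what approximate matching on the sample names would measure, here is a plain-Perl sketch of the Levenshtein edit distance. It is not the code of either suggested module, only a core-Perl stand-in so the example runs without CPAN; the function name `edit_distance` is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Plain-Perl Levenshtein edit distance, keeping only two rows of the
# dynamic-programming table at a time.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);    # distance from '' to each prefix of $t
    for my $i (1 .. length $s) {
        my @cur = ($i);             # distance from prefix of $s to ''
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

for my $pair (['SD0098a', 'SD0098b'], ['SD0098a', 'SD0097a']) {
    my ($x, $y) = @{$pair};
    printf "%s vs %s: distance %d\n", $x, $y, edit_distance($x, $y);
}
```

Both pairs differ by exactly one character, so the distance is 1 in each case. Edit distance on its own therefore may not separate "same group, different letter" from "different group"; comparing the name prefix directly could be the more reliable test for the names shown in the question.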