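Perl: finding the difference between two names
I'm sure there is a simpler way to find the difference between these two names. I have a script that colors the match percentage in a table when two different samples match very closely.
But I want to fine-tune it: color the percentage only when the samples are completely different.
For example: if SD0098a matches SD0098[b-z], no different color is needed, but if SD0098a matches SD0097[a-z], an alert color should be applied.
I have code like this: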
use POSIX qw(floor); # floor() is called further down when building the cell class
my @input = get_data();
@input = @input[28..37] if $ARGV[0] eq 'titchy';
@input = @input[0..49] if $ARGV[0] eq 'small';
my $t0 = time;
my %map;
my $tv = 'A';
$map{$_}=$tv++ foreach qw(N NN A C G T AC AG AT CA CG CT GA GC GT TA TC TG);
my %inv_map = reverse %map;
my $t_store = {};
my $res;
my @index;
my @columns = ({ 'key' => 'code', 'label' => 'Sample name', 'class'=>q({'enc':'[[r:enc]]'}) });
my %col_info;
foreach (@input) {
    my ($k, @v) = split m{:}mxs;
    push @index, $k;
    $t_store->{$k} = {
        'code' => $k,
        # One byte per marker (backtick = called, NUL = N/NN) so 'flags' stays aligned with 'enc'.
        'flags' => (join q(), map { $_ =~ /[ACGT]/ ? q(`) : qq(\0) } @v),
        "i_$k" => q(-),
        'enc' => join q(), map { $map{$_}||'@' } @v,
    };
    $col_info{$k} = { 'key' => "i_$k", 'label' => $k, 'rotate' => 1,
        'header_class' => q({'enc':').$t_store->{$k}{'enc'}.q('}),
        'format' => [ [ 'r', 'exact', q(-) ], [ 'p0' ] ],
        'class' => "[[r:c_$k]]",
    };
}
@index = sort @index;
push @columns, map { $col_info{$_} } @index;
my @t_index = @index;
my $N = 0;
# get the first element of the array sample a.
while(my $a = shift @t_index) {
    foreach my $b (@t_index) {
        $N++;
        my $count_flag = $t_store->{$a}{'flags'} & $t_store->{$b}{'flags'};
        my $mismatches = ($t_store->{$a}{'enc'}^$t_store->{$b}{'enc'}) | $count_flag;
        my $n_match = $mismatches =~ tr{`}{`};
        my $n_count = $count_flag =~ tr{`}{`};
        my $n_total = length $t_store->{$a}{'flags'};
        #$t_store->{$b}{"o_$a"} = $t_store->{$a}{"o_$b"} = $n_total ? $n_count/$n_total : 0;
        #$t_store->{$b}{"m_$a"} = $t_store->{$a}{"m_$b"} = $n_count ? 1-$n_match/$n_count : 0;
        my $x = $t_store->{$b}{"i_$a"} = $t_store->{$a}{"i_$b"} = $n_count ? $n_match/$n_count : q(-);
        # This is where the check should go :
        $t_store->{$b}{"c_$a"} = $t_store->{$a}{"c_$b"} = "{'m':$n_match,'i':$n_count,'n':$n_total} gt".
            ($x eq '-' || $x < 0.7 ? '' : ' id_'.floor($x*20));
    }
}
sub get_data {
return qw(
SD0098a_SD8r9345843_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:NN:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:NN:TC:TA:G:TC:AG:NN:T:GC
SD0098b_SD8r9345844_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:NN:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:TC:TA:G:TC:AG:NN:T:GC
SD0098c_SD8r9345845_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:C:TA:G:TC:AG:NN:T:NN
SD0097a_SD8r9345842_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:C:CT:TC:GA:C:C:T:C:G:C:AG:C:NN:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC
SD0097b_SD8r9345841_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:NN:CT:TC:GA:C:C:T:C:G:C:AG:C:GA:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC
);
}
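So, based on this example, I want to give the resulting percentage an alert color if a sample SD0098[a-c] matches any of SD0097[a-b] by more than 90%. Thanks for any hints and suggestions.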
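The rule described above could be sketched roughly as follows. This is only an illustration, not part of the original script: `sample_group` and `cell_class` are hypothetical helper names, the `alert` class name is made up, and the code assumes the sample group is the leading run of letters and digits in the name (for example `SD0098` in `SD0098a_...`), with the replicate letter following it.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: the sample group is assumed to be the leading
# letters and digits of the name, before the replicate letter.
sub sample_group {
    my ($name) = @_;
    return $name =~ m{\A([A-Za-z]+\d+)}mxs ? $1 : $name;
}

# Hypothetical helper: class suffix for one table cell.
# $x is the match fraction (0..1), or '-' when nothing was comparable.
sub cell_class {
    my ($name_a, $name_b, $x, $threshold) = @_;
    return q() if $x eq q(-);                                      # no data
    return q() if sample_group($name_a) eq sample_group($name_b);  # same sample
    return q() if $x <= $threshold;                                # not close enough
    return ' alert id_' . floor($x * 20);                          # 'alert' is a made-up class
}

# Same group: prints an empty line.
print cell_class('SD0098a_SD8r9345843_07-APR-13',
                 'SD0098b_SD8r9345844_07-APR-13', 0.98, 0.9), "\n";
# Different groups above the threshold: prints ' alert id_19'.
print cell_class('SD0098a_SD8r9345843_07-APR-13',
                 'SD0097a_SD8r9345842_07-APR-13', 0.97, 0.9), "\n";
```

In the loop, a call such as `cell_class($a, $b, $x, 0.9)` could take the place of the expression after the "This is where the check should go" comment, assuming the 90% threshold from the question.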
The output will be in a table format like this. It is web page output (this is only a mock-up, not the exact output):
Sample   SD0098a  SD0098b  SD0098c  SD0097a  SD0097b
SD0098a  -        98%      99%      97%      97%
SD0098b  99%      -        95%      97%      99%
SD0098c  97%      97%      -        100%     100%
SD0097a  97%      97%      100%     -        100%
SD0097b  97%      99%      100%     100%     -
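Please post only the relevant code, stripped down to be as short as possible. Your script produces no output. What do you want to output, and how: to a terminal, to a web page, somewhere else? – choroba
This is the shortest I could get it; I have not included other dependencies such as the JS or CSS. Yes, it is a web page. The output will be a table with the samples in the header row and the first column, and the relevant percentage for each pair of samples. – user1958532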