Count occurrences of each character per line position in Perl

Similar to the question "unix - count occurrences of character per line/field", but for every character at every position on the line.

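Given a file of 1e7 lines with ~500 characters each, I want a two-dimensional summary structure like $summary{'a','b','c','0','1','2'}[pos 0..499] = count_integer that shows how many times each character was used at each position in the line. Either order of the dimensions is fine.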

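My first method did ++$summary{char}[pos] while reading, but since many lines are identical, it was much faster to count identical lines first and then do $summary{char}[pos] += n once per distinct line.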

Is there a more idiomatic or faster way than the following C-style 2D loop?

#!perl 
my (%summary, %counthash); # perl 5.8.9 

sub method1 { 
    print "method1\n"; 
    while (<DATA>) { 
     my @c = split(// , $_); 
     ++$summary{ $c[$_] }[$_] foreach (0 .. $#c); 
    } # wend 
} ## end sub method1 

sub method2 { 
    print "method2\n"; 
    ++$counthash{$_} while (<DATA>); # slurpsum the whole file 

    foreach my $str (keys %counthash) { 
     my $n = $counthash{$str}; 
     my @c = split(//, $str); 
     $summary{ $c[$_] }[$_] += $n foreach (0 .. $#c); 
    } #rof my $str 
} ## end sub method2 

# MAINLINE 
if (rand() > 0.5) { &method1 } else { &method2 } 
print "char $_ : @{$summary{$_}} \n" foreach ('a', 'b'); 
# both methods have this output summary 
# char a : 3 3 2 2 3 
# char b : 2 2 3 3 2 
__DATA__ 
aaaaa 
bbbbb 
aabba 
bbbbb 
aaaaa

Source

2015-12-07 jgraber

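It's hard to see what you're looking for from this sample data - I assume your real scenario isn't as trivial as lines of repeated characters? Also: 'use strict; use warnings;' is a really good idea. – Sobrique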

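The only inefficiency/unidiomaticness(?) I see is that you're also counting all the line-terminator characters (newline and/or CR). (Unless you do something about it, Perl includes them in '$_'.) Stick a 'chomp;' in after reading each line. –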
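To illustrate the chomp suggestion, here is a minimal sketch. The helper name count_positions is hypothetical (it is not in the question), and stripping a trailing CR is an added assumption for CRLF files:

```perl
use strict;
use warnings;

# Hypothetical helper: tally characters per position for a list of lines,
# removing the line terminator first so it is not counted.
sub count_positions {
    my @lines = @_;                  # copy, so the caller's data is untouched
    my %summary;
    for my $line (@lines) {
        chomp $line;                 # drop the trailing "\n"
        $line =~ s/\r\z//;           # assumption: also drop a CR left by CRLF input
        my @c = split //, $line;
        ++$summary{ $c[$_] }[$_] for 0 .. $#c;
    }
    return \%summary;
}

my $summary = count_positions("aaaaa\n", "bbbbb\n", "aabba\r\n");
print join(",", sort keys %$summary), "\n";   # prints "a,b": no "\n" or "\r" keys
```

Without the chomp, every line would also contribute a "\n" key at its last position.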

@JeffY: *unidiomaticity*, I believe – Borodin

Depending on the form of the data, method 2 may be a little faster or slower than method 1.

But one thing that makes a big difference is using unpack instead of split.

use strict; 
use warnings; 
my (%summary, %counthash); # perl 5.8.9 

sub method1 { 
    print "method1\n"; 
    my @l= <DATA>; 
    for my $t(1..1000000) { 
     foreach (@l) { 
      my @c = split(// , $_); 
      ++$summary{ $c[$_] }[$_] foreach (0 .. $#c); 
     }  
    } # wend 
} ## end sub method1 

sub method2 { 
    print "method2\n"; 
    ++$counthash{$_} while (<DATA>); # slurpsum the whole file 
    for my $t(1..1000000) { 
     foreach my $str (keys %counthash) { 
      my $n = $counthash{$str}; 
      my $i = 0; 
      $summary{ $_ }[$i++] += $n foreach (unpack("c*",$str)); 
     }  
    } 
} ## end sub method2 

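# MAINLINE 
#method1();   # keys %summary by character; print $summary{$_} if you run this one 
method2();    # keys %summary by character code, hence ord() below 
print "char $_ : ". join (" ", @{$summary{ord($_)}}). " \n" 
    foreach ('a', 'b'); 
# one pass over the data gives the same summary as in the question: 
# char a : 3 3 2 2 3 
# char b : 2 2 3 3 2 
# (the 1..1000000 benchmark loop multiplies every count by 1,000,000) 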
__DATA__ 
aaaaa 
bbbbb 
aabba 
bbbbb 
aaaaa

This runs faster (compared with 7.x seconds on my PC).

Source

2015-12-08 09:26:54

Did you test it? unpack("c*", $str) generates the wrong summary keys 98 and 97 instead of 'a' and 'b'; 'a*' doesn't work; this works: $summary{$_}[$i++] += $n foreach (unpack('a' x length($str), $str)); this also works: $summary{chr($_)}[$i++] += $n foreach (unpack('c*', $str)); – jgraber
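The key mismatch discussed in this comment can be seen in a small sketch. The names %by_code and %by_char are made up for illustration, and $n = 2 simply pretends the line occurred twice:

```perl
use strict;
use warnings;

my $str = "aabba";
my $n   = 2;            # assumption: this line was seen twice
my (%by_code, %by_char);

# Keyed by character code: what unpack('c*', ...) yields directly.
my $i = 0;
$by_code{$_}[$i++] += $n for unpack('c*', $str);

# Keyed by character: convert each code back with chr().
$i = 0;
$by_char{ chr($_) }[$i++] += $n for unpack('c*', $str);

print join(",", sort keys %by_code), "\n";   # prints "97,98"
print join(",", sort keys %by_char), "\n";   # prints "a,b"
```

Either keying works as long as the lookup matches it: ord($_) when reading %by_code, the plain character when reading %by_char.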

$summary{substr($str, $_, 1)}[$_] += $n foreach (0 .. (length($str) - 1)); # about equally fast – jgraber

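@jgrabber Yes, I did, and it works. unpack just returns the character codes of the string, so in my code I print $summary{ord($_)}, as you may have noticed... But the length-and-substr solution is faster. The original code (executed one million times) takes 7.177 seconds on my PC, the unpack version takes 5.879 seconds, and the length-and-substr solution takes only 4.286 seconds. –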
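For anyone who wants to repeat the comparison, a rough sketch using the core Benchmark module might look like the following. The sub names are hypothetical, the unpack variant uses 'C*' (unsigned) with chr() rather than the 'c*' from the answer, and timings will vary with machine, Perl version, and data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @lines = qw(aaaaa bbbbb aabba bbbbb aaaaa);   # sample data from the question

# Each variant builds the same hash of arrays: $s{char}[pos] = count.
sub with_split {
    my %s;
    for my $str (@lines) {
        my @c = split //, $str;
        ++$s{ $c[$_] }[$_] for 0 .. $#c;
    }
    return \%s;
}

sub with_unpack {
    my %s;
    for my $str (@lines) {
        my $i = 0;
        ++$s{ chr($_) }[$i++] for unpack('C*', $str);
    }
    return \%s;
}

sub with_substr {
    my %s;
    for my $str (@lines) {
        ++$s{ substr($str, $_, 1) }[$_] for 0 .. length($str) - 1;
    }
    return \%s;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    split  => \&with_split,
    unpack => \&with_unpack,
    substr => \&with_substr,
});
```

With only five 5-character lines the differences are dominated by loop overhead, so it is worth rerunning this on lines closer to the real ~500-character input.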
