2013-04-17 26 views

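Compare and insert counts of column elements into a table in Perl

I am working with two large datasets (300 x 500,000). Each one holds a matrix with the values 0, 1, 2, and NA. I want to compare the two files, count how many values match in each row across the two files, and insert the result into an output table.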

File 1 

2 1 0 
0 1 1 
1 0 NA 

File 2 

2 1 0 
Na 1 1 
1 NA 0 

How can I compare the number of matching values in each row, and the total?

Answer


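I'm not sure I understand what you mean by "total", and the matching rows are simply dumped at the end, but this does roughly what you asked, and you should be able to adapt it to your exact specification.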

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# open both files for reading, dying with the OS error on failure
open(my $f1, '<', 'file1') or die "$! file1";
open(my $f2, '<', 'file2') or die "$! file2";

# hash to store the count of identical rows
my %match_count = ();
# sum of the numeric values in all matching rows
my $total = 0;

# read one line from each file in step; stop when either file ends
while (defined(my $l1 = <$f1>)) {
    my $l2 = <$f2>;
    last unless defined $l2;
    # chomp to remove \n so it isn't stored, and lower-case both lines
    # so that "Na" and "NA" compare equal
    chomp($l1, $l2);
    $l1 = lc $l1;
    $l2 = lc $l2;
    # see if the lines are the same
    if ($l1 eq $l2) {
        # increment the counter for this line
        $match_count{$l1}++;
        # add every numeric field of the row to the total, skipping "na"
        $total += $_ for grep { /^\d+$/ } split ' ', $l1;
    }
}

print "sum total of matches = $total\n";
print Dumper(\%match_count);
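
If what is wanted is a per-row count of the positions where both files hold the same value, with NA ignored, a minimal sketch might look like the following. The sub name `count_row_matches` and the inline sample rows are assumptions made for illustration. With the real files you would read both line by line, as in the script above, and pass each pair of lines to the sub.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare two whitespace-separated rows position by
# position. Returns (matches, compared); any position holding NA (in any
# letter case) in either row is skipped entirely.
sub count_row_matches {
    my ($line1, $line2) = @_;
    my @a = split ' ', $line1;
    my @b = split ' ', $line2;
    my $n = @a < @b ? @a : @b;    # only positions present in both rows
    my ($matches, $compared) = (0, 0);
    for my $i (0 .. $n - 1) {
        next if uc($a[$i]) eq 'NA' or uc($b[$i]) eq 'NA';
        $compared++;
        $matches++ if $a[$i] eq $b[$i];
    }
    return ($matches, $compared);
}

# Sample rows copied from the question; real data would come from the files.
my @file1 = ('2 1 0', '0 1 1', '1 0 NA');
my @file2 = ('2 1 0', 'Na 1 1', '1 NA 0');

# Output table: one line per row with the match count and the number compared.
print join("\t", 'row', 'matches', 'compared'), "\n";
for my $r (0 .. $#file1) {
    my ($m, $c) = count_row_matches($file1[$r], $file2[$r]);
    print join("\t", $r + 1, $m, $c), "\n";
}
```

For the sample rows this reports 3 of 3, 2 of 2, and 1 of 1 matching positions. Because only one pair of lines is held in memory at a time, the same approach should carry over to the full-size files.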

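I want the total number of matches between the two files as a table, counting only the elements 0, 1, and 2 in both files. I do not want to check NA. – user2288980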


@user2288980 If you are looking for more help, "I do not want to check NA" does not add much to the requirements. – Vorsprung