2016-03-16 35 views

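awk to find mismatches between 2 files

I am trying to find the lines in file1 below that are not found in file2. The awk below runs but does not produce any results. Thank you :).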

file1

955763 
957852 
976270 

file2

chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 
chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2 
chr1 970621 970740 chr1:970621-970740 AGRN-8|gc=57.1 

Desired output

2 ids found 
976270 missing 

awk (missing.awk)

BEGIN { FS="[[:space:]]+|-" } 
NR == FNR { seen[$0]; next } 
$3 in seen { found[$3]; delete seen[$6] } 
END { print length(found) " ids found" 
    for (i in seen) print i " missing" } 

awk -f missing.awk file1 file2 

Answer


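awk to the rescue!

This approach works better if file2 is much larger than file1, since only the ids from file1 are held in memory while file2 is streamed line by line.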

awk 'NR==FNR{a[$1];next} 
    $3 in a{c++; delete a[$3]} 
     END{if(c) print c " ids found"; 
      for(k in a) print k " missing"}' file1 file2 

2 ids found 
976270 missing 
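
For anyone wondering why the script in the question behaves differently, here is a minimal sketch of what seems to be going on, assuming the trailing spaces shown in the post really are in the files. With the custom `FS="[[:space:]]+|-"`, the key stored by `seen[$0]` is the whole line including any trailing space, so it never equals `$3`. Also, because that `FS` splits on hyphens as well, `$6` in file2 is `AGRN` rather than an id, so `delete seen[$6]` removes nothing useful. Keying on `$1` with the default field separator and deleting `seen[$3]` avoids both problems. The temporary directory and the `out` variable are only there for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Recreate the sample inputs from the question (trailing spaces kept on purpose).
printf '955763 \n957852 \n976270 \n' > file1
printf '%s\n' \
  'chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 ' \
  'chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2 ' \
  'chr1 970621 970740 chr1:970621-970740 AGRN-8|gc=57.1 ' > file2

# Default FS splits on runs of blanks and ignores leading/trailing whitespace.
out=$(awk '
    NR == FNR  { seen[$1]; next }               # file1: remember each id by its first field
    $3 in seen { found++; delete seen[$3] }     # file2: third column is the id to match
    END {
        printf "%d ids found\n", found          # prints 0 when nothing matched
        for (id in seen) print id " missing"    # ids left over were never seen in file2
    }' file1 file2)

printf '%s\n' "$out"
```

Unlike the one-liner above, this version always prints the count line, even when it is zero.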

The 'awk' runs great on large files. Thank you :). – Chris