2016-05-24

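awk: split a column and count occurrences of the second field

I want to use awk to split one column of a file on "(" and count how many times each value of the second resulting field occurs. For example, given: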

cluster1(2 genes, 2 taxa): column2 column 3 
cluster1(2 genes, 2 taxa): column2 column 3 
cluster1(3 genes, 2 taxa): column2 column 3 
cluster1(3 genes, 2 taxa): column2 column 3 
cluster1(4 genes, 2 taxa): column2 column 3 

my output would be:

2 genes, 2 taxa = 2 
3 genes, 2 taxa = 2 
4 genes, 2 taxa = 1 

Thanks for your help, Kate


So what have you tried? – fedorqui

Answer

$ awk -F '[()]' '{arr[$2]++} END{for(i in arr) print i " = " arr[i]}' data 
4 genes, 2 taxa = 1 
3 genes, 2 taxa = 2 
2 genes, 2 taxa = 2 
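In awk the traversal order of a for (i in arr) loop is unspecified, which is why the lines above come out in reverse order. A minimal sketch, assuming a POSIX shell, that wraps the same one-liner in a function and pipes the report through sort for a stable order. The function name count_clusters is hypothetical, and the demo feeds the question's sample lines on standard input instead of reading the file data:

```shell
#!/bin/sh
# Hypothetical helper: count each distinct "(...)" value in the given files
# (or standard input when no file is given) and print "value = count".
count_clusters() {
    # -F '[()]' splits each line on "(" or ")", so $2 is the text inside the parentheses.
    # The for-in order in awk is unspecified, so the report is sorted afterwards.
    awk -F '[()]' '{ count[$2]++ } END { for (k in count) print k " = " count[k] }' "$@" | sort
}

# Demo with the sample lines from the question.
count_clusters <<'EOF'
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(4 genes, 2 taxa): column2 column 3
EOF
```

With a file it is called as count_clusters data. The demo prints the three lines from the question in ascending order.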

Or count with a pipeline and uniq -c (sort first, because uniq only merges adjacent duplicate lines):

$ grep -oP '(?<=\().*(?=\))' data | sort | uniq -c | awk '{print $2,$3,$4,$5 " =",$1}' 
2 genes, 2 taxa = 2 
3 genes, 2 taxa = 2 
4 genes, 2 taxa = 1 
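grep -P (Perl-compatible regular expressions) is a GNU grep extension and may be missing on other systems, and the final awk above assumes exactly four words inside the parentheses. A sketch of the same pipeline using only POSIX sed, sort, uniq and awk, which keeps the whole parenthesized value whatever its length; the function name count_clusters_posix is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: same report as the grep -P | uniq -c pipeline, POSIX tools only.
# sed keeps the text between the first "(" and the following ")" and drops lines
# without parentheses; sort groups equal values because uniq -c only merges
# adjacent duplicates; awk strips the leading count and appends it as " = count".
count_clusters_posix() {
    sed -n 's/^[^(]*(\([^)]*\)).*/\1/p' "$@" |
        sort |
        uniq -c |
        awk '{ n = $1; sub(/^[ \t]*[0-9]+[ \t]/, ""); print $0 " = " n }'
}

# Demo with the sample lines from the question.
count_clusters_posix <<'EOF'
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(2 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(3 genes, 2 taxa): column2 column 3
cluster1(4 genes, 2 taxa): column2 column 3
EOF
```

Removing the count with sub() rather than printing fixed fields means values with more or fewer words are printed intact.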

Thank you, works perfectly. –