Count occurrences of strings in an input file

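The text file contains strings, one per line, and each string occurs several times.

A shell script needs to read this text file and print each string together with its count.

Consider this text file: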

Tim

Tim

Mark

Mark

Allen

Allen

Allen

The output should look like this:

Tim appears 2 times

Mark appears 2 times

Allen appears 3 times

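Right now I am able to print the occurrences of each string, but the count is printed once for every occurrence, i.e. "Tim appears 2 times" is printed twice. I tried to replace a string with NULL as soon as I had counted its occurrences, but for some reason sed does not work, perhaps because I am not calling it in the right place (or in the right way).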

#!/bin/bash 

INPUT_FILE="$1" 
declare -a LIST_CHARS 

if [ $# -ne 1 ] 
then 
     echo "Usage: $0 <file_name>" 
     exit 1 
fi 


if [ ! -f "$INPUT_FILE" ] 
then 
     echo "$INPUT_FILE does not exist. Please specify a correct file name" 
     exit 2 
fi 

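while IFS= read -r line 
do 
     [ -z "$line" ] && continue    # skip blank lines
     # Count the lines of the whole file that match $line (case-insensitive,
     # whole-line match). This still runs once per occurrence, so every
     # string is reported as many times as it appears.
     count=$(grep -icx -- "$line" "$INPUT_FILE")
     echo "String $line appears $count times" 
done < "$INPUT_FILE"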

2012-01-23 Incognito

The classic AWK solution looks like this:

 
$ awk 'NF{ count[ toupper($0) ]++} 
    END{ for (name in count) { print name " appears " count[ name ] " times" }; 
}' input
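The one-liner above prints the names in upper case, and in whatever order `for (name in count)` happens to produce, which awk leaves unspecified. A minimal sketch, assuming you want the exact "Tim appears 2 times" format from the question: the hypothetical function count_strings below keys each line as a capitalised word and pipes the result through sort for a stable order.

```shell
# Hypothetical helper: count case-insensitive occurrences of each line in the
# given file(s), or stdin if none, and print "Name appears N times" sorted by name.
count_strings() {
    awk 'NF {
            # skip blank and whitespace-only lines (NF == 0);
            # key = first letter upper case, the rest lower case
            key = toupper(substr($0, 1, 1)) tolower(substr($0, 2))
            count[key]++
         }
         END {
            for (name in count)
                print name " appears " count[name] " times"
         }' "$@" | sort
}
```

Usage: `count_strings input`. Because "tim", "Tim" and "TIM" all map to the key "Tim", the count stays case-insensitive like the original.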

2012-01-23 12:31:59

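+1, though instead of '/./' you can use 'NF', which skips blank lines just fine. –

@jaypal Good suggestion. It also handles lines that contain only whitespace better. Edited. –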
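To illustrate the difference the comments describe: the hypothetical demo_filters function below feeds the same four lines, one of them a single space, to both filters. '/./' matches the space-only line because it contains a character, while 'NF' is 0 for it, so only the first filter counts it.

```shell
# Hypothetical demo: compare the /./ and NF filters on input that contains
# one empty line and one line holding only a space.
demo_filters() {
    printf 'Tim\n \n\nTim\n' | awk '/./ { n++ } END { print "/./ matched " n " lines" }'
    printf 'Tim\n \n\nTim\n' | awk 'NF  { n++ } END { print "NF matched " n " lines" }'
}
```

Running `demo_filters` prints "/./ matched 3 lines" followed by "NF matched 2 lines".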

Assuming data.txt contains your words, the following script will do it.

while read -r line 
do 
    uc=$(echo "$line" | tr '[:lower:]' '[:upper:]' | tr -d ' ') 
    echo $uc $(grep -i -- "$uc" strs.txt | wc -l) 
done < data.txt | sort | uniq

Output:

31 
ALLEN 6 
MARK 4 
MOKADDIM 1 
SHIPLU 1 
TIM 4

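Another option is:

sort -f data.txt | uniq -i -c | while read -r num word 
do 
    echo "$(echo "$word" | tr '[:lower:]' '[:upper:]') appears $num times" 
done

Note: I see that your text file contains blank lines, so the 31 in the output comes from the blank lines.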
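If the blank lines should not be counted at all, one option is to drop them with grep before the sort | uniq pipeline from this answer. A sketch only; count_nonblank is a hypothetical name, and "blank" here means empty or whitespace-only.

```shell
# Hypothetical helper: remove empty and whitespace-only lines, then count the
# remaining lines case-insensitively and print them in upper case.
count_nonblank() {
    grep -v '^[[:space:]]*$' "$@" | sort -f | uniq -ic | while read -r num word
    do
        echo "$(echo "$word" | tr '[:lower:]' '[:upper:]') appears $num times"
    done
}
```

Usage: `count_nonblank data.txt`. With the sample file from the question this prints ALLEN, MARK and TIM with counts 3, 2 and 2, and no line for the blanks.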

2012-01-23 10:17:50

This is 'O(n²)' if data.txt is a copy of strs.txt – Benoit

@Benoit - Yes, but I couldn't think of a way to achieve this in a single pass over the file. – Incognito

@Benoit: My solution should be faster. – choroba
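On the single-pass question: in bash 4 or later, an associative array lets a script read each line exactly once instead of grepping the whole file for every line. A minimal sketch, assuming bash >= 4 and input on stdin; count_single_pass is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper (requires bash >= 4 for associative arrays and ${var^^}).
# Reads stdin once, counting each non-blank line case-insensitively.
count_single_pass() {
    local -A count=()
    local line key
    # "|| [ -n "$line" ]" also processes a last line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]
    do
        [[ -z "${line//[[:space:]]/}" ]] && continue   # skip blank lines
        key=${line^^}                                   # upper-case key
        count[$key]=$(( ${count[$key]:-0} + 1 ))
    done
    # associative arrays have no defined order, so sort the report
    for key in "${!count[@]}"
    do
        echo "$key appears ${count[$key]} times"
    done | sort
}
```

Usage: `count_single_pass < data.txt`. The work is linear in the number of lines, at the cost of keeping one counter per distinct string in memory.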

You can also use sort and uniq with their ignore-case flags:

sort -f FILE | uniq -ic

A simple sed command can change the output to the requested format:

s/^ *\([0-9]\+\) \(.*\)/\2 appears \1 times/
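The expression above relies on '\+', which is a GNU extension to basic regular expressions, so other sed implementations may not accept it. A sketch of the same pipeline using sed -E (extended regular expressions, supported by both GNU and BSD sed), wrapped in a hypothetical count_with_uniq function:

```shell
# Hypothetical helper: case-insensitive sort, count adjacent duplicates with
# uniq -ic, then turn "   <count> <string>" into "<string> appears <count> times".
count_with_uniq() {
    sort -f "$@" | uniq -ic | sed -E 's/^ *([0-9]+) (.*)/\2 appears \1 times/'
}
```

Usage: `count_with_uniq FILE`. uniq -c pads the count with leading spaces, which is why the pattern starts with '^ *'.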

2012-01-23 10:21:49 choroba

Great one-liner :-) 'sort -f FILE | uniq -ic | sed 's/^ *\([0-9]\+\) \(.*\)/\2 appears \1 times/'' –

sort filename | uniq -c | while read -r count word
do
    # format each line however you like, for example:
    echo "$word appears $count times"
done

2012-01-23 10:25:52