AWK: print lines side by side when the first field is the same for the records

I have a file containing lines like

a x1 
b x1 
q xq 
c x1 
b x2 
c x2 
n xn 
c x3

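I want to test the first field on each line and, if there is a match, append the matching lines to the first line with that field. The output should look like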

a x1 
b x1 b x2 
q xq 
c x1 c x2 c x3 
n xn

Any help would be greatly appreciated.

Source

2013-10-24 Deb.M

With awk you can do this:

awk '{arr[$1]=arr[$1]?arr[$1] " " $0:$0} END {for (i in arr) print arr[i]}' file 
n xn 
a x1 
b x1 b x2 
c x1 c x2 c x3 
q xq

Source

2013-10-24 14:59:19 Jotne

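This produces a syntax error in some awks (OSX?) because the ternary operator is not wrapped in parentheses. 'arr[$1]=arr[$1]?arr[$1] " " $0:$0' should be written 'arr[$1]=(arr[$1]?arr[$1] " " $0:$0)'. To remove the redundancy of specifying $0 twice: 'arr[$1]=(arr[$1]?arr[$1] " ":"") $0'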
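The parenthesized form from the comment above can be tried as follows. This is only a minimal sketch: the function name merge_by_key is made up for illustration, and the sample lines are the ones from the question. Because the order of "for (i in arr)" is unspecified, the output is piped through sort here to make it repeatable; that does not restore the input order.

```shell
# merge_by_key is a hypothetical helper name; it wraps the parenthesized one-liner.
merge_by_key() {
    awk '{ arr[$1] = (arr[$1] ? arr[$1] " " : "") $0 }
         END { for (i in arr) print arr[i] }' "$@"
}

# Feed the sample lines from the question on stdin; sort makes the
# otherwise unspecified "for (i in arr)" order repeatable.
printf '%s\n' 'a x1' 'b x1' 'q xq' 'c x1' 'b x2' 'c x2' 'n xn' 'c x3' |
    merge_by_key | sort
```

With no file argument the function reads standard input; passing a file name works as well, since "$@" is handed straight to awk.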

To preserve input order:

$ awk '
{
    if ($1 in vals) {
        prev = vals[$1] " "
    }
    else {
        prev = ""
        keys[++k] = $1
    }
    vals[$1] = prev $0
}
END {
    for (k=1;k in keys;k++)
        print vals[keys[k]]
}
' file
a x1 
b x1 b x2 
q xq 
c x1 c x2 c x3 
n xn

Source

2013-10-24 20:01:29

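It has been a long time! I saw this today. Thank you for the replies and the well-formatted code; I hope others find them helpful. In my case the project I was working on was time-sensitive, and I had to get something working with my limited understanding. Here is what I ended up doing: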

This is what I ended up doing. (The answers by Ed Morton and Jotne are clearly more elegant.)

First, I saved the first column of the input file in a separate file.

awk '{print $1}' input_file.txt > tmp0

Then I saved a copy of the input file with every line that repeats an earlier $1 value removed.

awk 'BEGIN { FS = "\t" }; !x[$1]++ { print $0}' input_file.txt > tmp1

Then I saved all the lines whose $1 field is a repeat.

awk 'BEGIN { FS = "\t" }; x[$1]++ { print $0}' input_file.txt >tmp2

Then I saved the $1 field of the de-duplicated file (tmp1).

awk '{ print $1}' tmp1 > tmp3

I used a for loop to pull the lines from the duplicates file (tmp2) and the de-duplicated file (tmp1) into the output file.

for i in $(cat tmp3)
do
    # test for a single instance in the 1st column of the input file
    # (note: grep -w matches the key as a whole word anywhere on the line)
    if [ "$(grep -cw "$i" tmp0)" = 1 ]
    then
        # if single, pull that record from the de-duplicated file
        echo "$(grep -w "$i" tmp1)" >> output.txt
    else
        # if not single, pull that record from tmp1 first, then all the
        # matching records from tmp2, onto a single line
        echo -e "$(grep -w "$i" tmp1) \t $(grep -w "$i" tmp2 | awk '{ printf "%s\t", $0 }; END { printf "\n" }')" >> output.txt
    fi
done

Finally, I removed the tmp files:

rm tmp* # remove all the tmp files
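For comparison, the temp-file steps above can be sketched as a single awk pass. This assumes the input really is tab-separated, as the FS = "\t" settings above suggest, and it joins merged records with one tab rather than reproducing the exact spacing of the loop. The function name merge_tsv and the sample rows are made up for illustration.

```shell
# merge_tsv is a hypothetical helper name, not part of the original answer.
merge_tsv() {
    awk 'BEGIN { FS = "\t" }
    # first time a key is seen: remember its position and its line
    !($1 in vals) { keys[++n] = $1; vals[$1] = $0; next }
    # repeated key: append the whole line, separated by a tab
    { vals[$1] = vals[$1] "\t" $0 }
    # print the merged lines in order of first appearance
    END { for (j = 1; j <= n; j++) print vals[keys[j]] }' "$@"
}

# Illustrative tab-separated input on stdin.
printf 'a\tx1\nb\tx1\nq\txq\nb\tx2\n' | merge_tsv
```

Because keys are compared as whole first fields, this avoids the case where grep -w matches the key somewhere else on a line, and no temporary files need to be cleaned up.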

Source

2016-03-03 19:22:55
