File 1
$ cat file1
A B
hello 0.5
bye 0.4
File 2
$ cat file2
C D
hello 1
country 5
Output
$ awk 'NR==1{print "Text","B","D"}FNR==1{next}FNR==NR{A[$1]=$2;next}{print $0,((f=($1 in A)) ? A[$1] : ""); if(f)delete A[$1]}END{for(i in A)print i,"",A[i]}' OFS='\t' file2 file1
Text B D
hello 0.5 1
bye 0.4
country 5
A more readable version
awk '
# Print the header when NR == 1, which is true only for the very first line of the first file
NR==1{print "Text","B","D"}
# NR counts records across all input files combined.
# FNR counts records within the current input file and restarts at 1 for each file.
# FNR==1 is therefore the first line of every file: stop processing it
# and go to the next line, so that the header of each file is skipped.
FNR==1{
    next
}
# FNR==NR is true only while reading the first file (file2)
FNR==NR{
    # Build an associative array keyed on the first column of the file,
    # storing the second column as the value
    A[$1]=$2
    # Skip all following blocks and process the next line
    next
}
{
    # Reached only for the second file argument (file1).
    # f is 1 (true) if column 1 exists as an index in array A, otherwise 0 (false).
    # Print the current line followed by A[$1] when the index exists,
    # or by an empty field when it does not.
    print $0, ((f = ($1 in A)) ? A[$1] : "")
    # Delete the matched element, so that only unmatched keys remain for END
    if (f) delete A[$1]
}
# Finally, in the END block, print the remaining array elements one by one:
# these are the keys from file2 that do not exist in file1
END{
    for(i in A)
        print i,"",A[i]
}
' OFS='\t' file2 file1
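The NR versus FNR distinction that the comments above rely on can be checked directly. The following is a minimal sketch, assuming space-separated copies of the two sample files written to a scratch directory; the `tmpdir` and `nr_fnr` names and the `first-file`/`later-file` labels are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical scratch copies of the sample files (space-separated here).
tmpdir=$(mktemp -d)
printf 'C D\nhello 1\ncountry 5\n' > "$tmpdir/file2"
printf 'A B\nhello 0.5\nbye 0.4\n' > "$tmpdir/file1"

# For every input line print the file name, NR, FNR, and whether FNR==NR holds.
# FNR==NR can only be true while awk is still reading the first file argument.
nr_fnr=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first-file" : "later-file") }' file2 file1)
printf '%s\n' "$nr_fnr"

rm -rf "$tmpdir"
```

NR keeps counting from 1 to 6 across both files, while FNR restarts at 1 when awk moves on to file1, so FNR==1 matches each header line and FNR==NR matches only the lines of file2.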
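Is your sample file1 really like that? Where are the tabs? Why are you sorting on `-k2` but joining with `-j 1`? Also note that the `-e` option in `man join` may help with finding unmatched items. Good luck. – shellter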
This kind of worked for me: `join -t $'\t' -1 1 -2 1 <(sort -k1 file1.tsv) <(sort -k1 file2.tsv) > join_test.tsv`. The main problem I had was defining the tab delimiter. – jxn
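Good catch, and sorry I missed that key point. I'm glad you have a solution. It never hurts to acknowledge those who have posted workable solutions. It gives people an incentive to share what they know. Good luck to all. – shellter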
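For readers following the `join` route from the comments above, the sketch below shows one way to keep the unmatched rows from both files as well. It assumes tab-separated copies of the sample data named `file1.tsv` and `file2.tsv`, as in the comment; the scratch directory, the `f1.sorted`/`f2.sorted` names, and the `NA` filler are choices made for this illustration and are not part of the original answer.

```shell
#!/bin/sh
# Use one collation order for both sort and join.
LC_ALL=C
export LC_ALL

tmpdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical tab-separated copies of the sample files.
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > "$tmpdir/file1.tsv"
printf 'C\tD\nhello\t1\ncountry\t5\n' > "$tmpdir/file2.tsv"

# Drop each header line, then sort on the join key (column 1).
tail -n +2 "$tmpdir/file1.tsv" | sort -t "$tab" -k1,1 > "$tmpdir/f1.sorted"
tail -n +2 "$tmpdir/file2.tsv" | sort -t "$tab" -k1,1 > "$tmpdir/f2.sorted"

# -a 1 -a 2 keeps unpairable lines from both files,
# -o 0,1.2,2.2 fixes the output columns (key, B, D),
# -e NA fills the fields that have no match.
joined=$(join -t "$tab" -a 1 -a 2 -e NA -o 0,1.2,2.2 "$tmpdir/f1.sorted" "$tmpdir/f2.sorted")
printf 'Text\tB\tD\n%s\n' "$joined"

rm -rf "$tmpdir"
```

Unlike the awk version, the rows come out sorted by key rather than in file1 order, and the missing values are shown as `NA` instead of empty fields.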