You can also use awk for this; say you have:
$ cat a.csv
#product_id,product_name,brand_name,price
1,pname1,bname1,100
10,pname10,bname10,200
20,pname20,bname20,300
$ cat b.csv
#product_id,product_category,product_name,brand_name,price
3,pcat3,pname3,bname3,42
10,pcat10,pname10,bname10,199
20,pcat20,pname20,bname20,299
30,pcat10,pname30,bname30,420
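To try the commands below yourself, the two sample files can be recreated with here-documents. This is only a convenience sketch, and it will overwrite any a.csv/b.csv in the current directory:

```shell
# Recreate the sample input files shown above in the current directory.
# Quoting 'EOF' stops the shell from expanding anything inside the data.
cat > a.csv <<'EOF'
#product_id,product_name,brand_name,price
1,pname1,bname1,100
10,pname10,bname10,200
20,pname20,bname20,300
EOF

cat > b.csv <<'EOF'
#product_id,product_category,product_name,brand_name,price
3,pcat3,pname3,bname3,42
10,pcat10,pname10,bname10,199
20,pcat20,pname20,bname20,299
30,pcat10,pname30,bname30,420
EOF
```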
With the "FNR==NR" approach (see e.g. The Unix shell: comparing two files with awk):
$ awk -F, 'FNR==NR{if(!/^#/){a[$1]=$0;next}}($1 in a){split(a[$1],tmp,",");printf "%d,%s,%s,%s,%d\n",$1,$2,$3,$4,tmp[4];}' a.csv b.csv
10,pcat10,pname10,bname10,200
20,pcat20,pname20,bname20,300
By reading each file into an array (see e.g. Awking it – how to load a file into an array in awk | Tapping away):
$ awk -F, 'BEGIN{while((getline < "a.csv") > 0){if(!/^#/){a[$1]=$0;}}close("a.csv");while((getline < "b.csv") > 0){if($1 in a){split(a[$1],tmp,",");printf "%d,%s,%s,%s,%d\n",$1,$2,$3,$4,tmp[4];}}close("b.csv");}'
10,pcat10,pname10,bname10,200
20,pcat20,pname20,bname20,300
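In essence, both approaches do the same thing:
- read the first file (a.csv), and store its lines in the associative array a, keyed/indexed by the first field $1 of each line (in this case, product_id);
- then read the second file (b.csv), and for each of its lines whose first field is found in array a, output the first four fields of the current b.csv line, plus the fourth field (price) taken from the corresponding entry in array a.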
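The difference is that with the FNR==NR approach, the input files are given as arguments to awk on the command line, and basically only the first file can be recognized as "special", so that you can store it in an array; with the second approach, each input file could be parsed into a separate array, but the input files are specified in the awk script itself rather than as arguments to awk. And since you then don't pass any arguments to awk at all, the entire awk script has to take place inside the BEGIN{...} block (a program that consists only of a BEGIN block exits without reading any input, not even standard input).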
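Because the BEGIN-only version hard-codes its file names inside the script, one way to keep it reusable is to pass the names in as awk variables with -v. This is a sketch, not the answer's original code; the shell function join_begin and the awk variables first and second are hypothetical names:

```shell
# Hypothetical helper: join_begin FIRST_CSV SECOND_CSV
# Everything runs in BEGIN, so awk never reads stdin; the file names
# arrive through the (hypothetical) awk variables `first` and `second`.
join_begin() {
  awk -F, -v first="$1" -v second="$2" 'BEGIN {
    # Load non-comment lines of the first file, keyed by field 1.
    while ((getline < first) > 0)
      if (!/^#/) a[$1] = $0
    close(first)
    # Stream the second file; print matches with the price from the first.
    while ((getline < second) > 0)
      if ($1 in a) {
        split(a[$1], tmp, ",")
        printf "%d,%s,%s,%s,%d\n", $1, $2, $3, $4, tmp[4]
      }
    close(second)
  }'
}
```

Usage would then be `join_begin a.csv b.csv`, which should print the same two lines as the one-liners above.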
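When lines are read from a file, they are automatically split into fields according to the -F, command-line option, which sets the comma as the separator; however, when retrieving the lines stored in the array, we have to split() them ourselves.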
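As a quick illustration of that split() step, here is one stored a.csv line used as a literal string: split() fills the array starting at index 1 and returns the number of pieces, so tmp[4] is the price:

```shell
# split(string, array, separator) fills array[1..n] and returns n.
awk 'BEGIN {
  line = "10,pname10,bname10,200"   # a line as it would be stored in a[$1]
  n = split(line, tmp, ",")
  print n, tmp[4]                   # prints: 4 200
}'
```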
Breakdown:
FNR==NR # if FNR (input record number in the current input file) equals NR (total num records so far)
# only true when the first file is being read
{
if(!/^#/) # if the current line does not (`!`) match the regex `/^#/`, i.e. does not start (`^`) with `#`
{
a[$1]=$0; # assign current line `$0` to array `a`, with index/key being first field in current line `$1`
next # skip the rest, and start processing next line
}
}
# --this section below executes when FNR does not equal NR;--
($1 in a) # first, check if first field `$1` of current line is in array `a`
{
split(a[$1],tmp,","); # split entry `a[$1]` at commas into array `tmp`
printf "%d,%s,%s,%s,%d\n",$1,$2,$3,$4,tmp[4]; # print reconstructed current line,
# taking the fourth field from the `tmp` array
}
Breakdown for the second:
BEGIN{ # since no file arguments here, everything goes in BEGIN block
while((getline < "a.csv") > 0){ # while lines can be read from the first file (getline returns -1 on error, so test > 0)
if(!/^#/){ # if the current line does not (`!`) match the regex `/^#/`, i.e. does not start (`^`) with `#`
a[$1]=$0; # store current line `$0` to array `a`, with index/key being first field in current line `$1`
}
}
close("a.csv");
while((getline < "b.csv") > 0){ # while lines can be read from the second file
if($1 in a){ # first, check if first field `$1` of current line is in array `a`
split(a[$1],tmp,","); # (same as above)
printf "%d,%s,%s,%s,%d\n",$1,$2,$3,$4,tmp[4]; # (same as above)
}
}
close("b.csv");
} # end BEGIN
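One caveat with bare while(getline < file) loops: getline returns 1 when it reads a line, 0 at end of file and -1 if the file cannot be opened. Since -1 counts as "true", a misspelled file name would make such a loop spin forever, while comparing against 0 stops it. A small check (no-such-file.csv is a placeholder name, assumed not to exist):

```shell
# getline returns 1 (line read), 0 (end of file) or -1 (error, e.g. no such file).
# Testing "> 0" ends the loop in both the EOF and the error case.
awk 'BEGIN {
  r = (getline line < "no-such-file.csv")    # placeholder name, assumed absent
  print "getline returned", r                # prints: getline returned -1
  while ((getline line < "no-such-file.csv") > 0)
    n++                                      # never runs for a missing file
  print "lines read:", n + 0                 # prints: lines read: 0
}'
```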
A note about how execution works with FNR==NR:
$ awk -F, 'FNR==NR{print "-";} (1){print;}' a.csv b.csv # or:
$ awk -F, 'FNR==NR{print "-";} {print;}' a.csv b.csv
-
#product_id,product_name,brand_name,price
-
1,pname1,bname1,100
-
10,pname10,bname10,200
-
20,pname20,bname20,300
#product_id,product_category,product_name,brand_name,price
3,pcat3,pname3,bname3,42
10,pcat10,pname10,bname10,199
20,pcat20,pname20,bname20,299
30,pcat10,pname30,bname30,420
$ awk -F, 'FNR==NR{print "-";} FNR!=NR{print;}' a.csv b.csv
-
-
-
-
#product_id,product_category,product_name,brand_name,price
3,pcat3,pname3,bname3,42
10,pcat10,pname10,bname10,199
20,pcat20,pname20,bname20,299
30,pcat10,pname30,bname30,420
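This means that the comment "this section below executes when FNR does not equal NR" above is, in principle, wrong, even though that is how this particular example ends up behaving. The second rule runs for every line that was not skipped by next, and in the first file next is only reached for non-comment lines, so the header line of a.csv also falls through to ($1 in a); that test just happens to be false, because "#product_id" is never stored as a key.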
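If you want that comment to hold literally, one option (a variation on the one-liner, not the answer's original code) is to make next unconditional inside the FNR==NR action, so that no line of the first file, header included, can reach the later rule. The shell function name join_strict is hypothetical. Note that FNR==NR still misfires if the first file is completely empty, because it is then also true for every line of the second file:

```shell
# Hypothetical helper: join_strict FIRST_CSV SECOND_CSV
# `next` now runs for every line of the first file (header included),
# so the ($1 in a) rule is only ever evaluated when FNR != NR.
join_strict() {
  awk -F, '
    FNR == NR {                 # first file (assumed non-empty)
      if (!/^#/) a[$1] = $0     # store data lines keyed by product_id
      next                      # unconditional: skip all rules below
    }
    ($1 in a) {                 # second file only
      split(a[$1], tmp, ",")
      printf "%d,%s,%s,%s,%d\n", $1, $2, $3, $4, tmp[4]
    }
  ' "$1" "$2"
}
```

Running `join_strict a.csv b.csv` should give the same two output lines as before; the difference is only in which lines ever reach the second rule.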
Do you need to automate this, or is it just a one-off? – saamorim
Only once; I'll need it again in the future, but I'll do it manually –