2015-08-24 118 views
2

Search with awk and append the matching name from another CSV file

I have two CSV files.

File 1 contains:

product_id, category_id, price 
pid01,cat01,10 
pid02,cat01,10 
pid03,cat01,20 
pid04,cat02,30 
pid05,cat02,20 
pid06,cat03,30 

File 2 contains:

category_id, category_name 
cat01,Mouse 
cat02,Cat 
cat03,Fish 
cat04,Dog 

I need a result like this:

product_id, category_id, category_name, price 
pid01,cat01,Mouse,10 
pid02,cat01,Mouse,10 
pid03,cat01,Mouse,20 
pid04,cat02,Cat,30 
pid05,cat02,Cat,20 
pid06,cat03,Fish,30 

Or, without the category_id column:

product_id, category_name, price 
pid01,Mouse,10 
pid02,Mouse,10 
pid03,Mouse,20 
pid04,Cat,30 
pid05,Cat,20 
pid06,Fish,30 

How can I achieve this in Bash or awk?

+0

Does the first line of file2 contain a header? – amdixon

+0

Yes, let me update the question. – billyduc

Answers

4

This awk command will do it:

awk -F, 'NR==FNR{a[$1]=$2;next}FNR>1{print $1,$2,a[$2],$3}' OFS=, file2 file1 

By the way, you also need to add the headers. Let me explain the script in multi-line form:

# Specify the field delimiter and print the headers 
BEGIN { 
    FS=OFS="," 
    $1="product_id" 
    $2="category_id" 
    $3="category_name" 
    $4="price" 
    print 
} 

# As long as the total number of records read so far (NR) equals
# the number of records read from the current input file (FNR),
# we are still in the first file (file2), so we populate the
# lookup table 'a' with its data
NR==FNR{ 
    a[$1]=$2 
    next # Skip the following block and go on parsing file2 
} 

# Skip line 1 in file1, inject column 3 with the value from 
# the lookup table and output the record 
FNR>1{ 
    print $1,$2,a[$2],$3 
} 

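Please check anubhava's comment. With gawk or mawk, printing the headers can be done more simply by using -F', *'. The optional space after the comma is needed because the column headers contain a space. I would simply remove that space before processing.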
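For the second layout in the question (product_id, category_name, price), the same lookup can be reused and category_id simply left out of the print. A minimal sketch, assuming the input has no trailing spaces; the function name lookup_names is hypothetical and not part of the original answer:

```shell
#!/bin/sh
# lookup_names CATEGORIES PRODUCTS: hypothetical helper name for this sketch.
# Prints product_id, category_name, price for every line of PRODUCTS.
lookup_names() {
    # -F', *' splits on a comma plus optional spaces, so the header line is
    # translated like any other row: a["category_id"] holds "category_name".
    awk -F', *' -v OFS=, '
        NR == FNR { a[$1] = $2; next }   # first file: build the lookup table
        { print $1, a[$2], $3 }          # second file: replace the id by its name
    ' "$1" "$2"
}
```

It would be called as lookup_names file2 file1, with the category file first so that the lookup table is complete before the product rows are read.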

+0

awk -F', *' -v OFS=, 'FNR==NR{a[$1]=$2; next} {print $1,$2,a[$2],$3}' file2 file1 will also produce the header row. – anubhava

+1

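@anubhava Good catch! :) I was already wondering why it didn't work in the first place, but wanted to finish my explanation. I missed the space in the header! Thanks! – hek2mgl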

+0

Fantastic! Thanks hek2mgl – billyduc

3

With join:

join --header -t , -1 2 -2 1 -o 1.1,1.2,2.2,1.3 file1 file2 

Output:

 
pid01,cat01,Mouse,10 
pid02,cat01,Mouse,10 
pid03,cat01,Mouse,20 
pid04,cat02,Cat,30 
pid05,cat02,Cat,20 
pid06,cat03,Fish,30 
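
join expects both inputs to be sorted on the join field, which the sample files happen to be already. A minimal sketch for unsorted input, assuming GNU coreutils join (needed for --header); the helpers sort_body and join_sorted are hypothetical names introduced here:

```shell
#!/bin/sh
# sort_body FILE KEY: hypothetical helper; prints the header line unchanged,
# then the remaining lines sorted on comma-separated field KEY.
sort_body() {
    head -n 1 "$1"
    tail -n +2 "$1" | sort -t , -k "$2,$2"
}

# join_sorted FILE1 FILE2: hypothetical wrapper that sorts both inputs into
# temporary files and then runs the same join command as above.
join_sorted() {
    f1=$(mktemp) && f2=$(mktemp) || return 1
    sort_body "$1" 2 > "$f1"   # file1 is joined on its 2nd field (category_id)
    sort_body "$2" 1 > "$f2"   # file2 is joined on its 1st field (category_id)
    join --header -t , -1 2 -2 1 -o 1.1,1.2,2.2,1.3 "$f1" "$f2"
    rm -f "$f1" "$f2"
}
```

It would be called as join_sorted file1 file2. Note that the rows come out ordered by category_id rather than in the original order of file1.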
+1

Nice one, I always forget about this tool. – hek2mgl

0

You can create a shell script (process_csv.sh) like this:

#!/bin/sh 

# Take file1.csv from the first data row on (this skips the header line).
data=$(sed -n '/pid/,$ p' file1.csv)
data2=$(cat file2.csv)
echo "product_id, category_id, price, category_name" > final.csv
# Since category_id is common to both files, we look up category names based on that id.
# Note: the unquoted $data is split on whitespace, so rows must not contain spaces.
for row in $data
do
    cat_id=$(printf '%s\n' "$row" | awk -F "," '{print $2}')
    category_name=$(printf '%s\n' "$data2" | grep "^$cat_id," | cut -f2 -d',')
    # Now append category_name to the row and write the line to final.csv.
    echo "$row,$category_name" >> final.csv
done

Just run "./process_csv.sh" and the final.csv file will contain your result.
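
The for loop relies on word splitting, so a field containing a space would break a row apart. A sketch of the same lookup that reads file1.csv line by line instead; the function name append_category is hypothetical, and the header is written without spaces as an assumption:

```shell
#!/bin/sh
# append_category PRODUCTS CATEGORIES: hypothetical helper; prints each product
# row with the matching category name appended as a fourth column.
append_category() {
    echo "product_id,category_id,price,category_name"
    # tail -n +2 skips the header line of the products file.
    tail -n +2 "$1" | while IFS=, read -r pid cat_id price; do
        # Exact match on the first field of the categories file.
        name=$(awk -F, -v id="$cat_id" '$1 == id { print $2; exit }' "$2")
        echo "$pid,$cat_id,$price,$name"
    done
}
```

It would be called as append_category file1.csv file2.csv > final.csv. This still starts one awk process per row, so for large files the single-pass awk answer above is likely the better choice.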