2017-09-07 112 views

awk: formatting CSV files | unix | Solaris | AWK

I have several CSV files like the ones below:

~/Prod/Jcs/BIN/Dash_PPLP/load$ ls -lt *csv 
-rw-rw-r-- 1 tellus tellus  81 Sep 7 14:27 extraction_MBBSCS_PPL_USAGE_IMPORT.csv 
-rw-rw-r-- 1 tellus tellus  83 Sep 7 14:27 extraction_MBBSCS_PPL_INVOICE_IMPORT.csv 
-rw-rw-r-- 1 tellus tellus  71 Sep 7 14:27 extraction_INVOICE.csv 
-rw-rw-r-- 1 tellus tellus  69 Sep 7 14:27 extraction_USGRERUN.csv 
-rw-rw-r-- 1 tellus tellus  69 Sep 7 14:27 extraction_USG.csv 
-rw-rw-r-- 1 tellus tellus  72 Sep 7 14:27 extraction_LIA.csv 
-rw-rw-r-- 1 tellus tellus  74 Sep 7 14:27 extraction_MSISDN.csv 

Opening one of the files:

cat extraction_LIA.csv 
PPL_LIABILITY,2468705,Fri Sep 01 06:56:41 2017,Fri Sep 01 06:58:33 2017 

The format is NAME,ROWS,START_TIME,END_TIME for each stream I want to monitor, and I need to make the files "loadable" into an ORACLE table.
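To make the field numbering concrete: with FS set to a comma, awk numbers the four columns $1 to $4, so START_TIME is $3 and END_TIME is $4. The sketch below prints each field of the sample record with its number. The helper name show_fields is hypothetical, and plain awk is used; on Solaris, nawk or /usr/xpg4/bin/awk should behave the same for this program.

```shell
#!/bin/sh
# Sample record copied from extraction_LIA.csv above.
line='PPL_LIABILITY,2468705,Fri Sep 01 06:56:41 2017,Fri Sep 01 06:58:33 2017'

# show_fields (hypothetical helper): print every comma-separated field
# read from stdin together with its awk field number.
show_fields() {
    awk -F',' '{ for (n = 1; n <= NF; n++) printf "$%d = %s\n", n, $n }'
}

printf '%s\n' "$line" | show_fields
# prints:
# $1 = PPL_LIABILITY
# $2 = 2468705
# $3 = Fri Sep 01 06:56:41 2017
# $4 = Fri Sep 01 06:58:33 2017
```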

I have written a script that transforms each of them and overwrites it in place, like this:

cat transform_to_load.bash 
#!/bin/bash 
csv_files=$(ls *.csv) 
for i in $csv_files 
do 
x=$(nawk 'BEGIN { OFS=","; FS=","} {split($3,a," ");split($3,b," ")} 
{$3=a[3]"/"a[2]"/"a[5]" "a[4];$4=b[3]"/"b[2]"/"b[5]" "b[4]} 
{print}' $i) 
echo $x > $i 
done 

The problem is my nawk:

x=$(nawk 'BEGIN { OFS=","; FS=","} {split($3,a," ");split($3,b," ")} 
    {$3=a[3]"/"a[2]"/"a[5]" "a[4];$4=b[3]"/"b[2]"/"b[5]" "b[4]} 
    {print}' $i) 

produces the following (the end time comes out the same as the start time):

~/Prod/Jcs/BIN/Dash_PPLP/load$ cat extraction_LIA.csv 
PPL_LIABILITY,2468705,01/Sep/2017 06:56:41,01/Sep/2017 06:56:41 

What I want to achieve is to format each of them with nawk (SunOS) like this:

PPL_LIABILITY,2468705,01/Sep/2017 06:56:41,01/Sep/2017 06:58:33 

Can you help me get the correct format out of my nawk?

Thanks a lot!

Answer


You are very close to your goal; only one small correction is needed.

Reason:

It happens because your code has:

{split($3,a," "); split($3,b," ")} 
                        ^
                        $3 again, so the end time gets the same value as the start time
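A quick way to see the cause is to split both fields and compare. In this sketch (split_demo is a hypothetical helper name; plain awk is assumed), the array filled from $3 a second time still holds the start time, and only the split of $4 yields the end time:

```shell
#!/bin/sh
line='PPL_LIABILITY,2468705,Fri Sep 01 06:56:41 2017,Fri Sep 01 06:58:33 2017'

# split_demo (hypothetical helper): split $3 twice, as the question does,
# and $4 once, as the fix does; print the time part (index 4) of each array.
split_demo() {
    awk -F',' '{
        split($3, a, " ")    # START_TIME
        split($3, b, " ")    # intended END_TIME, but reads $3 again
        split($4, c, " ")    # END_TIME, read from the fourth field
        print "a[4] =", a[4]
        print "b[4] =", b[4]
        print "c[4] =", c[4]
    }'
}

printf '%s\n' "$line" | split_demo
# prints:
# a[4] = 06:56:41
# b[4] = 06:56:41
# c[4] = 06:58:33
```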

Correct it as below:

Solution:

{split($3,a," "); split($4,b," ")} 
                        ^
                        the fourth column is used
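Putting the fix into the question's program, a minimal sketch looks like this. The function name format_times is hypothetical, and the program is run with plain awk on the single sample line rather than over the files; on Solaris, substitute nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# format_times (hypothetical name): the question's awk program with the
# second split() reading $4 instead of $3. Reads CSV records on stdin.
format_times() {
    awk 'BEGIN { FS = OFS = "," }
    {
        split($3, a, " ")    # START_TIME: Fri Sep 01 06:56:41 2017
        split($4, b, " ")    # END_TIME:   Fri Sep 01 06:58:33 2017
        $3 = a[3] "/" a[2] "/" a[5] " " a[4]
        $4 = b[3] "/" b[2] "/" b[5] " " b[4]
        print                # $0 is rebuilt with OFS="," after the assignments
    }'
}

printf '%s\n' 'PPL_LIABILITY,2468705,Fri Sep 01 06:56:41 2017,Fri Sep 01 06:58:33 2017' |
    format_times
# prints: PPL_LIABILITY,2468705,01/Sep/2017 06:56:41,01/Sep/2017 06:58:33
```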

Also, if you are interested, you can simplify it as shown below.

You don't need:

  • csv_files=$(ls *.csv)
  • x=$(nawk '{..}')
  • echo $x > $i
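Two of the points above can be illustrated with a short sketch, assuming a POSIX-like sh and that mktemp -d is available (GNU coreutils and the BSDs provide it): the unquoted echo $x collapses a multi-line value onto one line, and $(ls *.csv) splits a file name containing a space, while the plain glob does not.

```shell
#!/bin/sh
# 1) echo $x: the unquoted expansion is word-split, so a multi-line value
#    comes back as ONE line. A multi-row CSV rewritten with echo $x > $i
#    would collapse into a single row; quoting it as echo "$x" keeps the rows.
x=$(printf 'row one\nrow two\n')
echo $x        # prints: row one row two
echo "$x"      # prints the two rows unchanged

# 2) ls *.csv: a glob passes each file name to the loop as one word, even
#    with spaces, whereas $(ls *.csv) is word-split. Uses a throwaway dir.
dir=$(mktemp -d) || exit 1
touch "$dir/my file.csv"
cd "$dir" || exit 1
for f in *.csv; do printf '[%s]\n' "$f"; done          # prints: [my file.csv]
for f in $(ls *.csv); do printf '[%s]\n' "$f"; done    # prints: [my] then [file.csv]
cd / && rm -rf "$dir"
```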

Simplified version:

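$ cat test.sh 
#!/usr/bin/env bash 

for i in *.csv; do 

# Better: prefer the POSIX awk in
# /usr/xpg4/bin/awk or /usr/xpg6/bin/awk over nawk

    nawk ' 
      BEGIN{ 
       FS=OFS="," 
      } 
      # Convert field number v from "Fri Sep 01 06:56:41 2017"
      # to "01/Sep/2017 06:56:41"; the extra parameter a is a local array,
      # and a separator of " " splits on runs of blanks
      function format_dt(v, a){ 
       split($v,a," "); 
       $v=a[3]"/"a[2]"/"a[5]" "a[4] 
      } 
      { 
       format_dt(3);    # START_TIME
       format_dt(4)     # END_TIME
      }1                # the trailing 1 is an always-true pattern: print each record
     ' "$i" >tmpfile && mv tmpfile "$i" 
done 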
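As a usage sketch, the loop from test.sh can be exercised on a throwaway copy of the sample file. This assumes mktemp -d is available, uses plain awk, and splits on blanks with split(..., " "); on Solaris, substitute nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Usage sketch: run the simplified loop in a throwaway directory that
# holds one sample file, show the rewritten file, then clean up.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '%s\n' 'PPL_LIABILITY,2468705,Fri Sep 01 06:56:41 2017,Fri Sep 01 06:58:33 2017' \
    > extraction_LIA.csv

for i in *.csv; do
    awk '
      BEGIN { FS = OFS = "," }
      function format_dt(v, a) {
          split($v, a, " ")
          $v = a[3] "/" a[2] "/" a[5] " " a[4]
      }
      { format_dt(3); format_dt(4) } 1
    ' "$i" > tmpfile && mv tmpfile "$i"    # write to a temp file, then replace
done

result=$(cat extraction_LIA.csv)
printf '%s\n' "$result"
# prints: PPL_LIABILITY,2468705,01/Sep/2017 06:56:41,01/Sep/2017 06:58:33

cd / && rm -rf "$dir"
```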

Hey! Thanks a lot, so changing it to $4 will fix it, right?? –


@tln_jupiter: Yes. As you can see, '$3' means the 3rd field/column –


Really useful, thanks a lot :) –