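Splitting CSV text that contains commas?

I'm trying to write some RHEL security hardening automation scripts, and I have a CSV file whose contents I'm trying to turn into something readable. Here's what I have so far: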

#!/bin/bash 

# loop through the file 
while read line; do 
     # get all of the content 
     vulnid=`echo $line | cut -d',' -f1` 
     ruleid=`echo $line | cut -d',' -f2` 
     stigid=`echo $line | cut -d',' -f3` 
     title=`echo $line | cut -d',' -f4` 
     discussion=`echo $line | cut -d',' -f5` 
     check=`echo $line | cut -d',' -f6` 
     fix=`echo $line | cut -d',' -f7` 

     # Format the content 

     echo "########################################################" 
     echo "# Vulnerability ID: $vulnid" 
     echo "# Rule ID: $ruleid" 
     echo "# STIG ID: $stigid" 
     echo "#" 
     echo "# Rule: $title" 
     echo "#" 
     echo "# Discussion:" 
     echo "# $discussion" 
     echo "# Check:" 
     echo "# $check" 
     echo "# Fix:" 
     echo "# $fix" 
     echo "########################################################" 
     echo "# Start Check" 
     echo 
     echo "# Start Remediation" 
     echo 
     echo "########################################################" 

done < STIG.csv

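The problem I'm running into is that the text in my CSV contains commas. That is actually perfectly fine and conforms to the IETF standard (http://tools.ietf.org/html/rfc4180#page-2, Section 2.4). But, as you can imagine, cut doesn't look ahead to see whether the comma is followed by a trailing space (as it normally is in English prose). This throws all of my fields off, and I can't figure out how to get it all working properly.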
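To make the failure concrete, here is a minimal reproduction. The sample record and the helper name `field_with_cut` are made up for illustration; they are not taken from STIG.csv.

```shell
#!/bin/bash
# Hypothetical sample record: the third field is quoted and contains a comma.
sample='V-1,SV-1,"Check this, then that",last'

# field_with_cut N: naive extraction of field N, splitting on every comma.
field_with_cut() {
    printf '%s\n' "$sample" | cut -d',' -f"$1"
}

field_with_cut 3   # prints: "Check this
field_with_cut 4   # prints:  then that"   (with a leading space)
```

The quoted field is cut in two, so every field after it is shifted by one.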

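Now, I have a feeling there's some magic regex I could use, something like ',[:blank:]', but I'll be damned if I know how to make use of it. I'm used to using cut just because it's quick and dirty, but perhaps someone has a better suggestion using awk or sed. This is mainly to generate the bulk structure of my own program, which is repetitive and is a TON of comments.

One caveat: this has to run on a clean install of RHEL 6. I would write this in Ruby, Python, etc., but most of those are extra packages that would have to be installed. The environment this script will be deployed in is one where the machines don't have any internet access or extra packages. Python 2.6 is on CentOS 6 by default, but not on RHEL 6 (I think). Otherwise, trust me, I'd be writing this in Ruby.

Here's a sample of the CSV: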

V-38447,SV-50247r1_rule,RHEL-06-000519,The system package management tool must verify contents of all files associated with packages.,The hash on important files like system executables should match the information given by the RPM database. Executables with erroneous hashes could be a sign of nefarious activity on the system.,"The following command will list which files on the system have file hashes different from what is expected by the RPM database. # rpm -Va | grep '$1 ~ /..5/ && $2 != 'c''If there is output, this is a finding.","The RPM package management system can check the hashes of installed software packages, including many that are important to system security. Run the following command to list which files on the system have hashes that differ from what is expected by the RPM database: # rpm -Va | grep '^..5'A 'c' in the second column indicates that a file is a configuration file, which may appropriately be expected to change. If the file that has changed was not expected to then refresh from distribution media or online repositories. rpm -Uvh [affected_package]OR yum reinstall [affected_package]"

Also, if anyone wants to see the context, the whole project is out on GitHub.

Source

2014-02-18 Apocrathia

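What does your CSV look like? –

Honestly, I think your best bet is to use a scripting language with CSV support. I personally use Python. Its `csv` module is very easy to use and can handle any (ASCII) CSV you're likely to encounter. If you're not a Python person but you're comfortable with Perl, that works too. Those are probably the two main candidates I'd recommend. –

You should change from the old and outdated backticks to parentheses `$(...)`, e.g.: `fix=$(echo $line | cut -d',' -f7)`. You can also change from `echo $line | cut -d',' -f7` to `cut -d',' -f7 <<< $line` – Jotne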

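All the comments on your question are good. CSV support isn't built into bash, so if you don't want to use a language like Python, Ruby, Erlang, or even Perl, you have to roll your own.

Note that while awk can use a comma as a field separator, it also doesn't properly support commas embedded in quoted fields. As Håkon suggested, you can kludge a solution together using a pattern.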
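The awk limitation is easy to check with a small sketch. The helper name `count_fields` and the sample record are assumptions made for this illustration.

```shell
#!/bin/bash
# count_fields LINE: number of fields awk sees when splitting on commas.
count_fields() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# The record has three CSV fields, but awk counts the comma inside the quotes.
count_fields 'a,"b, c",d'   # prints 4
```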

But you don't need to do this in awk; you can do it in bash alone and avoid calling external tools. How about something like this?

#!/bin/bash 

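nextfield() {
    # Split the first field off $line into $value; $line keeps the rest.
    case "$line" in
    \"*)
        # Quoted field: it ends at the first quote followed by a comma.
        value="${line%%\",*}"
        # Re-add the closing quote that was cut off with the separator.
        # The last field has no trailing comma, so nothing was cut there.
        if [ "$value" != "$line" ]; then value="$value\""; fi
        line="${line#*\",}"
        ;;
    *)
        # Unquoted field: it ends at the first comma.
        value="${line%%,*}"
        line="${line#*,}"
        ;;
    esac
}

# loop through the file; -r and an empty IFS keep backslashes and
# surrounding whitespace intact
while IFS= read -r line; do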

    # get the content 
    nextfield; vulnid="$value" 
    nextfield; ruleid="$value" 
    nextfield; stigid="$value" 
    nextfield; title="$value" 
    nextfield; discussion="$value" 
    nextfield; check="$value" 
    nextfield; fix="$value" 

    # format the content 
    printf "########################################################\n" 
    printf "# Vulnerability ID: %s\n" "$vulnid" 
    printf "# Rule ID: %s\n# STIG ID: %s\n#\n" "$ruleid" "$stigid" 
    printf "# Rule: %s\n" "$title" 
    printf "#\n# Discussion:\n" 
    fmt -w68 <<<"$discussion" | sed 's/^/# /' 
    printf "# Check:\n" 
    fmt -w68 <<<"$check" | sed 's/^/# /' 
    printf "# Fix:\n" 
    fmt -w68 <<<"$fix" | sed 's/^/# /' 
    printf "########################################################\n" 
    printf "# Start Check\n\n" 
    printf "# Start Remediation\n\n" 
    printf "########################################################\n" 

done < STIG.csv

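The speed advantage will be huge if you're doing a lot of these.

Note the improved formatting, courtesy of fmt. This kind of kills the speed advantage of avoiding calls to external programs, but it does make your output easier to read. :)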
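One detail worth noting: the values returned by nextfield still carry their surrounding double quotes. A minimal sketch of stripping them before wrapping might look like the following. The helpers `unquote` and `comment_block` are hypothetical names introduced here, and the sketch assumes fields do not use RFC 4180 doubled quotes ("") as escapes.

```shell
#!/bin/bash
# unquote VALUE: drop one leading and one trailing double quote, if present.
# (Hypothetical helper; it does not handle doubled "" escapes.)
unquote() {
    local v=$1
    v=${v#\"}
    v=${v%\"}
    printf '%s\n' "$v"
}

# comment_block TEXT: unquote TEXT, wrap it at 68 columns, and prefix
# every resulting line with "# ".
comment_block() {
    unquote "$1" | fmt -w 68 | sed 's/^/# /'
}

comment_block '"Run the check, then review the output."'
```

In the loop above, this would replace a line such as `fmt -w68 <<<"$check" | sed 's/^/# /'` with `comment_block "$check"`.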

Source

2014-02-18 21:24:01 ghoti

This works perfectly, and it's extremely fast. I now have a line-wrapping issue, but that's purely cosmetic. – Apocrathia

See my update for the line-wrapping issue. – ghoti

And use the [Linux options](http://www.freebsd.org/cgi/man.cgi?query=fmt&apropos=0&sektion=0&manpath=Red+Hat+Linux%2Fi386+9&arch=default&format=html) for `fmt` rather than the [FreeBSD ones](http://www.freebsd.org/cgi/man.cgi?query=fmt). :-) – ghoti

+1 to John Y's comment. Here's a Ruby example:

ruby -rcsv -e 'CSV.foreach("STIG.csv") do |row| 
    (vulnid, ruleid, stigid, title, disc, check, fix) = row 
    puts "#" * 40 
    puts "# Vulnerability ID: #{vulnid}" 
    puts "# Rule ID: #{ruleid}" 
    puts "# STIG ID: #{stigid}"
    puts "#" 
    puts "# Discussion:" 
    puts "# #{disc}" 
    puts "# Check:" 
    puts "# #{check}" 
    puts "# Fix:" 
    puts "# #{fix}" 
    puts "#" * 40 
end'

If you want to wrap long lines, do something like this:

puts fix.gsub(/(.{1,78})(?:\s+|\Z)/) {|s| "# " + s + "\n"}

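Source

2014-02-18 20:52:53

Unfortunately, I can't use Ruby, since it isn't installed by default, and the target systems may not have Internet access or any extra packages installed. – Apocrathia

With GNU awk version 4, you can try:

gawk -f a.awk STIG.csv

where a.awk is: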

BEGIN { 
    FPAT = "([^,]*)|(\"[^\"]+\")" 
} 

{ 
    for (i=1; i<=NF; i++) 
     print "$"i"=|"$i"|" 
    print "# Rule: "$4 
}

Output:

$ cat STIG.csv 
vulnid,ruleid,stigid,"This is a title, hello","A discussion, ,,",check,fix 

$ gawk -f a.awk STIG.csv 
$1=|vulnid| 
$2=|ruleid| 
$3=|stigid| 
$4=|"This is a title, hello"| 
$5=|"A discussion, ,,"| 
$6=|check| 
$7=|fix| 
# Rule: "This is a title, hello"
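FPAT was introduced in gawk 4.0, so it is not available in older awk versions. As a rough sketch under stated assumptions, a similar split can be approximated with the standard `match()` and `substr()` functions. The function name `split_csv_line` is hypothetical, and the sketch assumes no embedded newlines and no doubled "" escapes inside quoted fields. Unlike the FPAT version, it strips the surrounding quotes.

```shell
#!/bin/bash
# split_csv_line: read CSV records on stdin, print one field per line.
# Assumes no embedded newlines and no doubled "" escapes in quoted fields.
split_csv_line() {
    awk '
    {
        rest = $0
        while (rest != "") {
            if (match(rest, /^"[^"]*"/)) {
                # Quoted field: take it whole, commas included, minus quotes.
                field = substr(rest, 2, RLENGTH - 2)
                len = RLENGTH
            } else if (match(rest, /^[^,]+/)) {
                # Unquoted field: everything up to the next comma.
                field = substr(rest, 1, RLENGTH)
                len = RLENGTH
            } else {
                # Two separators in a row: an empty field.
                field = ""
                len = 0
            }
            print field
            rest = substr(rest, len + 1)   # drop the field just read
            sub(/^,/, "", rest)            # drop one separator, if any
        }
    }'
}

printf '%s\n' 'vulnid,ruleid,stigid,"This is a title, hello",check' | split_csv_line
```

A trailing empty field (a record ending in a comma) is dropped by this loop, which should not matter for records shaped like the sample.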

Source

2014-02-18 20:58:10

RHEL 6 only ships with gawk 3. This won't work. – Apocrathia

+1 for using FPAT. Details about `FPAT` are here: https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html – BMW

@BMW Thanks :) .. –

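Your biggest problem is the possibility of fields containing newlines. In that spirit, the suggestion to use a language with CSV support is the best solution.

However, if your only problem is the commas (and you know there won't be newlines in your fields), you can solve it easily in bash by temporarily replacing the comma-space sequences with an unused character combination of your choice, and swapping them back before output: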

#!/bin/bash 

while IFS=',' read -r vulnid ruleid stigid title discussion check fix; do
    echo "# Vulnerability ID: $vulnid" 
    ... 
    echo "# Discussion:" 
    echo "# $discussion" 
    ... 
done <<<"$(sed 's/, /COMMASPACE/g' <STIG.csv)" | sed 's/COMMASPACE/, /g'
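The same placeholder idea can be tried on a single line without touching any file. This is only a sketch: the function name `split_with_placeholder` and the sample record are assumptions, and it relies on the same premise as above, namely that every comma inside a field is followed by a space and that the placeholder never occurs in the data.

```shell
#!/bin/bash
# Hypothetical placeholder; pick any string that cannot occur in the data.
placeholder='COMMASPACE'

# split_with_placeholder LINE: print each comma-separated field on its own
# line, treating ", " (comma followed by a space) as literal text.
split_with_placeholder() {
    local masked field
    local -a fields
    masked=${1//, /$placeholder}              # hide ", " before splitting
    IFS=',' read -r -a fields <<<"$masked"    # split on the remaining commas
    for field in "${fields[@]}"; do
        printf '%s\n' "${field//$placeholder/, }"   # restore ", "
    done
}

split_with_placeholder 'V-1,SV-1,"Check this, then that",last'
```

Using bash parameter expansion instead of sed keeps the round trip inside the shell, at the cost of handling one line at a time.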

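Source

2014-02-18 21:23:41

Unfortunately, I only have access to what is installed on the system by default, and nothing else. – Apocrathia

@Apocrathia: All you need is `sed`, and your post mentions that both `awk` and `sed` are acceptable means. –

Well, if it can be guaranteed that every comma inside the content of a quoted field is always followed by a space... –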

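Below is a somewhat improved version of my answer to "Count number of column in a pipe delimited file", tailored to this specific question. A real CSV parser implementation would be best, but the awk hack below works as long as fields are not split across multiple lines, which happens when a field starts with a quote and continues to a closing quote that is not on the same line. It also assumes that the file it receives is already well-formed. The only problem is that it outputs OFS after the last field. That shouldn't be an issue in your particular case.

Just add the following before the while loop above, change the value of OFS as needed, and make sure to change cut's delimiter to match. OFS defaults to |, but you can override it if you like, using awk's -v option as shown: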

outfile="$(mktemp 2>/dev/null || printf '%s' "/tmp/STIG.$$")" 

outdelim='|' 

awk -F',' -v OFS="$outdelim" '
# WARNING: outputs OFS after the last field, meaning an empty field is at the end.
BEGIN { if (OFS == "") OFS = "|" }

{
    for (i = 1; i <= NF; i++) {
        # A piece that opens a quoted field without closing it:
        # rejoin it with the following pieces, restoring the commas.
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /"$/); i++)
                printf("%s%s", $i, FS)
        printf("%s%s", $i, OFS)
    }
    printf("\n")    # end the record
}
' STIG.csv >"$outfile"

# loop through the file 
while IFS= read -r line; do
    # get all of the content
    vulnid="$(printf '%s\n' "$line" | cut -d"$outdelim" -f1)"
    . 
    . 
    . 
done < "$outfile" 

rm -f "$outfile"

Source

2014-02-18 22:16:17
