2012-07-13 82 views
0

Remove rows from a CSV where a specific column matches an input file

I have a CSV with multiple columns and rows [File1.csv].

I have another CSV file (with only one column) that lists specific words [File2.csv].

I want to be able to remove a row from File1 if any of its columns matches any of the words listed in File2.

I originally used this:

grep -v -F -f File2.csv File1.csv > File3.csv 

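This works, to an extent. The problem I ran into is with columns that contain more than one word (e.g. word1,word2,word3). File2 contains word2, but the row is not deleted.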
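One possible explanation can be reproduced with a small sketch. It assumes that the entries in File2.csv are wrapped in double quotes; the question does not show the file, so that is a guess, and the sample rows below are hypothetical. With -F, grep treats each pattern as a literal string, quotes included, so a word inside a longer quoted cell is not matched.

```shell
# Hypothetical sample data; the quoting in File2.csv is an assumption.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > File1.csv <<'EOF'
"host","words"
"a","word1"
"b","word1,word2,word3"
"c","word2"
EOF

# File2.csv holds one quoted word per line.
cat > File2.csv <<'EOF'
"word2"
EOF

# The pattern is the literal text "word2" including both quotes, so it
# matches row c but not row b, where word2 is followed by a comma.
grep -v -F -f File2.csv File1.csv > File3.csv
cat File3.csv
```

Under these assumptions row c is removed while row b, which also contains word2, is kept.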

I tried separating the words so they look like this: (word1,word2,word3), but the original command still did not work.

How can I remove the rows that contain a word from File2, even when the column also contains other words?

Answers

0

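You can convert the lines in File2.csv that contain multiple patterns into separate lines.

The following uses tr to turn lines containing word1,word2 into separate lines before they are used as patterns. The <() construct creates a temporary file/FIFO that acts as a file (tested in bash):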

grep -v -F -f <(tr ',' '\n' < File2.csv) File1.csv > File3.csv 
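For shells without process substitution, the same idea can be sketched with a temporary pattern file. The sample data and the `patterns` variable are hypothetical, and the sketch assumes File2.csv holds unquoted, comma-separated words on one line.

```shell
# Hypothetical sample data for illustration only.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > File1.csv <<'EOF'
id,words
1,alpha
2,beta
3,gamma
4,delta
EOF

# One line of File2.csv carries two comma-separated patterns.
cat > File2.csv <<'EOF'
beta,gamma
EOF

# Put each pattern on its own line, then hand that file to grep -f.
patterns=$(mktemp)
tr ',' '\n' < File2.csv > "$patterns"
grep -v -F -f "$patterns" File1.csv > File3.csv
rm -f "$patterns"

cat File3.csv
```

Note that grep matches a pattern anywhere on the line, not only in the column of interest, so a word that also occurs in another column removes that row as well.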

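So I tried your method, and I am still left with the same result as with 'grep -v -F -f File2.csv File1.csv> File3.csv' – eloscurosecreto 2012-07-13 17:27:59

Then you need to show us an exact sample of 'File1.csv' and 'File2.csv'. The above works with what you have provided so far. – Thor 2012-07-13 17:42:35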


Here are links to the files: - [File1.csv](https://www.dropbox.com/s/ryrk0ofenzzmfuj/File1.csv) - [Files2.csv](https://www.dropbox.com/s/o59t2lfobgjugd5/File2.csv) I hope this helps. Thanks! – eloscurosecreto 2012-07-13 17:58:50

1

One way using awk.

Content of script.awk:

BEGIN { 
    ## Split each line on a double quote optionally surrounded by spaces.
    FS = "[ ]*\"[ ]*" 
} 

## File with words, save them in a hash. 
FNR == NR { 
    words[ $2 ] = 1; 
    next; 
} 

## File with multiple columns. 
FNR < NR { 
    ## Print the line unchanged if the eighth field has no interesting value
    ## or if it is the first line of the file (header).
    if ($8 == "N/A" || FNR == 1) {
        print $0
        next
    }

    ## Split interested field with commas. Traverse it searching for a 
    ## word saved from first file. Print line only if not found. 

    ## Change due to an error pointed out in comments. 
    ##--> split($8, array, /[ ]*,[ ]*/) 
    ##--> for (i = 1; i <= length(array); i++) { 
    len = split($8, array, /[ ]*,[ ]*/) 
    for (i = 1; i <= len; i++) { 
    ## END change. 

        if (array[ i ] in words) {
            found = 1
            break
        }
    }
    if (! found) {
        print $0
    }
    found = 0
} 

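Assuming File1.csv and File2.csv have the content provided in the comments on Thor's answer (I suggest adding that information to the question), run the script like this: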

awk -f script.awk File2.csv File1.csv 

With the following output:

"DNSName","IP","OS","CVE","Name","Risk" 
"ex.example.com","1.2.3.4","Linux","N/A","HTTP 1.1 Protocol Detected","Information" 
"ex.example.com","1.2.3.4","Linux","CVE-2011-3048","LibPNG Memory Corruption Vulnerability (20120329) - RHEL5","High" 
"ex.example.com","1.2.3.4","Linux","CVE-2012-2141","Net-SNMP Denial of Service (Zero-Day) - RHEL5","Medium" 
"ex.example.com","1.2.3.4","Linux","N/A","Web Application index.php?s=-badrow Detected","High" 
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","Apache HTTPD Server Version Out Of Date","High" 
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","PHP Unsupported Version Detected","High" 
"ex.example.com","1.2.3.4","Linux","N/A","HBSS Common Management Agent - UNIX/Linux","High" 
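To see how the split-and-lookup step treats a cell holding several values, here is a condensed, self-contained sketch of the same approach. The host names and CVE identifiers are hypothetical sample data, and it assumes that every field is double-quoted, that the CVE column is the fourth one, and that File2.csv holds one quoted word per line.

```shell
# Hypothetical sample data; the CVE identifiers are made up.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > File1.csv <<'EOF'
"DNSName","IP","OS","CVE","Name","Risk"
"h1","1.2.3.4","Linux","N/A","Row without CVE","Low"
"h2","1.2.3.4","Linux","CVE-0000-0001","Single listed CVE","High"
"h3","1.2.3.4","Linux","CVE-0000-0002, CVE-0000-0001","Two CVEs, one listed","High"
"h4","1.2.3.4","Linux","CVE-0000-0003","Unlisted CVE","Medium"
EOF

cat > File2.csv <<'EOF'
"CVE-0000-0001"
EOF

awk '
BEGIN { FS = "[ ]*\"[ ]*" }              # fields are delimited by double quotes
FNR == NR { words[$2] = 1; next }        # first file: remember each word
FNR == 1 || $8 == "N/A" { print; next }  # keep header and N/A rows
{
    # Split the CVE cell on commas and look each piece up.
    n = split($8, cve, /[ ]*,[ ]*/)
    found = 0
    for (i = 1; i <= n; i++)
        if (cve[i] in words) { found = 1; break }
    if (!found) print
}
' File2.csv File1.csv > File3.csv

cat File3.csv
```

With this input, rows h2 and h3 are dropped and the header, h1 and h4 remain. Because the quote-based separator yields the quoted values in the even-numbered fields, the fourth column is read as $8.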

I get the error 'awk: 1: unexpected character '.'' – eloscurosecreto 2012-07-13 17:17:01


@eloscurosecreto: Are you running it directly from the command line? I mean, pasting it in rather than using a file. – Birei 2012-07-13 17:25:43


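I had created a .awk file, and that was the problem. I just ran your code directly from the CLI and it completed without any issues. However, when I checked the output against the removal file, I found multiple instances of words in rows that should have been deleted. This method seems to delete only the rows whose column contains a single word (not ones containing ", word2, word3"). – eloscurosecreto 2012-07-13 17:43:25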