You could try it with perl and its Text::CSV_XS module:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;

my @columns;
open my $fh, '<', shift or die "Cannot open input file: $!\n";
my $csv = Text::CSV_XS->new or die;
while (my $row = $csv->getline($fh)) {
    undef @columns;
    # Rows with at most 12 fields are already well-formed: pass them through.
    if (@$row <= 12) {
        @columns = @$row;
        next;
    }
    # Number of stray commas in each of the two duplicated text columns.
    my $extra_columns = (@$row - 12) / 2;
    # Index of the first field after both text columns.
    my $post_columns_index = 4 + 2 * ($extra_columns + 1);
    @columns = (
        @$row[0..3],
        (join('', @$row[4..(4+$extra_columns)])) x 2,
        @$row[$post_columns_index..$#$row]
    );
}
continue {
    # The continue block also runs after "next", so every row is printed.
    $csv->print(\*STDOUT, \@columns);
    print "\n";
}
Assuming an input file (infile) with three lines, where the first has one extra comma in each text column, the second has two extra commas in each, and the third is correct:
2011,123456,1234567,12345678,Hey There,How are you,Hey There,How are you,882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two
2011,123456,1234567,12345678,Hey There,How are you,now,Hey There,How are you,now,882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two
2011,123456,1234567,12345678,Hey There:How are you,Hey There:How are you,882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two
Run the script like:
perl script.pl infile
That yields:
2011,123456,1234567,12345678,"Hey ThereHow are you","Hey ThereHow are you",882864309037,"ABC ABCD",LABACD,1.00000000,80.2500000,"One Two"
2011,123456,1234567,12345678,"Hey ThereHow are younow","Hey ThereHow are younow",882864309037,"ABC ABCD",LABACD,1.00000000,80.2500000,"One Two"
2011,123456,1234567,12345678,"Hey There:How are you","Hey There:How are you",882864309037,"ABC ABCD",LABACD,1.00000000,80.2500000,"One Two"
Note that it adds some quotes, but the result is valid according to the CSV specification and easier to process than the original input.
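The index arithmetic is the easiest part to get wrong, so here is a minimal sketch that isolates it in a function using only core Perl. The name merge_extra_columns and the optional $expected parameter are hypothetical, introduced only for illustration. It assumes, as the script does, that the two text columns start at index 4 and that both contain the same number of stray commas.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (illustration only): collapse a row whose two duplicated
# text columns were split by stray commas back into $expected fields.
# Assumes both text columns contain the same number of stray commas.
sub merge_extra_columns {
    my ($row, $expected) = @_;
    $expected //= 12;                       # field count of a well-formed row
    return @$row if @$row <= $expected;     # nothing to repair
    my $extra = (@$row - $expected) / 2;    # stray commas per text column
    my $post  = 4 + 2 * ($extra + 1);       # first field after both text columns
    return (
        @$row[0 .. 3],                          # leading numeric fields
        (join '', @$row[4 .. 4 + $extra]) x 2,  # merged text, emitted twice
        @$row[$post .. $#$row],                 # everything after the text columns
    );
}

# Second sample line from the input file: two stray commas per text column.
my @sample = split /,/,
    '2011,123456,1234567,12345678,Hey There,How are you,now,'
  . 'Hey There,How are you,now,882864309037,ABC ABCD,LABACD,'
  . '1.00000000,80.2500000,One Two';
my @fixed = merge_extra_columns(\@sample);
print scalar(@fixed), " fields; field 6 is $fixed[6]\n";
```

With two stray commas per column the row has 16 fields, so the text columns occupy indexes 4..6 and 7..9, and the remaining fields start at index 10.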
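Do the 4th and 7th columns always contain numbers? –
If possible, it would be better to re-request or regenerate the CSV file properly, with quoting (encapsulation) on the columns that contain commas. E.g. '2011,123456,1234567,12345678,"Hey There,How are you","Hey There,How are you",882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two' – AeroX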
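To illustrate the point about encapsulation, here is a small sketch, assuming Text::CSV_XS is installed, that parses the properly quoted line from the comment. The variable names are illustrative only. Because the embedded commas sit inside quotes, the parser returns 12 fields without any repair step.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "" . Text::CSV_XS->error_diag();

# The quoted example line from the comment above.
my $line = '2011,123456,1234567,12345678,"Hey There,How are you",'
         . '"Hey There,How are you",882864309037,ABC ABCD,LABACD,'
         . '1.00000000,80.2500000,One Two';

$csv->parse($line) or die "Cannot parse line: " . $csv->error_diag();
my @fields = $csv->fields();

# The quoted commas stay inside their fields.
print scalar(@fields), " fields\n";
print "$fields[4]\n";
```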