2012-02-24 79 views
8

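I have a database with multiple fields that contain comma-separated values. I need to split these fields in Perl, which is simple enough, except that some values are followed by a nested comma-separated list enclosed in parentheses, which I do not want to split. How can I split on commas except inside a parenthesised list?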

Example:

recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education 

Splitting on "," gives me:

recycling 
environmental science 
interdisciplinary (e.g. 
consumerism 
waste management 
chemistry 
toxicology 
government policy 
and ethics) 
consumer education 

What I want is:

recycling 
environmental science 
interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics) 
consumer education 

Can any Perl regex(perts) lend a hand?

I tried modifying a regex from a similar SO post I found, but it returns no results:

#!/usr/bin/perl 

use strict; 
use warnings; 

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education}; 

my @parts = $s =~ m{\A(\w+) ([0-9]) (\([^\(]+\)) (\w+) ([0-9]) ([0-9]{2})}; 

use Data::Dumper; 
print Dumper \@parts; 
+0

What have you tried so far? Please make an attempt yourself first, then ask your question and show what you did. – 2012-02-24 18:03:42

+0

You can't use regular expressions to parse nested expressions. You need a full parser. – Ether 2012-02-24 18:04:39

+0

You might take a look at [Text::CSV](http://search.cpan.org/perldoc?Text::CSV) and see whether you can adapt it to do what you need. – TLP 2012-02-24 18:13:46

Answers

9

Try this:

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education}; 

my @parts = split /(?![^(]+\)), /, $s; 
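The negative lookahead `(?![^(]+\))` rejects a comma whenever the text after it reaches a `)` without first passing a `(`, which is what happens when the comma sits inside a parenthesised group. A minimal sketch of the same split on a shorter input, assuming a single level of parentheses; the helper name `split_outside_parens` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split on ", " except where the comma is inside (...).
sub split_outside_parens {
    my ($text) = @_;
    return split /(?![^(]+\)), /, $text;
}

# Prints "a", "b (c, d)" and "e" on separate lines.
print "$_\n" for split_outside_parens('a, b (c, d), e');
```

Because the lookahead only inspects the text up to the next `(`, an input such as `x (a, (b), c)` would still be split after `a`, so this pattern is best kept for data without nested parentheses.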
+0

I just found the same thing [here](http://stackoverflow.com/questions/8481345/perl-split-and-regular-expression), and it works. Thanks! – calyeung 2012-02-24 18:18:59

0

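Nobody says you have to do this in one step. You can peel off the values one at a time in a loop. Given your example, you could use something like this.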

use strict; 
use warnings; 
use 5.010; 

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education}; 

my @parts; 
while(1){ 

     my ($elem, $rest) = $s =~ m/^((?:\w|\s)+)(?:,\s*([^\(]*.*))?$/; 
     if (not $elem) { 
       say "second approach"; 
       ($elem, $rest) = $s =~ m/^(?:((?:\w|\s)+\s*\([^\)]+\)),\s*(.*))$/; 
     } 
     $s = $rest; 
     push @parts, $elem; 
     last if not $s; 

} 

use Data::Dumper; 
print Dumper \@parts; 
2

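The solution you chose is superior, but for those who would say otherwise: regular expressions have a recursion construct that will match nested parentheses. The following works: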

use strict; 
use warnings; 

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education}; 

my @parts; 

push @parts, $1 while $s =~/
((?: 
    [^(),]+ | 
    (\(
    (?: [^()]+ | (?2))* 
    \)) 
)*) 
(?: ,\s* | $) 
/xg; 


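# The pattern also matches an empty string at the end of input, so skip empty fields.
print "$_\n" for grep { length } @parts;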

This works even if the parentheses are nested further. No, it isn't pretty, but it does work!
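To illustrate the nested case, here is a sketch that wraps the same recursive pattern in a function and runs it on an input with two levels of parentheses. The function name `split_balanced` and the sample string are assumptions made for this example; `(?2)` recurses into the second capture group, which is the one that matches a balanced `(...)`.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the recursive pattern shown above.
sub split_balanced {
    my ($text) = @_;
    my @parts;
    push @parts, $1 while $text =~ /
        ((?:
            [^(),]+                            # plain text, no commas or parentheses
          | ( \( (?: [^()]+ | (?2) )* \) )     # a balanced group, recursing via (?2)
        )*)
        (?: ,\s* | $ )                         # field ends at a comma or end of input
    /xg;
    return grep { length } @parts;             # drop the empty match at end of input
}

# Prints "a", "b (c, (d, e))" and "f" on separate lines.
print "$_\n" for split_balanced('a, b (c, (d, e)), f');
```

The `grep` is there because the pattern can match an empty string once at the very end of the input.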

+0

+1 for the (balanced) solution. :) – zx81 2014-05-27 01:55:48

0

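Another approach, using a loop and split. I haven't tested the performance, but shouldn't it be faster than the lookahead regex solution (as the length of $str grows)?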

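my @elems = split ",", $str; 
my @answer; 
my @parens; 
while (@elems) { 
    # Fields before the next "(" are complete; stop once the list runs out.
    push @answer, (shift @elems) while (@elems && $elems[0] !~ /\(/); 
    last unless @elems; 
    # Gather the pieces of a parenthesised field, up to and including the one with ")".
    push @parens, (shift @elems) while (@elems && $elems[0] !~ /\)/); 
    push @parens, (shift @elems) if @elems; 
    push @answer, join ",", @parens; 
    @parens = (); 
}
s/^\s+// for @answer;   # split "," leaves the space that followed each comma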
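The loop idea can be taken one step further with a single pass that tracks how deeply the current character is nested, which also copes with nested parentheses. This is only a sketch under that assumption, not something benchmarked against the regex answers; the name `split_by_depth` and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are not inside parentheses,
# tracking the nesting depth one character at a time.
sub split_by_depth {
    my ($text) = @_;
    my @fields;
    my $current = '';
    my $depth   = 0;
    for my $char (split //, $text) {
        if ($char eq ',' && $depth == 0) {
            push @fields, $current;             # a top-level comma ends the field
            $current = '';
            next;
        }
        $depth++ if $char eq '(';
        $depth-- if $char eq ')' && $depth > 0; # ignore a stray ")"
        $current .= $char;
    }
    push @fields, $current;
    s/^\s+|\s+$//g for @fields;                 # trim whitespace around each field
    return @fields;
}

# Prints "a", "b (c, (d, e))" and "f" on separate lines.
print "$_\n" for split_by_depth('a, b (c, (d, e)), f');
```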