Perl: How to extract a string between brackets

I have a file in MoinMoin text format:

* [[ Virtualbox Guest Additions]] (2011/10/17 15:19) 
* [[ Abiword Wordprocessor]] (2010/10/27 20:17) 
* [[ Sylpheed E-Mail]] (2010/03/30 21:49) 
* [[ Kupfer]] (2010/05/16 20:18)

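All the words between "[[" and "]]" are the short description of the entry. I need to extract the whole description, not each individual word.

I found an answer to a similar question here: https://stackoverflow.com/a/2700749/819596 but I couldn't understand it: "my @array = $str =~ /(\{ (?: [^{}]* | (?0))* \})/xg;". Anything that works will be accepted, but an explanation would help a lot, i.e. what (?0) or /xg does.

Source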

2012-09-04 marinara

Thanks for the answers, I'll take a nap and then try them out! – marinara

The code might look like this:

use warnings; 
use strict; 

my @subjects; # declaring a lexical variable to store all the subjects 
my $pattern = qr/ 
    \[ \[ # matching two `[` signs 
    \s*  # ... and, if any, whitespace after them 
    ([^]]+) # starting from the first non-whitespace symbol, capture all the non-']' symbols 
    ]] 
/x; 

# main processing loop: 
while (<DATA>) { # reading the source file line by line 
    if (/$pattern/) {  # if line is matched by our pattern
        push @subjects, $1; # ... push the captured group of symbols into our array
    }
} 
print $_, "\n" for @subjects; # print our array of subject line by line 

__DATA__ 
* [[ Virtualbox Guest Additions]] (2011/10/17 15:19) 
* [[ Abiword Wordprocessor]] (2010/10/27 20:17) 
* [[ Sylpheed E-Mail]] (2010/03/30 21:49) 
* [[ Kupfer]] (2010/05/16 20:18)

As I see it, what you need can be described as follows: in each line of the file, try to find this sequence of symbols...

[[, an opening delimiter, 
then 0 or more whitespace symbols, 
then all the symbols that make a subject (which should be saved), 
then ]], a closing delimiter

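As you can see, this description translates quite naturally into a regex. The only thing that is probably not needed is the /x regex modifier, which allowed me to comment it extensively.

Source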

2012-09-04 20:48:05 raina77ow

\[\[(.*)]]

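\[ is a literal [, ] is a literal ], .* means any sequence of zero or more characters, and anything enclosed in parentheses is a capturing group, so you can access it later in your script with $1 (or $2 .. $9, depending on how many groups you have).

Put together, this matches two [, then everything up to the last occurrence of two consecutive ].

Update: on a second read of your question I am suddenly unsure whether you need the content between [[ and ]] or the whole line. In the latter case, drop the parentheses entirely and just test whether the pattern matches; there is no need to capture.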
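
To illustrate the "last occurrence" point, here is a minimal sketch comparing the greedy .* with the non-greedy .*? on a line that holds two entries. The sample line is made up for illustration and is not part of the question's data.

```perl
use strict;
use warnings;

# Hypothetical line with two bracketed entries (not from the question's data).
my $line = '* [[ Kupfer]] and [[ Sylpheed E-Mail]] (2010/05/16 20:18)';

# Greedy: .* runs on to the LAST "]]" on the line.
my ($greedy) = $line =~ /\[\[(.*)]]/;

# Non-greedy: .*? stops at the FIRST "]]".
my ($lazy) = $line =~ /\[\[(.*?)]]/;

print "greedy: '$greedy'\n";   # ' Kupfer]] and [[ Sylpheed E-Mail'
print "lazy:   '$lazy'\n";     # ' Kupfer'
```

With exactly one entry per line, as in the question, both forms capture the same text.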

Source

2012-09-04 20:48:18 pulven

my @array = $str =~ /(\{ (?: [^{}]* | (?0))* \})/xg;

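The 'x' flag means that whitespace in the regex is ignored, to allow a more readable expression. The 'g' flag means the result will be a list of all matches, from left to right (match *g*lobally).

(?0) recurses into the whole pattern; here that is the same as the regex inside the first group of parentheses, because that group spans the entire pattern. It makes this a recursive regular expression, equivalent to a set of rules such as:

E := '{' (NoBrace | E)* '}'
NoBrace := [^{}]*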
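
As a small sketch of what that recursion buys, the pattern from the question can be run on a string with nested braces. The input string is a made-up example.

```perl
use strict;
use warnings;

# Hypothetical input with nested braces (not from the question).
my $str = 'a {b {c} d} e {f}';

# (?0) re-enters the whole pattern, so each match is one balanced {...} group;
# /g in list context returns the capture of every top-level match.
my @groups = $str =~ /(\{ (?: [^{}]* | (?0) )* \})/xg;

print "$_\n" for @groups;   # prints "{b {c} d}" and then "{f}"
```

The inner {c} is consumed as part of the outer group rather than being reported separately, which is the behaviour a plain [^{}]* pattern cannot give.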

Source

2012-09-04 20:58:56 chepner

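The answer you found is for recursive pattern matching, which I don't think you need.

/x allows insignificant whitespace and comments in the regex.
/g runs the regex through the whole string. Without it, matching stops at the first match.
/xg is /x and /g combined.
(?0) runs the regex itself again (recursion).

If I understand correctly, you need something like this: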

$text="* [[ Virtualbox Guest Additions]] (2011/10/17 15:19) 
* [[ Abiword Wordprocessor]] (2010/10/27 20:17) 
* [[ Sylpheed E-Mail]] (2010/03/30 21:49) 
* [[ Kupfer]] (2010/05/16 20:18) 
"; 

@array=($text=~/\[\[([^\]]*)\]\]/g); 
print join(",",@array); 

# this prints " Virtualbox Guest Additions, Abiword Wordprocessor, Sylpheed E-Mail, Kupfer"

Source

2012-09-04 21:10:10 lalborno

If the text will never contain ], you can simply use the following, as previously recommended:

/\[\[ ([^\]]*) \]\]/x

The following allows ] in the contained text, but I recommend against incorporating it into a larger pattern:

/\[\[ (.*?) \]\]/x

The following allows ] in the contained text, and is the most robust solution:

/\[\[ ((?:(?!\]\]).)*) \]\]/x

For example,

if (my ($match) = $line =~ /\[\[ ((?:(?!\]\]).)*) \]\]/x) { 
    print "$match\n"; 
}

or

my @matches = $file =~ /\[\[ ((?:(?!\]\]).)*) \]\]/xg;

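/x: Ignore whitespace in the pattern. Allows spaces to be added to make the pattern readable without changing its meaning. Documented in perlre.
/g: Find all matches. Documented in perlop.
(?0) was used to make the pattern recursive, since the linked answer had to deal with arbitrary nesting of curlies. Documented in perlre.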
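
A rough sketch of how the three patterns differ, using two made-up lines. The reason given for avoiding .*? inside a larger pattern (backtracking can carry it across a "]]") is my reading of the recommendation, not something the answer spells out.

```perl
use strict;
use warnings;

# Hypothetical inputs, not taken from the question's data.
my $with_bracket = '* [[ Array[0] notes]] (2012/09/04 21:18)';
my $two_entries  = '[[ a]] (2011/10/17) [[ b]] (2010/05/16)';

# 1. A single "]" inside the text defeats the negated character class,
#    while the lookahead version still matches.
my ($negated)  = $with_bracket =~ /\[\[ ([^\]]*) \]\]/x;
my ($tempered) = $with_bracket =~ /\[\[ ((?:(?!\]\]).)*) \]\]/x;

# 2. Inside a larger pattern (an entry followed by a 2010 date), .*? can
#    backtrack across "]]"; the lookahead version cannot.
my ($lazy)      = $two_entries =~ /\[\[ (.*?) \]\] \s* \(2010/x;
my ($lookahead) = $two_entries =~ /\[\[ ((?:(?!\]\]).)*) \]\] \s* \(2010/x;

printf "negated:   %s\n", defined $negated ? "'$negated'" : 'no match';
print  "tempered:  '$tempered'\n";    # ' Array[0] notes'
print  "lazy:      '$lazy'\n";        # ' a]] (2011/10/17) [[ b'
print  "lookahead: '$lookahead'\n";   # ' b'
```

In the second case only the lookahead form returns the entry that is actually followed by the 2010 date.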

Source

2012-09-04 21:18:55 ikegami

I would suggest using "extract_bracketed" or "extract_delimited" from the module Text::Balanced - see here: http://perldoc.perl.org/Text/Balanced.html

Source
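
A minimal sketch of the extract_bracketed route, assuming one entry per line as in the question. The prefix pattern and the stripping of the outer brackets are my own choices for illustration, not part of the suggestion above.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $line = '* [[ Virtualbox Guest Additions]] (2011/10/17 15:19)';

# Second argument: the bracket type to balance.
# Third argument: a prefix pattern to skip before the opening bracket
# (here: everything that is not a "[").
my ($extracted, $remainder) = extract_bracketed($line, '[]', '[^\[]*');
die "no bracketed text found\n" unless defined $extracted;

# $extracted still carries its delimiters, so strip the outer "[[ " and "]]".
(my $subject = $extracted) =~ s/^\[\[\s*//;
$subject =~ s/\]\]$//;

print "$subject\n";   # Virtualbox Guest Additions
```

This is heavier than a single regex for the flat entries shown here; it becomes more attractive when the bracketed text can itself contain balanced brackets.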

2012-09-05 06:17:48

perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp

Tested as follows:

> cat temp 
     * [[ Virtualbox Guest Additions]] (2011/10/17 15:19) 
     * [[ Abiword Wordprocessor]] (2010/10/27 20:17) 
     * [[ Sylpheed E-Mail]] (2010/03/30 21:49) 
     * [[ Kupfer]] (2010/05/16 20:18) 
> 
> perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp 
    Virtualbox Guest Additions 
    Abiword Wordprocessor 
    Sylpheed E-Mail 
    Kupfer 
>

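s/.*\[\[(.*)\]\].*/\1/g
.*\[\[ -> matches any characters up to [[
(.*)\]\] -> stores any characters after the string "[[" up to "]]" in \1
.* -> matches the rest of the line.

Then, since we have our data in \1, we can simply use it to print to the console.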
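
For use inside a script rather than on the command line, the same substitution might be sketched as below. It uses $1 in the replacement, the form perlre recommends over \1 on the right-hand side of s///; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "     * [[ Kupfer]] (2010/05/16 20:18)\n";

# Copy the line, then replace everything except the trailing newline
# with the text captured between "[[" and "]]".
(my $subject = $line) =~ s/.*\[\[(.*)\]\].*/$1/;

print $subject;   # " Kupfer" followed by a newline
```

The leading space inside the brackets is kept; adding \s* after \[\[ would drop it.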

Source

2012-09-05 13:59:38 Vijay
