2014-07-06 67 views

I have the following input text to parse:

<code>Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ... 
    *[[1973]] – [[Clark Kent]], superhero and newspaper reporter... 
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''... 
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...</code> 

This is the pattern code I am using in Java:

<code>String pattern = "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(data);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
}</code>

I read the file line by line with BufferedReader's readLine (each line is parsed separately), and with my regex I get the following output:
Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
Found value: [[University of Pleasantville]]
*[[1973]] &ndash; [[Clark Kent]], superhero and newspaper reporter...
Found value: [[1973]]
After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
Found value: [[negative hero]]
Found value: [[Alternate Superman]]
Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...
Found value: [[Daily Planet]]
Found value: [[Louis Lane]]

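As you can see, I am unable to extract everything inside the double square brackets: [[I_want_to_extract_these_except_Source_or_Arch_or_Zeus]]. For example, from the first line I should have extracted [[Superman (the Hero)|Superman]] and so on, but nothing was retrieved. How can I modify my regex to extract everything except [[Source:something]] and the like? Thanks.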

Append the whole text to a single string, then match. – nikolap

Is that the problem, @nikolap? What is wrong with reading line by line? – Knight

I am not sure about the whole text, but there may be something like [[Lois Lane with the closing brackets on the next line]]. – nikolap
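
The concern in the comments above can be illustrated with a small sketch. It assumes a link may be split across a line break, and it uses a deliberately simplified pattern with no prefix filtering. The class name `WholeTextMatch` and the sample lines are hypothetical, not taken from the asker's data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample text are illustrative only.
final class WholeTextMatch {

    // Simplified pattern (no Arch:/Zeus:/Source: filtering).
    // DOTALL lets '.' match line terminators, so a link may span two lines.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[.*?\\]\\]", Pattern.DOTALL);

    // Counts the [[...]] links found in the given text.
    static int countLinks(String text) {
        Matcher m = LINK.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Clark met reporter [[Louis",
            "Lane]] at the [[Daily Planet]]."
        };

        // Matching each line on its own cannot see the link that is split in two.
        int perLine = 0;
        for (String line : lines) {
            perLine += countLinks(line);
        }

        // Matching the joined text can.
        int wholeText = countLinks(String.join("\n", lines));

        System.out.println("per line: " + perLine + ", whole text: " + wholeText);
    }
}
```

If every link in the file sits on a single line, reading line by line should give the same matches, and this difference does not arise.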

Answers

1

Use a negative lookahead, i.e. (?!...), like this:

\[\[(?!Arch:|Zeus:|Source:).*?\]\]

See it in action: http://regex101.com/r/lJ6sH3/1
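
For reference, here is a minimal sketch of how the lookahead could be applied from Java. In Java's regex syntax, `[^(Arch:|Zeus:|Source:)]` is a negated character class: it matches one character that is not in the listed set, rather than excluding the three prefixes as words. The lookahead `(?!...)` instead checks the text right after `[[` without consuming it. The class name `WikiLinkExtractor` and the method `extractLinks` are hypothetical, and all three prefixes are written here with a trailing colon, as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
final class WikiLinkExtractor {

    // Matches [[...]] unless the text right after "[[" is Arch:, Zeus: or Source:.
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?!Arch:|Zeus:|Source:).*?\\]\\]");

    // Returns every matching [[...]] link in the given line, in order.
    static List<String> extractLinks(String line) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "Clark is set to work in ''[[Superman (the Hero)|Superman]]'', "
                + "a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...";
        for (String link : extractLinks(line)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

On the first sample line this should print the two Superman links and skip `[[Source:NYTimes]]`. The surrounding `(?:\p{Punct}|\B|\b)` groups from the question are dropped in this sketch, since either `\b` or `\B` matches at every position and so they do not restrict anything.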

Thanks @mrhobo. That works! – Knight