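Grouping sentences separated by a specific word

I am trying to split two clauses of any reasonable length that are separated by a specific word ("AND" in the examples), where the second clause is optional. Some examples: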

CASE 1:

foo sentence A AND foo sentence B

should give

"foo sentence A" --> matching group 1 

"AND" --> matching group 2 (optionally) 

"foo sentence B" --> matching group 3

CASE 2:

foo sentence A

should give

"foo sentence A" --> matching group 1 
"" --> matching group 2 (optionally) 
"" --> matching group 3

I tried the following regex

(.*) (AND (.*))?$

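and it works, but in CASE 2 only if I put a trailing space at the end of the string; otherwise the pattern does not match. If I instead move the space before "AND" inside the parenthesized group, then in CASE 1 the matcher puts the whole string into the first group. I wondered about lookahead and lookbehind assertions, but I am not sure they can help me. Any suggestions? Thanks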
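The two failure modes described above can be reproduced with a small sketch. The class and method names (`OriginalAttempt`, `firstGroup`) are made up for illustration, and the second pattern is my reading of "the space before AND inside the group"; neither appears in the question itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalAttempt {
    // The pattern from the question, unchanged.
    static final Pattern SPACE_OUTSIDE = Pattern.compile("(.*) (AND (.*))?$");
    // Assumed variant: the space moved inside the optional group.
    static final Pattern SPACE_INSIDE = Pattern.compile("(.*)( AND (.*))?$");

    // Returns group 1 if the pattern is found in the input, or null if it is not.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String case1 = "foo sentence A AND foo sentence B";
        String case2 = "foo sentence A";
        System.out.println(firstGroup(SPACE_OUTSIDE, case1));       // first clause only
        System.out.println(firstGroup(SPACE_OUTSIDE, case2));       // no match
        System.out.println(firstGroup(SPACE_OUTSIDE, case2 + " ")); // matches with a trailing space
        System.out.println(firstGroup(SPACE_INSIDE, case1));        // greedy group 1 takes everything
    }
}
```

With the space outside the group, the pattern demands a literal space even when there is no "AND". With the space inside, the greedy `(.*)` is free to consume the whole line, because the optional group is allowed to match nothing.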

Source

2013-05-25 martin.p

I would use this expression:

^(.*?)(?: (AND) (.*))?$

Explanation:

The regular expression:

(?-imsx:^(.*?)(?: (AND) (.*))?$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      AND                      'AND'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
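As a minimal sketch of how this expression could be used from Java: the class and method names (`LazySplitDemo`, `groups`) are hypothetical, and mapping unmatched groups to empty strings is my assumption about what the question wants, since `Matcher.group` returns null for a group that did not take part in the match.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazySplitDemo {
    // The expression from the answer above.
    static final Pattern P = Pattern.compile("^(.*?)(?: (AND) (.*))?$");

    // Returns groups 1..3; groups that did not participate (null) become "".
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        String[] out = new String[3];
        for (int i = 0; i < 3; i++) {
            String g = m.group(i + 1);
            out[i] = (g == null) ? "" : g;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(groups("foo sentence A")));
    }
}
```

The lazy `.*?` stops group 1 at the first position where " AND " can follow, which is why the optional group gets a chance to match before group 1 swallows the rest of the line.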

Source

2013-05-26 09:41:52 Toto

This works too, and it seems very simple. Thanks.

How about just using

String[] split = sentence.split("AND");

This splits the sentence on your word and gives you an array of the parts.

Source

2013-05-25 21:22:57 greedybuddha

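That is one way, but the result must not be stored in an array. Thanks anyway.

You mean you don't want the result stored in an array? Because using split returns an array. – greedybuddha

Exactly. For now I am working with the groups stored in the matcher.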

Change your regex to make the space after the first sentence optional:

(.*\\S) ?(AND (.*))?$

Or you can use split() to consume AND and any surrounding whitespace:

String[] sentences = sentence.split("\\s*AND\\s*");
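A short sketch of the split() variant, assuming the input is a single line. `SplitOnAnd` and `clauses` are hypothetical names used only for this example.

```java
import java.util.Arrays;

class SplitOnAnd {
    // Splits on "AND" and swallows any whitespace around it,
    // so the returned clauses need no extra trimming at the split point.
    static String[] clauses(String sentence) {
        return sentence.split("\\s*AND\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(clauses("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(clauses("foo sentence A")));
    }
}
```

When there is no "AND", the array has a single element, so the array length tells the two cases apart. The separator itself is consumed and does not appear in the result.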

Source

2013-05-25 21:26:05 Bohemian

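If you want it to have any effect, that question mark should go after the first '*'.

@TimPietzcker Oh yes. But I think the edit is better than another question mark, because the semantics are clearer. Thanks. – Bohemian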
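To make the point about the quantifier concrete, here is a sketch comparing the expression as posted with a lazy variant. The lazy pattern is my reading of the suggestion to put a question mark after the first `*`; it is not taken from the answer. The names `GreedyVersusLazy` and `firstGroup` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVersusLazy {
    // As posted in the answer: the first quantifier is greedy.
    static final Pattern GREEDY = Pattern.compile("(.*\\S) ?(AND (.*))?$");
    // Assumed variant: the first quantifier made lazy with '?'.
    static final Pattern LAZY = Pattern.compile("(.*?\\S) ?(AND (.*))?$");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String case1 = "foo sentence A AND foo sentence B";
        System.out.println(firstGroup(GREEDY, case1)); // the whole sentence
        System.out.println(firstGroup(LAZY, case1));   // only the first clause
    }
}
```

With the greedy quantifier, group 1 can reach the end of the string and the optional group matches nothing. The lazy form gives up characters as early as possible, so the "AND" part gets a chance to match.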

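Case 2 is a little odd...

But I would do

String[] parts = sentence.split("(?<=AND)|(?=AND)");

Then check parts.length. If the length is 1, it is case 2: the array holds only the sentence, and you can add empty strings as your "group 2/3".

In case 1, you get the parts directly: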

[foo sentence A , AND, foo sentence B]
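The length check and the padding with empty strings might look like the sketch below. `LookaroundSplit` and `threeGroups` are hypothetical names; trimming each part and ignoring anything past the third part are my own simplifications, not part of the answer.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Splits just before and just after "AND" (zero-width lookarounds),
    // then pads the result to exactly three "groups".
    static String[] threeGroups(String sentence) {
        String[] parts = sentence.split("(?<=AND)|(?=AND)");
        String[] groups = {"", "", ""};
        // Assumption: parts beyond the third (a second AND) are ignored here.
        for (int i = 0; i < parts.length && i < 3; i++) {
            groups[i] = parts[i].trim();
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(threeGroups("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(threeGroups("foo sentence A")));
        // "AND" inside a longer word is split as well:
        System.out.println(Arrays.toString(threeGroups("BRAND NEW")));
    }
}
```

Because the lookarounds do not require word boundaries, the letters "AND" inside a longer word are treated as a separator too, as the last call shows.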

Source

2013-05-25 21:30:27 Kent

Can you explain how this handles something like 'sandwiches are tasty and I like KITTENS'?

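Description

This regex places the requested parts of the string into the requested groups. The and is optional; if it is not found in the string, the entire string is placed in group 1. The \s*? quantifiers are meant to trim surrounding whitespace from the captured groups.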

^\s*?\b(.*?)\b\s*?(?:\b(and)\b\s*?\b(.*?)\b\s*?)?$

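Groups

0 gets the entire matched string
1 gets the string before the separator word and; if there is no and, the entire string appears here
2 gets the separator word, in this case and
3 gets the second part of the string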

Java code example:

Case 1

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
    public static void main(String[] args) {
        String sourcestring = "foo sentence A AND foo sentence B";
        Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$", Pattern.CASE_INSENSITIVE);
        Matcher m = re.matcher(sourcestring);
        if (m.find()) {
            // Print group 0 (the whole match) followed by each capture group.
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + groupIdx + "] = " + m.group(groupIdx));
            }
        }
    }
}

$matches Array: 
(
    [0] => foo sentence A AND foo sentence B 
    [1] => foo sentence A 
    [2] => AND 
    [3] => foo sentence B 
)

Case 2, using the same regex

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
    public static void main(String[] args) {
        String sourcestring = "foo sentence A";
        Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$", Pattern.CASE_INSENSITIVE);
        Matcher m = re.matcher(sourcestring);
        if (m.find()) {
            // Print group 0 (the whole match) followed by each capture group.
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + groupIdx + "] = " + m.group(groupIdx));
            }
        }
    }
}

$matches Array: 
(
    [0] => foo sentence A 
    [1] => foo sentence A 
)
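For use in code rather than printing, the groups could be normalized along these lines. This is only a sketch: `WordBoundaryGroups` and `groups` are hypothetical names. Mapping null groups to "" is my assumption about the desired result for case 2, since `Matcher.group` returns null for a group that did not participate. The `trim()` call is a defensive addition because, as far as I can tell, the lazy `\s*?` after `and` can match nothing and leave a leading space in group 3.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryGroups {
    // Same expression as in the examples above.
    static final Pattern P = Pattern.compile(
            "^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",
            Pattern.CASE_INSENSITIVE);

    // Returns groups 1..3, trimmed; unmatched groups are returned as "".
    static String[] groups(String input) {
        String[] out = {"", "", ""};
        Matcher m = P.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= 3; i++) {
                String g = m.group(i);
                if (g != null) {
                    out[i - 1] = g.trim();
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(groups("foo sentence A")));
        // \b keeps the "and" inside "sandwich" from acting as the separator.
        System.out.println(Arrays.toString(groups("sandwich and tea")));
    }
}
```

The word boundaries around `(and)` are what distinguish this expression from a plain substring search: only a standalone "and" is treated as the separator.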

Source

2013-05-26 04:20:43

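Thanks, it works like a charm! I see you used (?:....) but its meaning is not clear to me. Is there a well-written tutorial about this?

(?: starts a non-capturing group. With the ? at the end, the group becomes optional, and at the same time the matched text is not returned as a group.

I also found this: http://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups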
