Here is another example. This solution should probably work for any combination of words and punctuation.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    // A run of characters that are not ASCII letters, digits, or whitespace.
    private static final Pattern PUNCTUATION = Pattern.compile("[^a-zA-Z\\d\\s]+");

    public static void main(String[] args) {
        String sentence = "In the preceding examples, classes derived from...";
        List<String> list = splitWithPunctuation(sentence);
        // Print one token per line.
        list.forEach(System.out::println);
    }

    public static List<String> splitWithPunctuation(String sentence) {
        List<String> list = new ArrayList<>();
        // Split on runs of whitespace so repeated spaces do not produce empty tokens.
        for (String s : sentence.trim().split("\\s+")) {
            Matcher matcher = PUNCTUATION.matcher(s);
            int i = 0;
            while (matcher.find()) {
                // Add the text before the punctuation run, unless the run
                // starts the token (which would otherwise add an empty string).
                if (matcher.start() > i) {
                    list.add(s.substring(i, matcher.start()));
                }
                list.add(matcher.group());
                i = matcher.end();
            }
            // Add whatever follows the last punctuation run (or the whole
            // token when it contains no punctuation at all).
            if (i < s.length()) {
                list.add(s.substring(i));
            }
        }
        return list;
    }
}
Output:
In
the
preceding
examples
,
classes
derived
from
...
A more complex example:
String sentence = "In the preced^^^in## examp!les, classes derived from...";
List<String> list = splitWithPunctuation(sentence);
list.forEach(System.out::println);
Output:
In
the
preced
^^^
in
##
examp
!
les
,
classes
derived
from
...
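As a possible alternative, the same tokens can be collected with a single pattern that matches either a run of letters and digits or a run of punctuation, so no splitting on spaces is needed first. This is only a minimal sketch under the same assumption as above (ASCII letters and digits count as word characters); the names `TokenSketch` and `tokenize` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {
    // Either a run of ASCII letters/digits, or a run of characters that are
    // neither letters, digits, nor whitespace. Whitespace is never matched,
    // so it simply separates tokens.
    private static final Pattern TOKEN =
            Pattern.compile("[a-zA-Z\\d]+|[^a-zA-Z\\d\\s]+");

    // Hypothetical helper: collects every match of TOKEN in order.
    public static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("In the preced^^^in## examp!les, classes derived from...")
                .forEach(System.out::println);
    }
}
```

For the sentence above this should print the same tokens, one per line, as the output shown. Because the matcher only ever returns non-empty matches, tokens that start with punctuation need no special handling.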
Do they have to be in the same order as in the sentence? – Yoda
Should '...' be split into '.', '.', '.' or kept as '...'? – Pshemo
Is something like '!!' or '!?' possible? If so, should we split it? – Pshemo