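I am trying to split a sentence into a set of words. The chunking should also take measures into account: I am looking for a Java regex that splits a sentence into words while keeping each value together with its unit of measure as a single token.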
E.g. (made-up):
document= The root cause of the problem is the temperature, it is currently 40 degrees which is 30 percent likely to turn into an infection doctor has prescribed 1-19666 tablet which contains 1.67 gpm and has advised to consume them every 3 hrs.
What is required is a set of words:
the
root
cause
problem
...
40 degrees
30 percent
1.67 gpm
1-19666 tablet
3 hrs
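What I have tried is:
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils; // assuming Apache Commons Lang 3

List<String> bagOfWords = new ArrayList<>();
// Replace everything except letters, digits, '_', '.' and '-' with a space, then split on single spaces.
String[] words = StringUtils.normalizeSpace(document.replaceAll("[^0-9a-zA-Z_.-]", " ")).split(" ");
for (String word : words) {
    // Drop any '.' not followed by a digit, so "1.67" survives but "hrs." becomes "hrs".
    bagOfWords.add(StringUtils.normalizeSpace(word.replaceAll("\\.(?!\\d)", " ")));
}
System.out.println("NEW 2 :: " + bagOfWords.toString());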
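Are you looking for a regex that solves this particular case, or for something that works on any sentence? –
Any sentence. Basically, pull out each value together with its unit. – Betafish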
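One possible direction, offered only as a rough sketch: instead of splitting on separators, match tokens directly with `Matcher.find()`. The sketch assumes that a "measure" is a number (digits optionally joined by `.` or `-`) followed by exactly one alphabetic word. The class name `MeasureTokenizer`, the method `tokenize`, and the pattern itself are illustrative assumptions, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name and the pattern are illustrative assumptions.
class MeasureTokenizer {

    // Either a number (digits optionally joined by '.' or '-') with an optional
    // following word treated as its unit, or a plain alphabetic word.
    private static final Pattern TOKEN = Pattern.compile(
            "\\d+(?:[.-]\\d+)*(?:\\s+[A-Za-z]+)?|[A-Za-z]+");

    static List<String> tokenize(String document) {
        List<String> bagOfWords = new ArrayList<>();
        Matcher m = TOKEN.matcher(document);
        while (m.find()) {
            // Collapse whitespace between value and unit to a single space.
            bagOfWords.add(m.group().replaceAll("\\s+", " "));
        }
        return bagOfWords;
    }

    public static void main(String[] args) {
        String document = "it is currently 40 degrees, the tablet contains 1.67 gpm, "
                + "take 1-19666 tablet every 3 hrs.";
        // Prints: [it, is, currently, 40 degrees, the, tablet, contains, 1.67 gpm, take, 1-19666 tablet, every, 3 hrs]
        System.out.println(tokenize(document));
    }
}
```

Note that this attaches whatever word follows a number, so "in 2019 the" would yield "2019 the". It also does not remove duplicates or stop words such as "of". For arbitrary sentences, a whitelist of known units or a proper NLP chunker would likely be more reliable than a single regex.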