2016-09-06 14 views

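Oracle regexp_replace - adding a space to separate sentences

I'm working in Oracle to fix some text. The problem is that my data contains sentences that are not separated from each other by a space. For example:

  1. Sentence without space.Between sentences

  2. Sentence with question mark?Second sentence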

I tested the following replace statement on regex101, where it seems to work, but I can't figure out why it doesn't work in Oracle:

regexp_replace(review_text, '([^\s\.])([\.!\?]+)([^\s\.\d])', '\1\2 \3') 

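This should let me find sentence-ending periods, exclamation points, and question marks (single or grouped) and add the necessary space between sentences. I realize there are other ways sentences can be run together, but the pattern above should cover most use cases. The \d in the third capture group is there to make sure I don't accidentally change numeric values such as "4.5" into "4. 5".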

Test set before:

Sentence without space.Between sentences
Sentence with space. Between sentences
Sentence with multiple periods...Between sentences
False positive sentence with 4.5 Liters
Sentence with!Exclamation point
Sentence with?Question mark

After the change it should look like this:

Sentence without space. Between sentences
Sentence with space. Between sentences
Sentence with multiple periods... Between sentences
False positive sentence with 4.5 Liters
Sentence with! Exclamation point
Sentence with? Question mark

Regex101 link: https://regex101.com/r/dC9zT8/1

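All of the changes work as expected on regex101, but once I run this in Oracle my third and fourth test cases do not. Oracle does not add a space after the group of multiple periods (the ellipsis), and regexp_replace does add a space inside "4.5". I'm not sure why this happens, but perhaps there is some peculiarity of Oracle's regexp_replace that I'm missing.

Any and all insight is appreciated. Thanks!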


My guess is that global matching (the g flag) is turned on in regex101 but not in Oracle.


Global occurrence is something I hadn't thought of, but even with the occurrence setting = 0 in Oracle I still run into the same problem. – flamewheel
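For reference, here is a minimal sketch of how the occurrence argument mentioned in the comments behaves. It assumes the documented argument order of regexp_replace (source, pattern, replacement, position, occurrence); the literal 'a.B c.D' is a made-up input used only for illustration.

```sql
-- occurrence = 0 (also the default when omitted) replaces every match;
-- occurrence = 1 replaces only the first match.
select regexp_replace('a.B c.D', '([.!?]+)\s*([A-Z])', '\1 \2', 1, 0) as all_matches,
       regexp_replace('a.B c.D', '([.!?]+)\s*([A-Z])', '\1 \2', 1, 1) as first_only
from dual;

-- all_matches: a. B c. D
-- first_only:  a. B c.D
```

Since replacing all occurrences is already the default, a missing "global" flag is unlikely to explain the difference from regex101.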

Answer

2

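This may get you started. It looks for . ? ! in any combination, followed by zero or more whitespace characters and a capital letter, and it replaces the "zero or more whitespace characters" with exactly one space. This will not split decimal numbers, but it will miss sentences that begin with anything other than a capital letter. You can start adding conditions from here; if you get stuck, write back and we'll try to help. Consulting other regex dialects may be helpful, but it is probably not the fastest way to get an answer.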

with
    inputs (str) as (
     -- test cases from the question
     select 'Sentence without space.Between sentences'           from dual union all
     select 'Sentence with space. Between sentences'             from dual union all
     select 'Sentence with multiple periods...Between sentences' from dual union all
     select 'False positive sentence with 4.5 Liters'            from dual union all
     select 'Sentence with!Exclamation point'                    from dual union all
     select 'Sentence with?Question mark'                        from dual
    )
-- \1 = run of . ! ? characters, \2 = the capital letter that follows;
-- any whitespace between them is replaced by a single space
select regexp_replace(str, '([.!?]+)\s*([A-Z])', '\1 \2') as new_str
from inputs;

NEW_STR
-------------------------------------------------------
Sentence without space. Between sentences
Sentence with space. Between sentences
Sentence with multiple periods... Between sentences
False positive sentence with 4.5 Liters
Sentence with! Exclamation point
Sentence with? Question mark

6 rows selected.
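A possible explanation for the two failing cases in the question, offered as a sketch rather than a confirmed diagnosis: Oracle follows POSIX rules inside bracket expressions, where a backslash is an ordinary character. Under that reading, `[^\s\.]` excludes a backslash, the letter `s`, and a period rather than whitespace, so the ellipsis after "periods" (which ends in `s`) is skipped. Likewise `[^\s\.\d]` excludes the letter `d` rather than digits, so the `5` in "4.5" is accepted and a space is inserted. Rewriting the question's pattern with POSIX character classes avoids both problems. The two-row input below is chosen only to show the two cases.

```sql
with
    inputs (str) as (
     select 'Sentence with multiple periods...Between sentences' from dual union all
     select 'False positive sentence with 4.5 Liters'            from dual
    )
-- Same three capture groups as the pattern in the question, but with
-- [:space:] and [:digit:] instead of \s and \d inside the brackets.
select regexp_replace(str,
                      '([^[:space:].])([.!?]+)([^[:space:].[:digit:]])',
                      '\1\2 \3') as new_str
from inputs;

-- Expected under the reasoning above:
-- Sentence with multiple periods... Between sentences
-- False positive sentence with 4.5 Liters
```

Outside a bracket expression, `\s` and `\d` do work as shorthand in Oracle, which is why the `\s*` in the query above behaves as intended.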

Thanks mathguy - what you wrote is logically sound. I'll apply what you gave (though I'll also allow lowercase letters a-z) and check whether anything is missed. – flamewheel
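A minimal sketch of the lowercase variant mentioned in the comment above, assuming it simply widens the second capture group to `[A-Za-z]`. The input string is invented for illustration.

```sql
-- Accept a lowercase or uppercase letter after the punctuation;
-- digits are still excluded, so decimals such as 4.5 are left alone.
select regexp_replace('First sentence.second one, costing 4.5 units',
                      '([.!?]+)\s*([A-Za-z])', '\1 \2') as new_str
from dual;

-- new_str: First sentence. second one, costing 4.5 units
```

One trade-off to check against real data: this wider pattern would also insert spaces into tokens such as domain names (for example "example.com") or abbreviations written without spaces.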