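I want to split a text into sentences in Python according to normal grammar rules, using a regular expression with lookahead or lookbehind patterns to find the split points.
The text I want to split is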
s = """Mr. Smith bought cheapsite.com for 1.5 million dollars,
i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a
probability of .9 it isn't."""
The expected output is
Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with a probability of .9 it isn't.
After a lot of searching I arrived at the following regex, which does achieve this. Here new_str is s with some of the \n characters removed.
import re

m = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s',new_str)
for i in m:
    print (i)
Mr. Smith bought cheapsite.com for 1.5 million dollars,i.e. he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with aprobability of .9 it isn't.
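So the way I understand the regex is that we first select
1) all the characters like i.e.
2) From the spaces filtered in step 1, we select those that are not preceded by words like Mr., Mrs., etc.
3) From the result of step 2, we select only those where there is a dot or question mark directly in front of the space.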
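One way to check this reading is to note that the three lookbehinds are zero-width assertions that are all tested at the same position, immediately before the `\s`, rather than successive filtering passes. The sketch below assumes a shortened sample text (not the full string from the question) and tries every ordering of the three lookbehinds in front of `\s`; the names `text`, `lookbehinds` and `results` are illustrative only.

```python
import re
from itertools import permutations

# Assumption: a shortened sample, not the full text from the question.
text = "Mr. Smith paid a lot, i.e. too much. Did he mind? Adam Jones Jr. thinks not."

# The three lookbehinds from the regex, each a zero-width assertion.
lookbehinds = [r"(?<!\w\.\w.)", r"(?<![A-Z][a-z]\.)", r"(?<=\.|\?)"]

# Split with every ordering of the lookbehinds placed before \s.
results = {
    tuple(re.split("".join(order) + r"\s", text))
    for order in permutations(lookbehinds)
}

# All six orderings should collapse to a single distinct result.
sentences = next(iter(results))
for sentence in sentences:
    print(sentence)
```

If the set ends up with one element, the order of the lookbehinds among themselves makes no difference as long as they all sit on the same side of `\s`.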
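So I tried changing the order as follows:
1) Filter out all the titles first.
2) From the filtered result, select those that are followed by a space.
3) Finally, remove all the phrases like i.e.
But when I do that, the blank after i.e. is split as well.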
m = re.split(r'(?<![A-Z][a-z]\.)(?<=\.|\?)\s(?<!\w\.\w.)',new_str)
for i in m:
    print (i)
Mr. Smith bought cheapsite.com for 1.5 million dollars,i.e.
he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with aprobability of .9 it isn't.
Shouldn't the last step in the modified regex still be able to identify phrases like i.e.? Why does it fail to detect it?
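A possible explanation, offered as a sketch rather than a definitive answer: a lookbehind inspects the characters that end at the current position. Once it is moved after `\s`, the four characters it sees before "he" are `.e. ` instead of `i.e.`, so `\w\.\w.` no longer matches and the negative lookbehind lets the split through. The code below assumes new_str is s with the newlines simply deleted, which reproduces the outputs shown above. The `widened` pattern is a hypothetical adjustment that adds `\s` inside the lookbehind so that it covers the whitespace just consumed.

```python
import re

s = """Mr. Smith bought cheapsite.com for 1.5 million dollars,
i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a
probability of .9 it isn't."""

# Assumption: new_str is s with the newline characters deleted.
new_str = s.replace("\n", "")

# The reordered regex from the question: the last lookbehind runs AFTER \s,
# so the window it checks ends with the whitespace itself.
after_space = r'(?<![A-Z][a-z]\.)(?<=\.|\?)\s(?<!\w\.\w.)'

# Hypothetical variant: the lookbehind also accounts for the consumed space.
widened = r'(?<![A-Z][a-z]\.)(?<=\.|\?)\s(?<!\w\.\w.\s)'

parts_after = re.split(after_space, new_str)
parts_widened = re.split(widened, new_str)

print(len(parts_after), len(parts_widened))
for part in parts_widened:
    print(part)
```

With this change the lookbehind stays fixed-width (five characters), which Python's `re` module requires.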
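You should use nltk to split text into sentences; it is not possible to write an exact sentence-splitting regex in Python (you could try a matching regex instead, but that would be a challenge).
@WiktorStribiżew I agree, but in this case I want to understand the nuances of the regex, and why changing the order produces incorrect results.
Do you mean that the input in `new_str` is already like [here](https://regex101.com/r/zEAkas/1)?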