How to split a string at a delimiter, but exclude other strings

I have this string that I want to split at periods:

j = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.'

This is the result I want:

['you can get it cheaper than $20.99. ', 'shop at amazon.com.', ' hurry before prices go up.']

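I split at every period that is preceded by a lowercase letter, and at every period that is preceded by a digit and followed by whitespace.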

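import re

x = []
# The capturing group keeps each matched sentence ending in the split result.
sentences = re.split(r'([a-z]\.|\d\.\s)', j)
# The captured endings sit at the odd indices.
sentence_endings = sentences[1::2]
for position in range(len(sentences)):
    if sentences[position] in sentence_endings:
        # Re-attach each ending to the text that preceded it.
        x.append(sentences[position - 1] + sentences[position])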

Printing x gives me:

['you can get it cheaper than $20.99. ', 'shop at amazon.', 'com.', ' hurry before prices go up.']

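I want 'amazon.com' to stay together as one string, so I tried telling the regex to ignore '.com' with re.split(r'([a-z]\.|\d\.\s)[^.com]', j), but that doesn't give me the result I want. What is the best way to do this?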

2016-01-05 Mika Schiller

A non-regex option would be to use nltk.sent_tokenize():

>>> import nltk 
>>> j = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.' 
>>> nltk.sent_tokenize(j) 
['you can get it cheaper than $20.99.', 'shop at amazon.com.', 'hurry before prices go up.']

2016-01-05 05:02:01 alecxe

A simple regex to split on a period followed by whitespace would be \.\s

You can use a lookbehind to keep the period when splitting: (?<=\.)\s
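To make the difference between the two patterns concrete, here is a minimal sketch using the string from the question. The variable names without_periods and with_periods are only for illustration.

```python
import re

j = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.'

# Splitting on the period plus whitespace consumes the period itself.
without_periods = re.split(r'\.\s', j)

# The lookbehind only asserts that a period comes first, so just the
# whitespace is consumed and every sentence keeps its final period.
with_periods = re.split(r'(?<=\.)\s', j)

print(without_periods)
print(with_periods)
```

Neither pattern splits inside '$20.99' or 'amazon.com', because those periods are not followed by whitespace.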

If you want to use a split approach to get just 'amazon.com' from your string, you could try .*(?=amazon.com)|(?<=amazon.com).*
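A rough sketch of how that pattern behaves with re.split, again on the string from the question. Splitting this way can leave empty strings at the edges of the result, so this sketch filters them out; that filtering step and the names parts and non_empty are my own additions, not part of the pattern.

```python
import re

j = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.'

# Everything before 'amazon.com' and everything after it is treated as
# the delimiter, so the only real piece left over is 'amazon.com'.
# Note the unescaped '.' matches any character; r'amazon\.com' would be stricter.
parts = re.split(r'.*(?=amazon.com)|(?<=amazon.com).*', j)

# Drop the empty strings that the split leaves around the match.
non_empty = [p for p in parts if p]

print(non_empty)
```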

2016-01-05 05:03:29 Jota

re.split(r'(?<=\.)\s', s)
