`re.split()` in Python behaves strangely

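I'm having a bit of trouble in Python. I want to take a .txt file containing many comments and split it into a list. However, I want to split on all punctuation, whitespace, and \n. When I run the Python code below, it splits my text file at a lot of odd points. Note: below I'm only trying to split on periods and newlines to test it, but it still frequently strips the last letter off words.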

import regex as re 
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile: 
    nf = infile.read() 
    wList = re.split('. | \n, nf) 

print(wList)

Source

2017-07-21 John W

You forgot the closing quote on the regex string. –

See whether this post helps: https://stackoverflow.com/questions/4998629/python-split-string-with-multiple-delimiters – Jake

I don't know why it came out that way in the code posted here; I have it in my ipynb file –

You need to fix the quotes and make a small change to the regex:

import regex as re 
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile: 
    nf = infile.read() 
    wList = re.split(r'\W+', nf)  # split on runs of non-word characters

print(wList)
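
As a side note, here is a minimal sketch of how `\W+` behaves on a small string. The `sample` text is made up for illustration and stands in for the file contents, and the standard-library `re` module is used instead of `regex`. When the text ends in punctuation or a newline, the split leaves an empty string at the end of the list, so you may want to filter those out.

```python
import re

# Hypothetical sample text standing in for the contents of the comments file.
sample = "Great service, thanks.\nWould visit again!\n"

# \W+ matches one or more characters that are not letters, digits, or underscore.
tokens = re.split(r"\W+", sample)

# The text ends with "!\n", so the last element of tokens is an empty string.
# Drop empty strings to keep only the words.
words = [t for t in tokens if t]

print(tokens)
print(words)
```

Filtering after the split keeps the pattern simple. `re.findall(r"\w+", sample)` is another way to get the words without empty entries.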

Source

2017-07-21 19:10:04 Ajax1234

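This is very helpful, but do you know of a website that explains how escape sequences work in the .split() function? I think the problem is that I'm trying to strip punctuation and special characters and I'm not describing them correctly. –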

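@JohnW An escape character makes the character that follows it match itself literally in the expression. Otherwise, that character has a special meaning. As for the split function, the expression passed to it works the same way as for all the other re methods. For more on escape characters, see: http://www.regular-expressions.info/characters.html – Ajax1234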
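
To illustrate the point about escaping, here is a rough sketch that builds a split pattern from a list of literal delimiters using `re.escape`, so you don't have to remember which characters are special. The `delimiters` list and the `sample` string are assumptions chosen for the example, not taken from the question.

```python
import re

# Hypothetical delimiters to split on; each one is treated as literal text.
delimiters = [".", "?", "!", ",", " ", "\n"]

# re.escape backslash-escapes regex metacharacters such as "." and "?",
# so each delimiter matches only itself.
pattern = "|".join(re.escape(d) for d in delimiters)

sample = "Wait. What?\nOK, fine!"

# Adjacent delimiters produce empty strings, which are filtered out here.
parts = [p for p in re.split(pattern, sample) if p]

print(parts)
```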

You forgot to close the string, and you need a \ before the period.

import regex as re 
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile: 
    nf = infile.read() 
    wList = re.split(r'\. |\n |\s', nf)  # raw string keeps the backslashes for the regex engine

print(wList)

For more information, see Split Strings with Multiple Delimiters?.

Also, RichieHindle's answer there addresses your question well:

import re 
DATA = "Hey, you - what are you doing here!?" 
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

Source

2017-07-21 19:15:43 Jake

Thanks! I'll give it a try. It's really useful to see why the Python interpreter does what it sometimes does –

Yes, as intuitive as Python is, it can be tricky sometimes. Hope everything works out for you! – Jake

In a regular expression, the character . means any character. You have to escape it, \., to match a period.
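
This is also why the last letter of each word disappears in the question. A small sketch, using a made-up `sample` string: the unescaped pattern matches any character followed by a space, so the letter before each space is consumed as part of the delimiter.

```python
import re

# Hypothetical sample string for illustration.
sample = "hello world. bye"

# Unescaped: "." matches ANY character, so ". " also matches "o " in "hello ".
unescaped = re.split(". ", sample)

# Escaped: "\." matches only a literal period.
escaped = re.split(r"\. ", sample)

print(unescaped)  # ['hell', 'world', 'bye']
print(escaped)    # ['hello world', 'bye']
```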

Source

2017-07-21 19:16:37

Thanks! Will try this out! –
