How to match a newline character in a Python raw string

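I am a bit confused about Python raw strings. I know that in a raw string '\' is treated as an ordinary backslash (for example, r'\n' is the two characters '\' and 'n'). However, I am wondering how to match a newline character with a raw string. I tried r'\n', but it didn't work. Does anyone have a good idea for this?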

Source

2013-02-04 wei

What kind of match are we talking about here? Are you talking about a regular-expression match, or just an 'if ... in my_raw_string'? – mgilson

Sorry for the confusion. I am talking about a regular expression. – wei

In a regular expression, you need to specify that you are in multiline mode:

>>> import re 
>>> s = """cat 
... dog""" 
>>> 
>>> re.match(r'cat\ndog',s,re.M) 
<_sre.SRE_Match object at 0xcb7c8>

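Note that re translates the \n (in the raw string) into a newline. As you pointed out in your comment, you don't actually need re.M for this to match, but it does make $ and ^ match more intuitively: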

>>> re.match(r'^cat\ndog',s).group(0)
'cat\ndog' 
>>> re.match(r'^cat$\ndog',s).group(0) #doesn't match 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
AttributeError: 'NoneType' object has no attribute 'group' 
>>> re.match(r'^cat$\ndog',s,re.M).group(0) #matches. 
'cat\ndog'
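To make the point about raw strings concrete, here is a minimal sketch (the variable names are only for illustration): the raw string r'\n' holds two characters, a backslash and an 'n', and it is the re module itself that interprets that pair as a newline.

```python
import re

raw = r'\n'      # two characters: a backslash followed by 'n'
cooked = '\n'    # one character: an actual newline

print(len(raw), len(cooked))

s = "cat\ndog"
# Both patterns find the newline: re interprets the two-character
# sequence \n itself, while the second pattern already contains a newline.
m_raw = re.search(raw, s)
m_cooked = re.search(cooked, s)
print(m_raw.span(), m_cooked.span())
```

So either form should work as a pattern; the raw form simply leaves the escape for re to handle.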

Source

2013-02-04 15:22:51 mgilson

Thanks for your answer @mgilson! I also wonder why we need to specify multiline mode. I tried matching without it, like "re.match(r'cat\ndog', s)", and it still works. – wei

@user1783403 - You're right. I should have read the documentation more carefully. Specifying 're.M' makes '^' and '$' match more intuitively. – mgilson
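As a small illustration of what "more intuitively" means here, the sketch below (same two-line string as in the answer) compares '^' and '$' with and without re.M:

```python
import re

s = "cat\ndog"

# Without re.M, ^ matches only at the start of the string and $ only at the
# end (or just before a trailing newline), so no whole line matches.
without_flag = re.findall(r'^\w+$', s)

# With re.M, ^ and $ also match at the start and end of every line.
with_flag = re.findall(r'^\w+$', s, re.M)

print(without_flag)
print(with_flag)
```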

Is there any way to make '$' match "less intuitively", i.e. match *only* at the end of the string? I don't want it to match before a '\n'. –
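One option that appears to fit this request is the '\Z' anchor, which the re documentation describes as matching only at the end of the string. A minimal sketch, with a made-up test string:

```python
import re

s = "cat\n"

# $ also matches just before a trailing newline, so this succeeds.
dollar = re.search(r'cat$', s)

# \Z matches only at the very end of the string, so this fails here.
strict = re.search(r'cat\Z', s)

# With no trailing newline, \Z matches as expected.
exact = re.search(r'cat\Z', "cat")

print(dollar is not None, strict is not None, exact is not None)
```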

The simplest answer is to just not use a raw string. You can use \\ to escape a backslash.

If you have a huge number of backslashes in some segments, you can concatenate raw strings and normal strings as needed:

r"some string \ with \ backslashes" "\n"

(Python automatically concatenates string literals that have only whitespace between them.)
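A quick way to check that the two approaches give the same string (a sketch; the variable names are just for illustration):

```python
# Raw literal followed by a normal literal: joined into a single string.
combined = r"some string \ with \ backslashes" "\n"

# The same string written as one normal literal with escaped backslashes.
escaped = "some string \\ with \\ backslashes\n"

print(combined == escaped)
print(combined.endswith("\n"))
```

Both spellings produce two literal backslashes and a real newline at the end.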

Remember, if you are working with paths on Windows, the easiest option is to just use forward slashes - it will still work fine.
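One way to see this without a Windows machine is pathlib's PureWindowsPath, which applies Windows path rules on any platform. This is only a sketch, and the path below is hypothetical:

```python
from pathlib import PureWindowsPath

# Hypothetical path, written once with forward slashes and once as a raw string.
forward = PureWindowsPath("C:/Users/wei/data.txt")
backward = PureWindowsPath(r"C:\Users\wei\data.txt")

print(forward == backward)
print(forward.name)
```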

Source

2013-02-04 15:06:24

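@mgilson I just checked that it works with a raw string and a normal string together, as it's not something I had done before. Edited accordingly. It's actually a little nicer, as I believe this concatenation is done at parse time rather than at execution time. –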

Yeah, I had never actually checked before now either :) – mgilson

Any thoughts on why this got a -1? –

def clean_with_puncutation(text):
    from string import punctuation
    import re
    # Map every punctuation character to a placeholder token.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'

    # Always put the multi-character tokens at the front to avoid overlapping results.
    # Inside the raw string, \n is interpreted by re as a newline.
    regex = r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#\$\%\^\&\*\(\)\[\]\{\}\;\:\,\.\/\?\|\`\_\\+\\\=\~\-\<\>]"
    #text = '<EOF>[email protected]#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_ = ""
    index = 0

    for match in re.finditer(regex, text):
        #print(match.group())
        #print(punctuation_token[match.group()])
        #print ("Match at index: %s, %s" % (match.start(), match.end()))
        # Copy the text before the match, then the token surrounded by spaces.
        text_ = (text_ + text[index:match.start()] + " "
                 + punctuation_token[match.group()] + " ")
        index = match.end()
    # Append whatever follows the last match so the tail is not lost.
    return text_ + text[index:]

Source

2017-12-15 16:09:22
