Python regex: replacing Unicode characters

In the first test string, I am trying to replace the Unicode right-arrow character in the middle of the text with a space, but it doesn't seem to work.

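In general, I want to remove any standalone run of one or more Unicode "non-word" characters (\W only), but keep words that consist of a-z0-9, Unicode characters, or a mixture of both.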

# -*- coding: utf-8 -*-
import re
str = 'hi… » Test'
str = 're of… » Pr'
str = 're of… » Pr | removepipeaswell'
print str
str = re.sub(r' [^a-z0-9]+ ', ' ', str, re.UNICODE|re.MULTILINE)
# str = re.sub(r' [^\p{Alpha}] ', ' ', str, re.UNICODE)
print str
# expected output: 're of… Pr removepipeaswell'

str_nbsp = 'afds » asf'

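Edit: added another test string. I don't want to remove the "…" (the Unicode ellipsis); I want to remove the runs of one or more Unicode non-word characters.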
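One way to read the goal stated in the question is "drop every whitespace-separated token that contains no word character at all". The sketch below swaps the single substitution for token filtering, which is not the approach used in this thread. It assumes Python 3, where `\w` is Unicode-aware for `str` patterns (so it is broader than a-z0-9), and the helper name `drop_symbol_tokens` is made up for illustration.

```python
import re


def drop_symbol_tokens(text):
    """Keep only whitespace-separated tokens that contain a word character."""
    tokens = text.split()
    # A token such as 'of…' survives because it contains letters;
    # '»' and '|' are dropped because they contain none.
    kept = [tok for tok in tokens if re.search(r'\w', tok)]
    return ' '.join(kept)


print(drop_symbol_tokens('re of… » Pr | removepipeaswell'))
# re of… Pr removepipeaswell
```

Note that this also collapses any run of whitespace into a single space, which may or may not be wanted.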

Edit: this works for the test case, but not for the full HTML. It seems to replace matches only in the first half of the string and then ignores the rest.

str = re.sub(r' [^a-z0-9]+ ', ' ', str , re.UNICODE|re.MULTILINE)

Edit: argh, it had to be something silly like not reading the argument list properly: http://bytes.com/topic/python/answers/689341-sub-does-not-replace-all-occurences

[To whoever just deleted their response: thanks for your help.]

str = re.sub(r' [^a-z0-9]+ ', ' ', str)
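The signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a fourth positional argument is taken as `count`, not as flags. `re.UNICODE | re.MULTILINE` has the integer value 40, which would cap the call at 40 replacements and would explain why only the first part of a long string was changed. The sketch below reproduces that under an assumed input; the 50-separator test string is made up, and the bug is emulated with an explicit `count=` keyword so it runs cleanly on Python 3.

```python
import re

PATTERN = r' [^a-z0-9]+ '

# Hypothetical input with 50 separators, more than the accidental limit of 40.
text = 'a » ' * 50 + 'end'

flags = re.UNICODE | re.MULTILINE
accidental_count = int(flags)  # 32 | 8 == 40

# Equivalent to the original call, where the flags landed in the `count` slot.
buggy = re.sub(PATTERN, ' ', text, count=accidental_count)

# Flags passed by keyword, so every match is replaced.
fixed = re.sub(PATTERN, ' ', text, flags=flags)

print(accidental_count)   # 40
print(buggy.count('»'))   # 10 separators left untouched
print(fixed.count('»'))   # 0
```

Dropping the flags altogether, as in the line above, also works, because this pattern uses neither `^`/`$` anchors nor Unicode-dependent classes.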

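The last test string, "str_nbsp", is not matched by the regex above. One of its space characters is actually a non-breaking space. I used www.regexr.com and hovered over each character to figure this out.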
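A minimal sketch of the non-breaking space problem, assuming Python 3 and assuming that the space before the » is the non-breaking one (U+00A0); the post does not say which of the two it was. A literal space in the pattern matches only U+0020, while `\s` on a `str` pattern also matches U+00A0.

```python
import re

# Assumption: the space before » is the non-breaking one (U+00A0).
str_nbsp = 'afds\u00a0» asf'

# Literal spaces in the pattern: the non-breaking space is not matched.
literal_space = re.sub(r' [^a-z0-9]+ ', ' ', str_nbsp)

# \s on both sides: any Unicode whitespace is accepted.
any_whitespace = re.sub(r'\s[^a-z0-9]+\s', ' ', str_nbsp)

print(literal_space == str_nbsp)  # True, nothing was replaced
print(any_whitespace)             # afds asf
```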

Source

2014-04-17 Dave

Just so you know about the [Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496). :) – aliteralmind

Thanks. I'm a regex person in Perl, but I'm new to Python. Still getting used to the different syntax. – Dave

In case you don't already know, Debuggex.com is an online tester that supports both Python and PCRE. – aliteralmind

str = re.sub(r' [^a-z0-9]+ ', ' ', str)

Source

2014-04-17 01:11:58 Dave
