Regex with non-ASCII characters

I'm having a bit of trouble with a regular expression in Python. The HTML string is:

html = """<td style="padding-right:5px;">
<span class="blackText">Above £ 7.00 = </span>
</td>
<td>
<span class="blackText">
<p>Free</p>
</span>
</td>"""

I want to extract "7.00" and "Free", but the following doesn't work:

amount = re.findall(r'Above £ (.*?) =', html)

Python throws a non-ASCII error for the £ symbol. How would I solve this? Thanks.

Source

2012-11-29 user578582

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – lolopop

# In UTF-8 the pound sign is the two bytes \xC2\xA3
amount = re.findall(r'Above \xC2\xA3 (.*?) =', html)

Source

2012-11-29 18:50:14
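
One way to sidestep the byte-level question is to work on Unicode text and write the pound sign as the escape \u00a3, which keeps the source file pure ASCII. The following is a minimal sketch, assuming Python 3 and assuming the HTML has already been decoded to a str; the name label and the second pattern for the <p> element are illustrative additions, not part of the original answer. (In Python 2, a literal £ in the source file needs an encoding declaration such as # -*- coding: utf-8 -*-, as described in PEP 263.)

```python
import re

# Sample markup from the question; \u00a3 is the pound sign (U+00A3).
html = """<td style="padding-right:5px;">
<span class="blackText">Above \u00a3 7.00 = </span>
</td>
<td>
<span class="blackText">
<p>Free</p>
</span>
</td>"""

# Text pattern against text: the pound sign is a single character here,
# so no particular byte encoding has to be spelled out in the pattern.
amount = re.findall("Above \u00a3 (.*?) =", html)

# Hypothetical second pattern for the "Free" cell.
label = re.findall(r"<p>(.*?)</p>", html)

print(amount, label)
```

For anything beyond a fixed snippet like this one, an HTML parser is usually more robust than a regex, which is the point of the question linked in the comment above.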

How did you get '\xC2'? My Python seems to use '\xa3' for the pound sign. – chrisaycock

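@chrisaycock - It depends on the encoding. '\xa3' is the pound sign's code point (U+00A3, as in the HTML entity &#xA3;) and also its Latin-1 byte; in UTF-8 it is the two bytes '\xC2\xA3'. See (http://www.fileformat.info/info/unicode/char/a3/index.htm) –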

@JayWalker Ahhh – chrisaycock
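
The encoding point in the comments above can be checked directly. This is a small sketch, assuming Python 3; the variable names are illustrative. It shows the bytes the pound sign becomes under each encoding, and that a bytes pattern has to contain the full UTF-8 sequence to match UTF-8 data.

```python
import re

pound = "\u00a3"  # U+00A3, the pound sign

utf8_bytes = pound.encode("utf-8")      # two bytes: C2 A3
latin1_bytes = pound.encode("latin-1")  # one byte: A3

# A UTF-8 encoded fragment of the page, as raw bytes.
page = "Above \u00a3 7.00 = ".encode("utf-8")

# Matching bytes requires both bytes of the UTF-8 sequence.
full = re.findall(rb"Above \xC2\xA3 (.*?) =", page)

# With only the lead byte, the next byte in the data is A3, not a space,
# so the pattern finds nothing.
lead_only = re.findall(rb"Above \xC2 (.*?) =", page)

print(utf8_bytes, latin1_bytes, full, lead_only)
```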
