Using a Python regular expression to extract retweeters from Sina Weibo tweets containing Chinese characters

tweet = "//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘// @诺什: 吸引优质 客户，摆脱屌丝男！！！//@MarkGreene: 转发微博"

Note that there is a space between // and @ in front of 诺什.

I want to get the list of retweeters, like this:

result = ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene']

I have been thinking of using the following script:

import re
RTpattern = r'''//@(\w+)'''
rt = re.findall(RTpattern, tweet)

However, I could not get the Chinese name "魏武".

2013-03-31 Frank Wang

Use the re.UNICODE flag:

re.UNICODE 
Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character 
properties database.
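To see what the flag changes, here is a small Python 3 sketch (not part of the original answer; the sample string is made up for illustration). In Python 3, str patterns already use Unicode matching by default, so re.UNICODE is redundant there, while re.ASCII restores the ASCII-only \w that Python 2 used when no flag was given.

```python
import re

sample = "魏武 Bob 诺什"

# Python 3 default for str patterns: \w follows the Unicode character database.
unicode_words = re.findall(r'\w+', sample)

# Passing re.UNICODE explicitly is accepted but changes nothing for str patterns.
flagged_words = re.findall(r'\w+', sample, re.UNICODE)

# re.ASCII limits \w to [a-zA-Z0-9_], like Python 2 without re.UNICODE.
ascii_words = re.findall(r'\w+', sample, re.ASCII)

print(unicode_words)
print(ascii_words)
```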

# -*- coding: utf-8 -*-
import re

tweet = u"//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘// @诺什: 吸引优质 客户，摆脱屌丝男！！！//@MarkGreene: 转发微博"  # u"" makes this a unicode string (Python 2)
RTpattern = r'''//@(\w+)'''
for word in re.findall(RTpattern, tweet, re.UNICODE):
    print word

# lilei 
# Bob 
# Girl 
# 魏武 
# MarkGreene
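For reference, a Python 3 sketch of the same approach (an adaptation, not the original answer's code): str literals are Unicode and str patterns match Unicode by default, so neither the u prefix nor re.UNICODE is needed. The second pattern, //\s*@, is a hypothetical variant for the case where mentions written with a space, such as // @诺什, should also be captured; the question's expected list leaves that one out.

```python
import re

tweet = ("//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘"
         "// @诺什: 吸引优质 客户，摆脱屌丝男！！！//@MarkGreene: 转发微博")

# "@" must follow "//" directly, so "// @诺什" is skipped, matching the expected list.
retweeters = re.findall(r'//@(\w+)', tweet)

# Hypothetical variant: allow optional whitespace between "//" and "@".
retweeters_loose = re.findall(r'//\s*@(\w+)', tweet)

print(retweeters)
print(retweeters_loose)
```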

2013-03-31 07:59:13 root

Thanks. I get ['lilei', 'Bob', 'Girl', '\xe9', 'MarkGreene'] instead of ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene']. –

You have to make tweet a unicode string (note the u prefix). To do that, just add tweet = tweet.decode('utf-8'). – root
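The '\xe9' in the comment most likely comes from running the pattern over UTF-8 bytes rather than decoded text: the first byte of 魏 is 0xE9, which re.UNICODE treats as the letter é. A Python 3 sketch of the same situation, assuming the tweet arrives as UTF-8 bytes (the raw variable is hypothetical, standing in for data read from a file or network response): a bytes pattern's \w only matches ASCII, so the Chinese name is skipped, while decoding first, as the comment suggests, gives the expected names.

```python
import re

# Hypothetical input: a fragment of the tweet as raw UTF-8 bytes.
raw = "//@Girl: dd//@魏武: 利益所致//@MarkGreene: 转发微博".encode("utf-8")

# Bytes pattern on bytes: \w is ASCII-only here, so "魏武" is not matched at all.
from_bytes = re.findall(rb'//@(\w+)', raw)

# Decode to text first (Python 3 counterpart of tweet.decode('utf-8')), then match.
text = raw.decode("utf-8")
from_text = re.findall(r'//@(\w+)', text)

print(from_bytes)
print(from_text)
```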
