Matching names of the form Firstname Lastname with international characters

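I am trying to extract people's names from a text by matching the form Firstname Lastname. The code below works for that, but I also want to capture international names such as Pär Åberg. I have found some solutions, but they do not seem to work with Python-flavored regular expressions. Does anyone have an idea?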

#!/usr/bin/python 
# -*- coding: utf-8 -*- 
import re 

text = """ 
This is a text containing names of people in the text such as 
Hillary Clinton or Barack Obama. My problem is with names that uses stuff 
outside A-Z like Swedish names such as Pär Åberg.""" 

for name in re.findall("(([A-Z])[\w-]*(\s+[A-Z][\w-]*)+)", text): 
    firstname = name[0].split()[0] 
    print firstname

Source

2015-11-16 cowboyvspirate

Be careful with capturing groups and findall. –

For the last name, you could search for any characters between spaces – Onilol

Try 're.findall(r'[A-Z][\w-]*(?:\s+[A-Z][\w-]*)+')' –
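To illustrate the two comments above, here is a minimal sketch (assuming Python 3 and the standard re module; the shortened sample text is mine) of how findall behaves with and without capturing groups:

```python
import re

text = "Hillary Clinton or Barack Obama"

# With capturing groups, findall returns one tuple per match,
# holding one item per group instead of the whole match as a string.
grouped = re.findall(r"(([A-Z])[\w-]*(\s+[A-Z][\w-]*)+)", text)

# (?:...) is non-capturing, so findall returns the matched strings themselves.
plain = re.findall(r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)+", text)

print(grouped)  # [('Hillary Clinton', 'H', ' Clinton'), ('Barack Obama', 'B', ' Obama')]
print(plain)    # ['Hillary Clinton', 'Barack Obama']
```

This is why the original loop has to index name[0] to reach the full name, while the non-capturing version yields the name directly.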

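You need the alternative regex library, because it supports Unicode property classes such as \p{L} (any Unicode letter), which the built-in re module does not.

Then use \p{Lu} (any uppercase Unicode letter) in place of [A-Z]: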

ur'\p{Lu}[\w-]*(?:\s+\p{Lu}[\w-]*)+'
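A minimal sketch of this pattern applied to the text from the question. It assumes Python 3, where the ur prefix is no longer valid and a plain r'...' string is already Unicode, and it assumes the third-party regex package is installed. The names NAME_PATTERN and find_full_names are hypothetical and chosen only for illustration.

```python
try:
    import regex  # third-party package (pip install regex), not the stdlib re
except ImportError:
    regex = None

text = """
This is a text containing names of people in the text such as
Hillary Clinton or Barack Obama. My problem is with names that uses stuff
outside A-Z like Swedish names such as Pär Åberg."""

# \p{Lu} matches any uppercase Unicode letter. The repeated group is
# non-capturing, so findall returns whole matches as plain strings.
NAME_PATTERN = r'\p{Lu}[\w-]*(?:\s+\p{Lu}[\w-]*)+'


def find_full_names(source):
    """Hypothetical helper: return each run of two or more capitalised words."""
    if regex is None:
        raise RuntimeError("the third-party 'regex' package is required")
    return regex.findall(NAME_PATTERN, source)


if regex is not None:
    for name in find_full_names(text):
        firstname = name.split()[0]
        print(firstname)
```

With the regex package available, this should print Hillary, Barack and Pär, one per line.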

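When a Unicode string is used to initialize the regex, the UNICODE flag is applied automatically:

If neither the ASCII, LOCALE nor UNICODE flag is specified, it will default to UNICODE if the regex pattern is a Unicode string and ASCII if it's a bytestring.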
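The effect of that default can be sketched without any extra dependency. The example below is an assumption-laden illustration rather than part of the answer: it uses the standard re module in Python 3, where str patterns likewise match in Unicode mode unless re.ASCII is passed.

```python
import re

sample = "Pär Åberg"

# str pattern with no flags: Unicode matching is the default,
# so \w also covers letters such as ä and Å.
unicode_words = re.findall(r"\w+", sample)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the non-ASCII letters
# act as separators and the names are split apart.
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

print(unicode_words)  # ['Pär', 'Åberg']
print(ascii_words)    # ['P', 'r', 'berg']
```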

Source

2015-11-16 17:45:28

Works like a charm! Besides updating the regex, I only needed to change 'firstname = name[0].split()[0]' to 'firstname = name.split()[0]'. – cowboyvspirate
