Python regex parsing

I have an array of strings in Python, where each string in the array looks like this:

<r n="Foo Bar" t="5" s="10" l="25"/>

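I've been searching for a while, and the best thing I could find was to try modifying an HTML hyperlink regex into something that fits my needs.

But I don't really know much about regex, and I haven't gotten anything to work yet. This is what I have so far.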

string = '<r n="Foo Bar" t="5" s="10" l="25"/>' 
print re.split("<r\s+n=(?:\"(^\"]+)\").*?/>", string)

What is the best way to extract the n, t, s, and l values from that string?

2009-05-02 AdamB

This will get you most of the way there:

>>> print re.findall(r'(\w+)="(.*?)"', string) 
[('n', 'Foo Bar'), ('t', '5'), ('s', '10'), ('l', '25')]

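re.split and re.findall are complementary.

Whenever your thought process starts with "I want each item that looks like X", you should use re.findall. When it starts with "I want the data between and surrounding each X", use re.split.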
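
To make that distinction concrete, here is a minimal sketch in Python 3 syntax (the snippets above use Python 2's print statement). The names `line`, `attrs`, `csv_like`, and `fields` are illustrative only and do not come from the question.

```python
import re

line = '<r n="Foo Bar" t="5" s="10" l="25"/>'

# "I want each item that looks like name="value"" -> re.findall
pairs = re.findall(r'(\w+)="(.*?)"', line)
attrs = dict(pairs)  # turn the (name, value) tuples into a dictionary
print(attrs["n"], attrs["t"])

# "I want the data between the separators" -> re.split
csv_like = "5, 10 ,25"  # hypothetical comma-separated input
fields = re.split(r"\s*,\s*", csv_like)
print(fields)
```

With findall the pattern describes the pieces you keep; with split the pattern describes the pieces you throw away.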

2009-05-02 12:34:08 Clint

Works perfectly, thanks. – AdamB 2009-05-02 12:36:25

<r n="Foo Bar" t="5" s="10" l="25"/>

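The source looks like XML, so the "best way" would be to use an XML parsing module. If it isn't exactly XML, BeautifulSoup (or rather, the BeautifulSoup.BeautifulStoneSoup module) may work best, as it is good at dealing with possibly-invalid XML (or things that "aren't quite XML"):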

>>> from BeautifulSoup import BeautifulStoneSoup 
>>> soup = BeautifulStoneSoup("""<r n="Foo Bar" t="5" s="10" l="25"/>""") 

# grab the "r" element (you could also use soup.findAll("r") if there are multiple)
>>> soup.find("r") 
<r n="Foo Bar" t="5" s="10" l="25"></r> 

# get a specific attribute 
>>> soup.find("r")['n'] 
u'Foo Bar' 
>>> soup.find("r")['t'] 
u'5' 

# Get all attributes, or turn them into a regular dictionary 
>>> soup.find("r").attrs 
[(u'n', u'Foo Bar'), (u't', u'5'), (u's', u'10'), (u'l', u'25')] 
>>> dict(soup.find("r").attrs) 
{u's': u'10', u'l': u'25', u't': u'5', u'n': u'Foo Bar'}
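
If the input really is well-formed XML, the standard library alone should be enough. The following is a minimal sketch, assuming each string is a complete, well-formed element; it uses Python 3 syntax, and the names `line`, `elem`, and `numbers` are illustrative only.

```python
import xml.etree.ElementTree as ET

line = '<r n="Foo Bar" t="5" s="10" l="25"/>'

# fromstring raises xml.etree.ElementTree.ParseError on malformed input,
# which is where a lenient parser such as BeautifulSoup would be preferable
elem = ET.fromstring(line)

print(elem.tag)       # the element name
print(elem.get("n"))  # a single attribute, or None if it is missing

# attribute values are always strings, so convert numeric ones explicitly
attrs = dict(elem.attrib)
numbers = {key: int(attrs[key]) for key in ("t", "s", "l")}
print(numbers)
```

Unlike the regex approach, this fails loudly on malformed input rather than silently returning a partial match.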

2009-05-02 13:32:54 dbr
