Python: splitting a tag based on a regular expression

I want to split the following tag, <b size=5 alt=ref>, using a regular expression so that it prints as follows:

Open tag: b 
Parm: size=5 
Parm: alt=ref

However, I tried the code below to split the tag into groups, and it did not work:

temp = '<b size=5 alt=ref>' 
matchObj = re.search(r"(\S*)\s*(\S*)", temp) 
print 'Open tag: ' + matchObj.groups()

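My plan is to split the tag into groups, then print the first group as the open tag and the rest as Parm. Can you suggest some ideas that would help me solve this?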
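For reference, the attempt above seems to fail for two reasons: the pattern has only two groups (and the first one still carries the leading `<`), and `'Open tag: ' + matchObj.groups()` tries to add a string to a tuple, which raises a TypeError. The sketch below shows what the original pattern captures and then tries one possible replacement. The names `tag_pattern`, `name`, and `params` are illustrative and not from the question, and the pattern assumes simple, unquoted attributes like those in the example.

```python
import re

temp = '<b size=5 alt=ref>'

# The original pattern captures only two whitespace-delimited chunks,
# and the first chunk still includes the leading '<'.
first_try = re.search(r"(\S*)\s*(\S*)", temp)
print(first_try.groups())  # ('<b', 'size=5')

# Hypothetical replacement pattern: group 1 is the tag name, group 2 is
# every whitespace-separated attribute chunk before the closing '>'.
tag_pattern = re.compile(r"<(\w+)((?:\s+[^\s>]+)*)\s*>")
match = tag_pattern.match(temp)
name = match.group(1)
params = match.group(2).split()

print('Open tag: ' + name)
for param in params:
    print('Parm: ' + param)
```

A single regex group cannot repeat into a variable number of captures, which is why the attributes are captured as one chunk and then split on whitespace.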

Note that I am reading the tags from an HTML file; I have shown only one example opening tag here, along with the part of the code I am stuck on.

Thanks

Source

2015-10-14 Nasser

Is there a reason not to use an HTML parser?

If you [search](https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1GTPM_enUS601US601&ion=1&espv=2&ie=UTF-8#q=c%2B%2B%20parse%20xml%20using%20regex), you will find that [many people discourage](https://stackoverflow.com/questions/4122624/would-you-implement-a-lightweight-xml-parser-with-regex) trying to parse XML/HTML/etc. with regular expressions, because more robust ways of doing it already exist. – CoryKramer
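To illustrate the parser route suggested in the comments, here is a minimal sketch using the standard library's `html.parser.HTMLParser`. The class name `TagPrinter` and its `lines` attribute are made up for this example; only `HTMLParser`, `feed`, `close`, and `handle_starttag` come from the library.

```python
from html.parser import HTMLParser


class TagPrinter(HTMLParser):
    """Collects one 'Open tag' line and one 'Parm' line per attribute."""

    def __init__(self):
        super().__init__()
        self.lines = []  # hypothetical attribute, used only in this sketch

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without '=' (for example <input disabled>).
        self.lines.append('Open tag: ' + tag)
        for attr_name, value in attrs:
            if value is None:
                self.lines.append('Parm: ' + attr_name)
            else:
                self.lines.append('Parm: {}={}'.format(attr_name, value))


parser = TagPrinter()
parser.feed('<b size=5 alt=ref>')
parser.close()
print('\n'.join(parser.lines))
```

Since the tags come from an HTML file, the whole file contents could be passed to `feed` instead of a single tag. Note that the parser lowercases tag and attribute names.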

import re

tag_names = ["Open tag:", "Parm:", "Parm:"]
# Split on '<', '>', and spaces, then drop the empty strings
# at the start and end of the resulting list.
tags = re.split(r'[<> ]', '<b size=5 alt=ref>')[1:-1]
# Pair each label in tag_names with the matching piece of the tag.
print(list(zip(tag_names, tags)))

[('Open tag:', 'b'), ('Parm:', 'size=5'), ('Parm:', 'alt=ref')]

Source

2015-10-14 15:59:18 LetzerWille

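While this answer may be correct, please add some explanation. Conveying the underlying logic matters more than just giving the code, because it helps the OP and other readers solve this and similar problems themselves. – CodeMouse92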
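As a possible explanation of the logic: `re.split(r'[<> ]', ...)` cuts the string at every `<`, `>`, and space, which leaves an empty string at each end (hence the `[1:-1]` slice), and `zip` then pairs each piece with a label. The hard-coded label list only fits tags with exactly two attributes, so the sketch below builds the labels from the number of pieces instead. The function name `label_tag_parts` is invented for this example, and it still assumes attribute values without spaces or quotes.

```python
import re


def label_tag_parts(tag_text):
    """Return (label, text) pairs for one simple opening tag."""
    # Splitting on runs of '<', '>', or whitespace leaves empty strings
    # at both ends; filtering them out replaces the [1:-1] slice.
    parts = [p for p in re.split(r'[<>\s]+', tag_text) if p]
    # The first piece is the tag name; every later piece is a parameter.
    labels = ['Open tag:'] + ['Parm:'] * (len(parts) - 1)
    return list(zip(labels, parts))


print(label_tag_parts('<b size=5 alt=ref>'))
print(label_tag_parts('<img src=a.png>'))
```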

>>> import re 
>>> temp = '<b size=5 alt=ref>' 
>>> resList = re.findall(r"\S+", temp.replace("<","").replace(">","")) 
>>> myDict = {} 
>>> myDict["Open tag:"] = [resList[0]] 
>>> myDict["Parm:"] = resList[1:] 
>>> myDict 
{'Open tag:': ['b'], 'Parm:': ['size=5', 'alt=ref']}

Source

2015-10-14 17:28:49
