我想创建一个正则表达式相当不成功,我正在做的是获取任何html元素的内容(作者| byline |作家)复杂的正则表达式来提取python作者名称
这里是我迄今为止
<([A-Z][A-Z0-9]*)class=\"(byLineTag|byline|author|by)\"[^>]*>(.*?)</\1>
什么,我需要匹配
<h6 class="byline">By <a rel="author" href="http://topics.nytimes.com/top/reference/timestopics/people/e/jack_ewing/index.html?inline=nyt-per" title="More Articles by Jack Ewing" class="meta-per">JACK EWING</a> and <a rel="author" href="http://topics.nytimes.com/top/reference/timestopics/people/t/landon_jr_thomas/index.html?inline=nyt-per" title="More Articles by Landon Thomas Jr." class="meta-per">LANDON THOMAS Jr.</a></h6>
或
例子210<div class="noindex"><span class="by">By </span><span class="byline"><a href="javascript:NewWindow(575,480,'/apps/pbcs.dll/personalia?ID=sshemkus',0)" title="Email Reporter">Sarah Shemkus</a></span></div>
任何帮助将不胜感激。 -Stefan
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454 – MitMaro
不要这样做。请参阅MitMaro的链接。想象一下像'
您可以发布一些示例输入和预期的输出。 – Stephan