Using regular expressions in Python: web scraping
import re
pathstring = '<span class="titletext">(.*)</span>'
pathFinderTitle = re.compile(pathstring)
My output is:
Govt has nothing to do with former CAG official RP Singh:
Sibal</span></a></h2></div><div class="esc-lead-article-source-wrapper">
<table class="al-attribution single-line-height" cellspacing="0" cellpadding="0">
<tbody><tr><td class="al-attribution-cell source-cell">
<span class='al-attribution-source'>Times of India</span></td>
<td class="al-attribution-cell timestamp-cell">
<span class='dash-separator'> - </span>
<span class='al-attribution-timestamp'>‎46 minutes ago‎
The match should have stopped at the first "</span>".
Please suggest what is wrong here.
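A likely cause is that `.*` is greedy: it consumes as much as it can and then backtracks only far enough to find the last "</span>" on the line, not the first. Below is a minimal sketch of the difference, assuming a shortened, made-up `html` string in place of the real page. The names `greedy`, `lazy`, `greedy_title`, and `lazy_title` are illustrative only.

```python
import re

# Hypothetical sample standing in for the scraped page (not the real markup).
html = (
    '<span class="titletext">First headline</span></a>'
    "<span class='al-attribution-source'>Times of India</span>"
)

# Greedy: (.*) runs to the LAST "</span>" on the line.
greedy = re.compile(r'<span class="titletext">(.*)</span>')

# Non-greedy: (.*?) stops at the FIRST "</span>" after the opening tag.
lazy = re.compile(r'<span class="titletext">(.*?)</span>')

greedy_title = greedy.search(html).group(1)
lazy_title = lazy.search(html).group(1)

print(greedy_title)
print(lazy_title)
```

If a title can span several lines, passing `re.DOTALL` to `re.compile` lets `.` match newlines as well. The non-greedy form will still cut a title short if it contains a nested span.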
http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
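The linked post appears to argue against parsing HTML with regular expressions. As a rough sketch of that alternative, the standard-library `html.parser` module can collect the text of each span whose class is `titletext`. The class name `TitleTextParser` and the sample `html` string are assumptions made for illustration, not taken from the real page.

```python
from html.parser import HTMLParser


class TitleTextParser(HTMLParser):
    """Collects the text inside <span class="titletext"> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "titletext") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing span ends the current title,
        # so nested spans inside a title are not handled.
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


# Hypothetical sample standing in for the scraped page.
html = (
    '<span class="titletext">First headline</span></a>'
    "<span class='al-attribution-source'>Times of India</span>"
)

parser = TitleTextParser()
parser.feed(html)
parser.close()
titles = parser.titles
print(titles)
```

A third-party parser may be more convenient for real pages. This version is only meant to show the idea without extra dependencies.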