Classic ASP regex: small change
I have some regular expression code that grabs the data between the title tags on a page:
<%
Function UrlExists(sURL)
    Dim objXMLHTTP
    Dim thePage
    Dim strPTitle
    Dim blnReturnVal
    Dim objRegExp
    Dim strTitleResponse

    'Create object
    Set objXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    on error resume next

    'Get the head
    objXMLHTTP.Open "HEAD", sURL, false
    objXMLHTTP.setRequestHeader "User-Agent", Request.ServerVariables("HTTP_HOST")
    objXMLHTTP.Send ""

    '404?
    If Err.Number <> 0 or objXMLHTTP.status <> 200 then blnReturnVal = "0|404 Error" Else blnReturnVal = "1|"
    objXMLHTTP.close

    'If not 404
    if left(blnReturnVal,1) = "1" then
        'Get the physical page
        objXMLHTTP.Open "GET", sURL, false
        objXMLHTTP.Send ""
        thePage = objXMLHTTP.responseText
        thePage = replace(thePage, vbCrlf, "")
        objXMLHTTP.close

        'Find title
        Set objRegExp = New Regexp
        objRegExp.IgnoreCase = true
        objregexp.Multiline = true
        objRegExp.Global = false
        objRegExp.Pattern = "<title[^>]*?>(.*)</title>"
        set strPTitle = objRegExp.Execute(thePage)
        strTitleResponse = strPTitle.Item(0).Value
        strTitleResponse = replace(strTitleResponse, vbCrlf, "")
        strTitleResponse = trim(strTitleResponse)
        if len(strTitleResponse) <1 OR strTitleResponse = "" then strTitleResponse = "(No Title)"
        set objRegExp = nothing
        strTitleResponse = replace(strTitleResponse,"</title>","")
        strTitleResponse = replace(strTitleResponse,"<title>","")
        strTitleResponse = replace(strTitleResponse,"'","' ")
        blnReturnVal = blnReturnVal & strTitleResponse
    end if

    Set objXMLHTTP = nothing
    UrlExists = blnReturnVal
End Function
%>
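This works well, and has done for many months, but when I wrote it I (foolishly?) assumed that every page would have either one title tag or none. It recently started throwing strange errors on the John Lewis page, because that page has two titles in its HTML: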
<title>John Lewis - Shop online at Britain's Favourite Retailer</title>
... bunch of html
<title>
</title>
How can I modify the regex so that it matches only the first pair and is not confused by HTML like the above?
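One likely cause is that `(.*)` is greedy. Once the line breaks have been stripped, it can run from the first `<title>` all the way to the last `</title>` on the page. A minimal sketch of one possible fix follows, assuming the VBScript 5.5+ regex engine (which supports lazy quantifiers). `ExtractFirstTitle` is a hypothetical helper name, not something from the original code. It makes the quantifier lazy and reads the capture group through `SubMatches`, so the tags no longer need to be stripped by hand.

```vbscript
<%
' Sketch only: ExtractFirstTitle is a hypothetical helper, not part of the original code.
' Returns the text of the first <title>...</title> pair, or "(No Title)".
Function ExtractFirstTitle(sHtml)
    Dim objRegExp, colMatches, sTitle

    ExtractFirstTitle = "(No Title)"

    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = False    ' only the first match is needed
    ' [\s\S]*? is lazy, so it stops at the first </title>;
    ' unlike ".", it also matches line breaks.
    objRegExp.Pattern = "<title[^>]*>([\s\S]*?)</title>"

    Set colMatches = objRegExp.Execute(sHtml)
    If colMatches.Count > 0 Then
        ' SubMatches(0) is the capture group, i.e. the text without the tags
        sTitle = colMatches(0).SubMatches(0)
        sTitle = Replace(Replace(sTitle, vbCr, ""), vbLf, "")
        sTitle = Trim(sTitle)
        If Len(sTitle) > 0 Then ExtractFirstTitle = sTitle
    End If

    Set colMatches = Nothing
    Set objRegExp = Nothing
End Function
%>
```

Inside `UrlExists`, the "Find title" block could then become `strTitleResponse = ExtractFirstTitle(thePage)`. Checking `colMatches.Count` first also avoids calling `Item(0)` on an empty match collection when a page has no title at all.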
Great, thanks! –  2010-10-26 09:56:18
You're welcome :) – jensgram 2010-10-26 10:50:17