How to extract a page title

I am trying to extract the page title from an HTML page:

cat index.html | grep -i "title>"| sed 's/<title>/ /i'| sed 's/<\/title>/ /i'

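The problem is that some web pages are written entirely on a single line (trust me, it happens), so grep returns the whole page and the command prints all of it instead of just the title.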
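To make the failure concrete, here is a small reproduction. The function name `naive_title` and the sample page are made up for illustration, and the `i` flag on `s///` assumes GNU sed.

```shell
# Hypothetical wrapper around the pipeline from the question; reads HTML on stdin.
naive_title() {
    grep -i "title>" | sed 's/<title>/ /i' | sed 's/<\/title>/ /i'
}

# A page written on one line: grep selects the whole line, and sed only
# replaces the two tags with spaces, so the rest of the markup is still printed.
printf '%s\n' '<html><head><title>My Page</title></head><body>Hi</body></html>' | naive_title
# prints: <html><head> My Page </head><body>Hi</body></html>
```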

How can I fix this?

Thanks!

Source

2010-07-07 Zenet

sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'

From Linux Commands.

It was the first Google result for: unix extract page title.
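A rough sketch of how that command behaves on a one-line page. The wrapper name `extract_title` and the sample input are assumptions for illustration; the `i` flag and the `T` command are GNU sed extensions, so this is unlikely to work with BSD or other POSIX-only sed implementations.

```shell
# Hypothetical wrapper; reads from the files given as arguments, or stdin if none.
extract_title() {
    # s/.../\1/ip : keep only the text between the tags (case-insensitive) and print it
    # T           : if the substitution did not happen, skip to the next input line
    # q           : otherwise quit, so only the first matching line is reported
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q' "$@"
}

printf '%s\n' '<html><head><TITLE>One-line page</TITLE></head><body><p>text</p></body></html>' | extract_title
# prints: One-line page
```

Because sed works line by line, this only finds a title whose opening and closing tags are on the same line.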

Source

2010-07-07 14:48:59 mcandre

Thanks a lot! – Zenet 2010-07-07 15:02:00

This awk one-liner also works for titles that span more than one line.

$ cat file 
<html> 
    <title>How to extract a page 
title - Stack Overflow</title> 
    <link rel="stylesheet" href="http://sstatic.net/so/all.css?v=4864b39b46cf"> 
    <link rel="shortcut icon" href="http://sstatic.net/so/favicon.ico"> 
    <link rel="apple-touch-icon" href="http://sstatic.net/so/apple-touch-icon.png"> 
</html> 

$ awk 'BEGIN{RS="</title>"}/title/{gsub(".*<title>","");print}' file 
How to extract a page 
title - Stack Overflow
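The one-liner above works by using `</title>` as the record separator, so the first record ends right where the title ends. A possible variation, sketched below, stops after the first title and joins a wrapped title onto one line. The function name `extract_title_multiline` is hypothetical, and a multi-character `RS` is assumed to be supported (it is in gawk and mawk, but POSIX leaves it unspecified).

```shell
# Hypothetical helper; reads from the files given as arguments, or stdin if none.
extract_title_multiline() {
    awk 'BEGIN { RS = "</title>" }       # each record ends at a closing title tag
         /<title>/ {
             sub(/.*<title>/, "")        # drop everything up to and including <title>
             gsub(/[ \t\r\n]+/, " ")     # collapse line breaks inside the title
             print
             exit                        # ignore the rest of the page
         }' "$@"
}

printf '<html>\n    <title>How to extract a page \ntitle - Stack Overflow</title>\n<a title="x">y</a>\n</html>\n' | extract_title_multiline
# prints: How to extract a page title - Stack Overflow
```

Matching on `<title>` and exiting early means later text that merely contains the word "title" (such as a `title="..."` attribute) is not printed. Unlike the sed version, this sketch is case-sensitive.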

Source

2010-07-07 15:43:32 ghostdog74
