How do I get the RSS feed URL of a website using PHP?

3

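This is more involved than just pasting some code here, but I can point you in the right direction.

First, fetch the web page.
Parse the string you get back, looking for the RSS autodiscovery <link> tag. You could load the whole document as XML and use DOM traversal, but I would just use a regular expression.
Extract the href attribute of that tag, and you have the URL of the RSS feed.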
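The three steps above can be sketched with regular expressions, as the answer suggests. This is only a minimal illustration: `findRssHref` is a hypothetical helper name, the sample HTML and the example.com URL are made up, and the patterns assume quoted attribute values.

```php
<?php
// Hypothetical helper (not from the answer): returns the href of the first
// RSS autodiscovery <link> tag in $html, or null if there is none.
function findRssHref(string $html): ?string
{
    // Step 2: collect every <link ...> tag in the page.
    if (!preg_match_all('/<link\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        // Keep only tags that advertise an RSS feed.
        if (!preg_match('/\btype\s*=\s*["\']application\/rss\+xml["\']/i', $tag)) {
            continue;
        }
        // Step 3: extract the href attribute value.
        if (preg_match('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m)) {
            return html_entity_decode($m[1]);
        }
    }
    return null;
}

// Step 1 would normally be: $html = file_get_contents($pageUrl);
// A literal string is used here so the sketch runs without network access.
$html = '<html><head><link rel="alternate" type="application/rss+xml" '
      . 'title="Feed" href="http://example.com/feed/"></head></html>';
echo findRssHref($html), "\n";
```

The returned href may be relative, in which case it still has to be resolved against the URL of the page it came from.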

Source

2011-08-06 16:28:59 DOOManiac

+0

Hi, are you referring to scraping the HTML source to determine the RSS feed URL? – Jeyaganesh

1

The rules for making RSS discoverable are documented in considerable detail. You just need to parse the HTML and look for the elements they describe.
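As a rough illustration of parsing the HTML and looking for those elements, the sketch below uses PHP's DOM extension instead of regular expressions. It assumes the extension is enabled; `findFeedLinks` is a hypothetical name, and accepting the Atom media type alongside RSS is an assumption made here rather than something the answer states.

```php
<?php
// Hypothetical helper (not from the answers): lists the href of every
// <link rel="alternate"> element whose type is an RSS or Atom feed.
function findFeedLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; @ silences the parser warnings.
    if ($html === '' || !@$doc->loadHTML($html)) {
        return [];
    }
    $feedTypes = ['application/rss+xml', 'application/atom+xml'];
    $feeds = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated tokens, e.g. "alternate home".
        $rel  = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $type = strtolower(trim($link->getAttribute('type')));
        if (in_array('alternate', $rel, true) && in_array($type, $feedTypes, true)) {
            $feeds[] = $link->getAttribute('href');
        }
    }
    return $feeds;
}

// Made-up sample markup so the sketch runs without network access.
$html = '<html><head>'
      . '<link rel="alternate" type="application/rss+xml" href="/feed/">'
      . '<link rel="alternate" type="application/atom+xml" href="/atom/">'
      . '<link rel="stylesheet" href="/style.css">'
      . '</head><body></body></html>';
print_r(findFeedLinks($html));
```

Because the parser handles attribute order, quoting, and whitespace, this approach tends to be less brittle than a single regular expression, at the cost of loading the whole document.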

Source

2011-08-06 16:24:05 Quentin

13

The general process has already been described (Quentin, DOOManiac), so here is some code (Demo):

<?php 

$location = 'http://hakre.wordpress.com/'; 
$html = file_get_contents($location); 
echo getRSSLocation($html, $location); # http://hakre.wordpress.com/feed/ 

/** 
* @link http://keithdevens.com/weblog/archive/2002/Jun/03/RSSAuto-DiscoveryPHP 
*/ 
function getRSSLocation($html, $location){ 
    if(!$html or !$location){ 
     return false; 
    }else{ 
     #search through the HTML, save all <link> tags 
     # and store each link's attributes in an associative array 
     preg_match_all('/<link\s+(.*?)\s*\/?>/si', $html, $matches); 
     $links = $matches[1]; 
     $final_links = array(); 
     $link_count = count($links); 
     for($n=0; $n<$link_count; $n++){ 
      $final_link = array(); #reset so attributes of the previous tag do not leak into this one 
      $attributes = preg_split('/\s+/s', $links[$n]); 
      foreach($attributes as $attribute){ 
       $att = preg_split('/\s*=\s*/s', $attribute, 2); 
       if(isset($att[1])){ 
        $att[1] = preg_replace('/([\'"]?)(.*)\1/', '$2', $att[1]); 
        $final_link[strtolower($att[0])] = $att[1]; 
       } 
      } 
      $final_links[$n] = $final_link; 
     } 
     #now figure out which one points to the RSS file 
     for($n=0; $n<$link_count; $n++){ 
      $href = null; #reset per tag; also avoids an undefined-variable notice 
      $rel = strtolower($final_links[$n]['rel'] ?? ''); #attributes may be missing 
      $type = strtolower($final_links[$n]['type'] ?? ''); 
      if($rel == 'alternate'){ 
       if($type == 'application/rss+xml'){ 
        $href = $final_links[$n]['href'] ?? null; 
       } 
       if(!$href and $type == 'text/xml'){ 
        #kludge to make the first version of this still work 
        $href = $final_links[$n]['href'] ?? null; 
       } 
       if($href){ 
        if(preg_match('#^https?://#i', $href)){ #if it's absolute (http or https) 
         $full_url = $href; 
        }else{ #otherwise, 'absolutize' it 
         $url_parts = parse_url($location); 
         #reuse the page's scheme so https sites resolve correctly 
         $scheme = $url_parts['scheme'] ?? 'http'; 
         $full_url = "$scheme://$url_parts[host]"; 
         if(isset($url_parts['port'])){ 
          $full_url .= ":$url_parts[port]"; 
         } 
         if($href[0] != '/'){ #it's a relative link on the domain ($href{0} syntax was removed in PHP 8) 
          $full_url .= dirname($url_parts['path'] ?? '/'); #path is absent for bare-host URLs 
          if(substr($full_url, -1) != '/'){ 
           #if the last character isn't a '/', add it 
           $full_url .= '/'; 
          } 
         } 
         $full_url .= $href; 
        } 
        return $full_url; 
       } 
      } 
     } 
     return false; 
    } 
}

See also: RSS auto-discovery with PHP (archived copy).

Source

2011-08-06 17:22:32 hakre

+0

Excellent! It works well for me – fortytwo

1

A slightly smaller function that grabs the first available feed, whether it is RSS or Atom (most blogs offer both; this takes whichever comes first).

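public function getFeedUrl($url){ 
     # fetch the page once; @ suppresses the warning if the request fails 
     $html = @file_get_contents($url); 
     if($html !== false){ 
      # non-greedy quantifiers so the capture stops at the first closing quote 
      if(preg_match('/<link\srel\=\"alternate\"\stype\=\"application\/(?:rss|atom)\+xml\"\stitle\=\".*?href\=\"(.*?)\"\s\/\>/', $html, $matches)){ 
       return $matches[1]; 
      } 
     } 
     return false; 
    }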

Source

2013-09-27 16:50:08 Jonathan
