Get content faster using PHP

I am using PHP and I want a faster way to get content from a URL.
Here is the code I use.
CODE: (1)

<?php 
    $content = file_get_contents('http://www.filehippo.com'); 
    echo $content; 
?>

There are many other ways to read a file, such as fopen() and readfile(), but I think file_get_contents() is faster than those.

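When you run my code above, you will see that it returns everything from the site, even images and ads. I only want the plain HTML text, without CSS styles, images, or ads. How can I get that?
See this to understand.
CODE: (2)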

<?php 
    $content = file_get_contents('http://www.filehippo.com'); 
    // do something to remove css-style, images and ads. 
    // return the plain html text in $mod_content. 
    echo $mod_content; 
?>

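If I do it as above, I am going about it the wrong way, because I have already fetched the full content into $content and only modify it afterwards.
Is there any function, method, or anything else that gets the plain HTML text directly from the URL?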

The code below is written only for illustration; it is not real PHP code.
IDEAL CODE: (3)

<?php 
    $plain_content = get_plain_html('http://www.filehippo.com'); 
    echo $plain_content; // no css-style, images and ads. 
?>

If I could get such a function, it would be much faster than the others. Is that possible?
Thanks.

Source

2013-05-27 Axeem

The page 'http://www.filehippo.com' has scripts and styles embedded in it. You cannot choose not to download them, but you can filter them out. –
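As a minimal sketch of the filtering mentioned in the comment above, the downloaded markup can be loaded into PHP's DOMDocument and the unwanted elements removed. The function name strip_page_assets() is hypothetical (it is not a built-in), and the choice of script, style, and img as the tags to drop is an assumption. The whole page is still downloaded first; only the output is smaller.

```php
<?php
// Hypothetical helper (not part of PHP): removes <script>, <style>, and <img>
// elements from an HTML string that has already been downloaded.
function strip_page_assets(string $html): string
{
    $dom = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['script', 'style', 'img'] as $tag) {
        // Copy the live node list first; removing nodes while iterating
        // over the live list would skip elements.
        $nodes = iterator_to_array($dom->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    return $dom->saveHTML();
}
```

Usage would look like `echo strip_page_assets(file_get_contents('http://www.filehippo.com'));`. This assumes the DOM extension is enabled, which is the default in most PHP builds.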

Try this.

$content = file_get_contents('http://www.filehippo.com'); 
$this->html = $content; 
$this->process(); 
function process(){ 

    // header 
    $this->_replace('/.*<head>/ism', "<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE html PUBLIC '-//WAPFORUM//DTD XHTML Mobile 1.0//EN' 'http://www.wapforum.org/DTD/xhtml-mobile10.dtd'><html xmlns='http://www.w3.org/1999/xhtml'><head>"); 

    // title 
    $this->_replace('/<head>.*?(<title>.*<\/title>).*?<\/head>/ism', '<head>$1</head>'); 

    // strip out divs with little content 
    $this->_stripContentlessDivs(); 

    // divs/p 
    $this->_replace('/<div[^>]*>/ism', '') ; 
    $this->_replace('/<\/div>/ism','<br/><br/>'); 
    $this->_replace('/<p[^>]*>/ism',''); 
    $this->_replace('/<\/p>/ism', '<br/>') ; 

    // h tags 
    $this->_replace('/<h[1-5][^>]*>(.*?)<\/h[1-5]>/ism', '<br/><b>$1</b><br/><br/>') ; 


    // remove align/height/width/style/rel/id/class attributes 
    $this->_replace('/\salign=(\'?\"?).*?\\1/ism',''); 
    $this->_replace('/\sheight=(\'?\"?).*?\\1/ism',''); 
    $this->_replace('/\swidth=(\'?\"?).*?\\1/ism',''); 
    $this->_replace('/\sstyle=(\'?\"?).*?\\1/ism',''); 
    $this->_replace('/\srel=(\'?\"?).*?\\1/ism',''); 
    $this->_replace('/\sid=(\'?\"?).*?\\1/ism',''); 
    $this->_replace('/\sclass=(\'?\"?).*?\\1/ism',''); 

    // remove comments 
    $this->_replace('/<\!--.*?-->/ism',''); 

    // remove script/style 
    $this->_replace('/<script[^>]*>.*?\/script>/ism',''); 
    $this->_replace('/<style[^>]*>.*?\/style>/ism',''); 

    // multiple \n 
    $this->_replace('/\n{2,}/ism',''); 

    // remove multiple <br/> 
    $this->_replace('/(<br\s?\/?>){2}/ism','<br/>'); 
    $this->_replace('/(<br\s?\/?>\s*){3,}/ism','<br/><br/>'); 

    //tables 
    $this->_replace('/<table[^>]*>/ism', ''); 
    $this->_replace('/<\/table>/ism', '<br/>'); 
    $this->_replace('/<(tr|td|th)[^>]*>/ism', ''); 
    $this->_replace('/<\/(tr|td|th)[^>]*>/ism', '<br/>'); 

    // wrap and close 

} 
private function _replace($pattern, $replacement, $limit=-1){ 
    $this->html = preg_replace($pattern, $replacement, $this->html, $limit); 
}

Source

2013-05-27 05:34:36

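There is no need to use $this when this is a simple snippet that can be used outside a class. Or at least turn it into an example class, so that inexperienced copy-pasters do not get it wrong. –

That is why I only added the link with details below the code. –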

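You can use regular expressions to remove the CSS and script tags and the image tags, simply replacing them with a blank:

preg_replace($pattern, $replacement, $string);

For more detail on this function, see: http://php.net/manual/en/function.preg-replace.php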
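To make the preg_replace() suggestion concrete, here is a rough sketch. The function name strip_with_regex() and the three patterns are assumptions chosen for illustration, and the tags are replaced with an empty string here. Regular expressions are fragile on real-world HTML (nested or malformed tags can slip through), so treat this as a quick filter rather than a parser.

```php
<?php
// Hypothetical helper: strips <script> blocks, <style> blocks, and <img> tags
// from an HTML string using regular expressions.
function strip_with_regex(string $html): string
{
    $patterns = [
        '/<script\b[^>]*>.*?<\/script>/is', // script blocks, including their bodies
        '/<style\b[^>]*>.*?<\/style>/is',   // style blocks, including their bodies
        '/<img\b[^>]*>/i',                  // image tags
    ];

    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($patterns, '', $html) ?? $html;
}
```

For example, `strip_with_regex(file_get_contents('http://www.filehippo.com'))` would return the page without those tags; the full page is still transferred before the filter runs.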

Source

2013-05-27 05:01:06

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 –

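**jaD** You are telling me to do it like **code (2)**; please see my question. I explained there why that is not good. Thanks. – Axeem

@user2280065, with http://www.filehippo.com you cannot choose what you get and what you do not. Whenever you send a request for the http://www.filehippo.com page, it sends the whole page every time. What you can do is something like caching: save the most frequently used pages. –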
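One way to read the caching suggestion is a small file-based cache, sketched below. Everything here is an assumption made for illustration: the function name fetch_cached(), the one-hour default lifetime, the use of the system temp directory, and the optional $fetch callback (which defaults to file_get_contents and mainly exists so the download step can be swapped out).

```php
<?php
// Hypothetical helper: returns the page body for $url, reusing a copy saved
// on disk when that copy is younger than $ttl seconds.
function fetch_cached(string $url, int $ttl = 3600, ?callable $fetch = null): string
{
    $fetch = $fetch ?? 'file_get_contents';
    $file = sys_get_temp_dir() . '/page_cache_' . md5($url) . '.html';

    // Drop PHP's cached stat data so the file age check is current.
    clearstatcache(true, $file);
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file); // cache hit: no network request
    }

    $content = $fetch($url); // cache miss: download the whole page
    if ($content === false) {
        throw new RuntimeException("Could not fetch $url");
    }

    file_put_contents($file, $content, LOCK_EX);
    return $content;
}
```

Calling `fetch_cached('http://www.filehippo.com')` twice within the lifetime would hit the network only once. The first request is no faster; only repeated requests for the same page benefit.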
