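What I want to do: take the title text of the top post on http://reddit.com/r/worldnews and output it on my own web page, which should contain nothing but that text. How can I get a specific part or div of a website?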
Eventually, I want to fetch that text from my page using cURL from AppleScript and output it.
I'm making a script that tells me the top post when I click a button.
Edit: If you can think of any way, I'd like to do the same thing with Facebook notifications.
Edit: I have PHP scraping the site and outputting the result here: http://colejohnsoncreative.com/personal/ai/worldnews.php. This is the code I'm using:
<?php
// Get a file into an array. In this example we'll go through HTTP to get
// the HTML source of a URL.
$lines = file('http://www.reddit.com/r/worldnews');
// Loop through our array, show HTML source as HTML source; and line numbers too.
foreach ($lines as $line_num => $line) {
    echo "Line #<b>{$line_num}</b> : " . htmlspecialchars($line) . "<br />\n";
}
// Another example, let's get a web page into a string. See also file_get_contents().
$html = implode('', file('http://www.example.com/'));
// Using the optional flags parameter since PHP 5
$trimmed = file('somefile.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
?>
So I get the source of the entire site as output, but the only item I need is
<a class="title " href="http://www.dailymail.co.uk/news/article-2219477/Cannabis-factory-couple-gave-400-000-drug-dealing-fortune-poor-Kenyans-jailed-years.html" >British couple who spent most of the money they made from canabis growing on paying for life changing operations and schooling for people in a poor Kenyan village gets sent to prison for 3 years.</a>
and I need to throw everything else away. How do I do that?
Take a look at scraping methods: http://stackoverflow.com/questions/26947/how-to-implement-a-web-scraper-in-php – Steven
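
One possible way to keep only that anchor is to parse the HTML with PHP's built-in DOM extension rather than printing it line by line. The sketch below is only an illustration, not a tested scraper for the live site: `firstTitleText` is a hypothetical helper name, the sample markup is made up to resemble the anchor quoted in the question, and it assumes post titles are still marked with the class `title`.

```php
<?php
// Hypothetical helper (not part of the original code): returns the text of the
// first <a> element whose class attribute contains the token "title",
// or null when no such link exists.
function firstTitleText(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "title" as a whole class token, so class="title " matches
    // but class="subtitle" does not.
    $nodes = $xpath->query(
        '//a[contains(concat(" ", normalize-space(@class), " "), " title ")]'
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Nodes come back in document order, so item(0) is the topmost link.
    return trim($nodes->item(0)->textContent);
}

// Sample markup modelled on the anchor in the question (assumed structure).
$sample = '<div><a class="subtitle" href="#">skip me</a>'
        . '<a class="title " href="http://example.com/story">Top story headline</a>'
        . '<a class="title " href="http://example.com/other">Second headline</a></div>';

echo firstTitleText($sample), "\n"; // prints "Top story headline"
```

To try it against the real page, the sample string could be replaced with the result of `file_get_contents('http://www.reddit.com/r/worldnews')`, assuming `allow_url_fopen` is enabled and the site serves the request. Because the function echoes only the title text, the resulting page would contain just that one line, which is what the cURL step would then pick up.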