Web scraping with PHP: extracting links that have no ID/class attached


Hello, I am scraping a website, but the result contains far more content than I need. Here is my code:

<?php 
require('phpQuery.php'); 
$url = 'http://www.nasdaq.com/screening/companies-by-name.aspx?letter=A'; 
$html = file_get_contents($url); 
$pq = phpQuery::newDocumentHTML($html); 
echo $pq['#CompanylistResults']; 
?>

The result is:

<table id="CompanylistResults"> 
<tbody> 
<tr> 
<tr> 
<td> 
<a target="_blank" rel="nofollow" href="http://www.1800flowers.com">1-800 FLOWERS.COM, Inc.</a> 
</td> 
<td> 
<td style="">$100.55M</td> 
<td style="display:none"></td> 
<td>United States</td> 
<td>1999</td> 
<td style="width:105px">Other Specialty Stores</td>

All I need is the text "1-800 FLOWERS.COM, Inc." and "$100.55M". How can I do that?

Source: 2012-08-23, Priscilla Ip

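This kind of financial information is available from dozens of APIs, so there is no need to scrape it. The page you show also has a "Download this list" link, which provides a CSV file. – 2012-08-23 20:18:44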
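If the CSV download mentioned in the comment is an option, reading it might look roughly like the sketch below. The thread does not show the file, so the column names `Name` and `MarketCap`, the sample data, and the helper name `parseCsvWithHeader` are all assumptions for illustration. The sketch also splits on line breaks first, so it would not handle quoted fields that contain newlines.

```php
<?php
// Hypothetical sample of the downloaded CSV; the real column names may differ.
$csv = <<<CSV
"Symbol","Name","MarketCap"
"FLWS","1-800 FLOWERS.COM, Inc.","\$100.55M"
CSV;

/**
 * Parse CSV text that starts with a header row into a list of
 * associative arrays keyed by the header names. (Hypothetical helper.)
 */
function parseCsvWithHeader(string $csv): array
{
    $lines = preg_split('/\R/', trim($csv));
    // The first line names the columns.
    $header = str_getcsv(array_shift($lines), ',', '"', '\\');
    $rows = [];
    foreach ($lines as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        $fields = str_getcsv($line, ',', '"', '\\');
        // Keep only rows whose field count matches the header.
        if (count($fields) === count($header)) {
            $rows[] = array_combine($header, $fields);
        }
    }
    return $rows;
}

$companies = parseCsvWithHeader($csv);
foreach ($companies as $company) {
    echo $company['Name'], ' - ', $company['MarketCap'], PHP_EOL;
}
```

In practice `$csv` would come from `file_get_contents()` on the download URL rather than from an inline string.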

Dozens of APIs??? Actually, I want to use these two pieces of text to create links and display them on a website. –
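Once the two values have been extracted, turning them into a link could be sketched as follows. The helper name `companyLink` and the output format are assumptions; the point is only that scraped text should be escaped before it is placed into your own page.

```php
<?php
/**
 * Build an HTML link from already-extracted values. (Hypothetical helper.)
 */
function companyLink(string $name, string $marketCap, string $url): string
{
    // Escape every value before putting it into HTML.
    return sprintf(
        '<a href="%s">%s (%s)</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($marketCap, ENT_QUOTES, 'UTF-8')
    );
}

echo companyLink('1-800 FLOWERS.COM, Inc.', '$100.55M', 'http://www.1800flowers.com');
```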

Try this code:

// The URL you need to scrape
$uri = 'http://www.nasdaq.com/screening/companies-by-name.aspx?letter=A';
// Fetch the HTML from the URL
$get = file_get_contents($uri);

// Find where the company link starts and ends
$pos1 = strpos($get, "<a target=\"_blank\" rel=\"nofollow\" href=\"http://www.1800flowers.com\">");
$pos2 = strpos($get, "</a>", $pos1);

// Find where the market-cap cell starts and ends, searching after the link
$pos3 = strpos($get, "<td style=\"\">", $pos2);
$pos4 = strpos($get, "</td>", $pos3);

// Cut out the two fragments (each still begins with its opening tag)
$text = substr($get, $pos1, $pos2 - $pos1);
$text3 = substr($get, $pos3, $pos4 - $pos3);

// Strip the remaining tags so that only the values you are looking for are left
$text1 = strip_tags($text);
$text2 = strip_tags($text3);

Source: 2017-03-23 12:53:57, Stefano
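As an alternative to searching for literal tag strings, the same two values can be read for every row with PHP's built-in `DOMDocument` and `DOMXPath`. This is only a sketch: the sample markup is a trimmed, well-formed version of the output in the question, the function name `extractCompanies` is hypothetical, and it assumes the market cap is always the first cell after the one holding the link.

```php
<?php
// Sample markup modeled on the question's output (trimmed and made well-formed).
$html = <<<HTML
<table id="CompanylistResults">
<tbody>
<tr>
<td><a target="_blank" rel="nofollow" href="http://www.1800flowers.com">1-800 FLOWERS.COM, Inc.</a></td>
<td style="">\$100.55M</td>
<td style="display:none"></td>
<td>United States</td>
<td>1999</td>
<td style="width:105px">Other Specialty Stores</td>
</tr>
</tbody>
</table>
HTML;

/**
 * Return name, URL and market cap for each table row that contains a link.
 */
function extractCompanies(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//table[@id="CompanylistResults"]//tr[td/a]') as $row) {
        $link = $xpath->query('td/a', $row)->item(0);
        // Assumption: the market cap is the first cell after the one holding the link.
        $cap = $xpath->query('td[a]/following-sibling::td[1]', $row)->item(0);
        $result[] = [
            'name'      => trim($link->textContent),
            'url'       => $link->getAttribute('href'),
            'marketCap' => $cap !== null ? trim($cap->textContent) : '',
        ];
    }
    return $result;
}

print_r(extractCompanies($html));
```

Because the query is anchored on the table's `id` rather than on one company's URL, it does not need to be rewritten for each company.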

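You need to explain better what your code snippet is doing, so that we can all understand it without being confused about how it answers the question. – Mike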

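Right... Put the URL you need to scrape in $uri. $get (file_get_contents) fetches the HTML from that URL. With $pos1 and $pos2 you fix where the data you want starts and ends (the same goes for $pos3 and $pos4). $text takes the code between $pos1 and $pos2 (likewise $text3 between $pos3 and $pos4). With strip_tags() you then get the values. – Stefano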
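The steps described in this comment (find a start marker, find the next end marker, cut out what lies between, strip the tags) can be wrapped in one small function. The name `textBetween` and the shortened sample markup are assumptions for illustration. Unlike the snippet in the answer, this sketch also checks for `strpos()` returning `false` when a marker is missing.

```php
<?php
/**
 * Return the text between $startMarker and the next $endMarker, with any
 * tags stripped, or null when either marker is missing. (Hypothetical helper.)
 */
function textBetween(string $html, string $startMarker, string $endMarker, int $offset = 0): ?string
{
    $start = strpos($html, $startMarker, $offset);
    if ($start === false) {
        return null;
    }
    // Skip past the start marker itself.
    $start += strlen($startMarker);
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }
    return trim(strip_tags(substr($html, $start, $end - $start)));
}

// Shortened sample markup; single quotes keep "$100.55M" literal.
$html = '<td><a href="http://www.1800flowers.com">1-800 FLOWERS.COM, Inc.</a></td>'
      . '<td style="">$100.55M</td>';

$name = textBetween($html, '<a href="http://www.1800flowers.com">', '</a>');
$cap  = textBetween($html, '<td style="">', '</td>');

echo $name, PHP_EOL, $cap, PHP_EOL;
```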
