2011-02-24 58 views
1

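Scraping book prices

I want to write a scraper application and I've run into a problem. My PHP cURL code doesn't pull the page with the book's price; it sends me back to the domain's web root instead.

I'm trying to search the site by ISBN.

I've been banging my head against the wall. Any help would be greatly appreciated!

Code: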

<form method="post" name="SearchForm" class="form-validate" id="SearchForm" action="index.php">
    <textarea rows="3" name="SearchTerm" id="SearchTerm" cols="40" class="validate-required error"></textarea>
    <div class="error" id="SearchTerm-error"></div>
    <br>
    <button class="search primary" type="submit">continue</button>
</form>


<?php 

/* 
echo("<pre>");print_r($_GET);echo("</pre>"); 
echo("<pre>");print_r($_POST);echo("</pre>"); 
*/ 

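// Avoid an "undefined index" notice when the page is first loaded without a POST.
$isbn = isset($_POST['SearchTerm']) ? trim($_POST['SearchTerm']) : '';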


$userAgent = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16'; 

$fields = array(
    'url' => ("http://www.bookleberry.com/Search/SearchKeyword"), 
    'qurl' => ("http://www.bookleberry.com/Search/SearchKeyword/" . rawurlencode($isbn)), 
    'SearchTerm' => ($isbn), 
    'Page' => ('1'), 
    'class' => ('textfield validate-required'), 
    'for' => ('new-search'), 
    'result-count' => ('1'), 
    'status' => 'success', 
); 

$SearchTerm = ($fields['SearchTerm']); 
$url = ($fields['url']); 
$Page = ($fields['Page']); 


echo("<pre>"); 
print_r($fields); 
echo("</pre>"); 

if (!empty($isbn)) {

    // Open the connection.
    $ch = curl_init($url);

    // Request headers go through CURLOPT_HTTPHEADER; $userAgent already
    // holds a complete "User-Agent: ..." header line.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        $userAgent,
        "Accept: application/json"
    ));

    // Send the search term as a URL-encoded POST body (no leading "?").
    // Whether the site expects a POST to this URL is not confirmed here.
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'SearchTerm' => $SearchTerm
    )));

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    // Execute the request.
    $result = curl_exec($ch);

    if ($result === false) {
        echo "curl_errno=" . curl_errno($ch) . "<br>";
        echo "curl_error=" . curl_error($ch) . "<br>";
    } else {
        print $result;
    }

    // Transfer details are only populated after curl_exec().
    print "<pre>\n";
    print_r(curl_getinfo($ch));
    print "</pre>\n";

    curl_close($ch);
}

?>
+1

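Offhand, I'd say it's because the content appears to be populated via AJAX. Scraping the page with PHP/cURL won't get you very far; you need to intercept the AJAX call and fetch the results that the JavaScript uses in the background. – 2011-02-24 19:22:06

Answers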

4

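Don't hurt your head, use it!

  • Install Fiddler.
  • Make the request in your browser and look at exactly what Fiddler shows being posted. That includes all headers, cookies, and form variables.
  • Make the same post from your code and check Fiddler again.
  • Compare the differences between the two and adjust your script.
  • Repeat.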
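For the script's request to appear in Fiddler alongside the browser's, the cURL handle has to be pointed at Fiddler as a proxy. The sketch below is one way to do that, not a definitive setup: `proxied_curl_options` is a hypothetical helper, `127.0.0.1:8888` is assumed to be the address Fiddler listens on (its usual default), and the ISBN is a placeholder value. The URL is the one from the question.

```php
<?php
// Hypothetical helper: builds cURL options that route a POST through a local
// debugging proxy so the request can be inspected there.
function proxied_curl_options($url, array $postFields, $proxy = '127.0.0.1:8888')
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,  // assumption: Fiddler's default listener
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($postFields),
        CURLOPT_RETURNTRANSFER => true,
        CURLINFO_HEADER_OUT    => true,    // keep the sent request headers for curl_getinfo()
    );
}

$options = proxied_curl_options(
    'http://www.bookleberry.com/Search/SearchKeyword',
    array('SearchTerm' => '9780131103627') // placeholder ISBN
);

// Sending the request needs network access and the proxy running:
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// $body = curl_exec($ch);
// echo curl_getinfo($ch, CURLINFO_HEADER_OUT); // the headers the script actually sent
// curl_close($ch);
```

Even without Fiddler, dropping `CURLOPT_PROXY` and printing `CURLINFO_HEADER_OUT` gives the script's side of the comparison.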

It also helps to install Firebug. Use its Copy XPath feature and drop the result into a PHP DOM XPath query; that makes scraping fun and easy!
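A minimal sketch of such a query with PHP's `DOMDocument` and `DOMXPath` follows. The markup, the `results` id, the `price` class, and the prices are all made up for illustration; the real site's structure is not known here, so the expression would have to be replaced with the one copied from the browser.

```php
<?php
// Hypothetical markup standing in for a search-results page.
$html = <<<'HTML'
<html><body>
  <table id="results">
    <tr><td class="title">Example Book</td><td class="price">$12.50</td></tr>
    <tr><td class="title">Another Book</td><td class="price">$8.99</td></tr>
  </table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Browsers add <tbody> to tables in their DOM view, so a copied XPath may
// contain /tbody/ steps that the raw HTML lacks. Using "//" sidesteps that.
$prices = array();
foreach ($xpath->query('//table[@id="results"]//td[@class="price"]') as $cell) {
    $prices[] = trim($cell->textContent);
}

print_r($prices);
```

If a copied expression returns nothing, removing any `/tbody` steps from it is a reasonable first thing to try.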

+0

I like the DOM XPath query idea for web scraping – emaillenin 2011-02-25 17:32:20

+0

@emailenin - remember to remove the elements that Firebug puts in – 2011-02-25 17:36:36

+0

The steps you gave are the ones I always use when scraping a website. – 2011-02-26 04:58:17