2016-08-12 61 views

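Scraping some links from a web page with casperJS

I am using casperJS 1.1.2 and phantomJS 2.1.1 to retrieve some links from a web page. All of the links I am interested in contain the string "javascript" in their href attribute, as shown below: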

<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl01&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species A  
</a></td> 
<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl02&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species B </a></td> 
<td><a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl03&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species C </a></td> 
<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl04&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species D</a></td> 
<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl05&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species E </a></td> 
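
These href values are not URLs. Each one is a javascript: call that triggers an ASP.NET WebForms postback, and the part that tells the links apart is the control ID passed to WebForm_PostBackOptions. As a rough sketch, assuming the href is already available as a string (the helper name extractPostBackTarget is hypothetical, and the regular expression only covers the quoting variants listed in the comment), the ID could be pulled out like this:

```javascript
// Hypothetical helper: returns the first argument of WebForm_PostBackOptions,
// or null when the href is not a postback link.
// Assumption: the quote around the ID appears as ", &quot; or %22.
function extractPostBackTarget(href) {
    var pattern = /WebForm_PostBackOptions\(\s*(?:"|&quot;|%22)(.*?)(?:"|&quot;|%22)/;
    var match = pattern.exec(href);
    return match ? match[1] : null;
}

var sampleHref = 'javascript:WebForm_DoPostBackWithOptions(new ' +
    'WebForm_PostBackOptions("ctl00$CenterContent$ctl01", ' +
    '"", true, "", "", false, true))';

var target = extractPostBackTarget(sampleHref);
var notAPostBack = extractPostBackTarget('javascript:void(0)');
```

Here target holds the control ID of the first link, and notAPostBack is null because the second string contains no WebForm_PostBackOptions call.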

I wrote a casperJS script that should scrape every link whose href attribute contains the string "javascript" and write those links to a file, as shown below.

var links=[]; 
var casper = require('casper').create({ 
    waitTimeout: 10000, 
    verbose: true, 
    logLevel: 'debug', 
    pageSettings: { 
     loadImages: false, 
     loadPlugins: false 
    } 
}); 

var fs = require('fs'); 

casper.start("https://apps.ams.usda.gov/CMS/", function() 
    { 
     links = _utils_.getElementsByXPath('.//td/a[contains(@href,"javascript")]'); 
    }); 

fs.write("plantVarietyResults.json", links, 'w'); 


casper.run(); 

I don't understand why my script does not write the links to the file correctly.

Answer


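There are some misconceptions about CasperJS and some errors in your code. casper.start() only schedules a step, so the top-level fs.write() call runs before the page has been loaded and links is still empty. The client utilities are named __utils__ (two underscores on each side) and exist only in the page context, that is, inside casper.evaluate(). DOM elements cannot be passed out of the page context, so the href strings have to be extracted there. Finally, fs.write() expects a string, so the array has to be serialized first.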
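
The ordering problem can be reproduced without CasperJS at all. The following is only a minimal sketch in plain JavaScript: MiniQueue is a hypothetical stand-in for Casper's step queue (it is not part of the CasperJS API), and the link strings are placeholder data. It records what a write would have saved at two points in the script.

```javascript
// Hypothetical stand-in for Casper's step queue; not part of CasperJS.
function MiniQueue() {
    this.steps = [];
}
MiniQueue.prototype.start = function (step) {
    this.steps.push(step); // only records the step, does not execute it
    return this;
};
MiniQueue.prototype.run = function () {
    this.steps.forEach(function (step) { step(); });
};

var links = [];
var written = []; // what each "write" would have saved, in execution order

var queue = new MiniQueue();
queue.start(function () {
    links = ['javascript:first()', 'javascript:second()']; // placeholder data
    written.push(JSON.stringify(links)); // write inside the step: sees the data
});

// Same position as the top-level fs.write() in the question:
// this line executes before any queued step has run.
written.push(JSON.stringify(links));

queue.run();
```

After the script finishes, the first entry of written is the empty array from the top-level write, and only the second entry, produced inside the step, contains the links.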

Here is an example that should work (untested):

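var casper = require('casper').create(); 
var fs = require('fs'); 

casper.start("https://apps.ams.usda.gov/CMS/", function() { 
    // evaluate() runs its function in the page context, where __utils__ exists 
    var links = this.evaluate(function(){ 
        return __utils__.getElementsByXPath('.//td/a[contains(@href,"javascript")]') 
            .map(function(element){ 
                // DOM nodes cannot leave the page context, so return plain strings 
                return element.href; 
            }); 
    }); 
    // write inside the step, after the page has been loaded 
    fs.write("plantVarietyResults.json", JSON.stringify(links), 'w'); 
}); 

casper.run(); 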

Here is a slightly shorter way:

var x = require('casper').selectXPath; 
casper.start("https://apps.ams.usda.gov/CMS/", function() { 
    var links = this.getElementsAttribute(x('.//td/a[contains(@href,"javascript")]'), 'href'); 
    fs.write("plantVarietyResults.json", JSON.stringify(links), 'w'); 
}); 

casper.run(); 
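
Both versions pass the array through JSON.stringify before writing. A small sketch in plain JavaScript of why that matters for these particular links: the hrefs themselves contain commas, so the default comma-joined string form of an array cannot be split back into the original entries, while JSON round-trips cleanly. The two sample hrefs are copied from the question's markup; the variable names are only illustrative.

```javascript
var links = [
    'javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(' +
        '"ctl00$CenterContent$ctl01", "", true, "", "", false, true))',
    'javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(' +
        '"ctl00$CenterContent$ctl02", "", true, "", "", false, true))'
];

// Default string conversion joins the entries with commas.
var implicit = String(links);
// Splitting on commas also cuts inside each href, so the entries are lost.
var naiveSplit = implicit.split(',');

// JSON keeps the entry boundaries and restores the original array.
var asJson = JSON.stringify(links);
var restored = JSON.parse(asJson);
```

naiveSplit ends up with more than two pieces, whereas restored has exactly the two original strings.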

Thank you very much, @Artjom B. – ProfLonghair