2013-10-05 140 views

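I'm trying to use CasperJS with the code below to download a CSV file (an ad report) from a website. The problem is that it downloads the HTML page instead of the CSV file. I can't share the URL because it is behind a login, but the situation is similar to downloading Firefox from the URL below: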

http://www.mozilla.org/en-US/firefox/new/

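Clicking the Download Firefox button issues a GET request, and when I inspect the page, the Network tab shows that GET request as cancelled. I'm new to CasperJS and don't know how to handle this kind of request. Any help would be appreciated.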

casper.then(function() {
    var downloadURL = "";

    this.evaluate(function() {
        var downloadURL = "http://www.lijit.com" + jQuery("#dailyCSV").attr('href');
    });

    this.download(downloadURL, '/Users/Ujwal/Downloads/caspertests/stats.csv');
});

Response headers

Age:0 
Cache-Control:max-age=0 
Connection:keep-alive 
Content-Disposition:attachment; filename=stats.csv 
Content-Encoding:gzip 
Content-Length:1634 
Content-Type:text/x-csv 
Date:Sat, 05 Oct 2013 15:28:21 GMT 
Expires:Sat, 05 Oct 2013 15:28:21 GMT 
P3P:CP="CUR ADM OUR NOR STA NID" 
Server:PWS/8.0.16 
Vary:Accept-Encoding 
X-Px:ms h0-s28.p9-jfk (h0-s62.p9-jfk), ms h0-s62.p9-jfk (origin>CONN) 

Answers


Answering my own question, here is the solution.

Reference: https://github.com/knorrium/google-books-downloader/blob/master/gbd.js

//Download the daily csv 
casper.then(function() {  
    this.click('#dailyCSV'); 
}); 

casper.on('resource.received', function (resource) {
    "use strict";
    // React only to the CSV export request triggered by the click above
    if (resource.url.indexOf("publisherCSV/?startDate=") !== -1) {
        var fs = require('fs');
        var url = resource.url;
        var file = "stats.csv";
        this.echo(url);
        try {
            this.echo("Attempting to download file " + file);
            // Save the file into the directory the script was started from
            casper.download(url, fs.workingDirectory + '/' + file);
        } catch (e) {
            this.echo(e);
        }
    }
});
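
One thing to watch for: PhantomJS can report the same resource more than once (it emits separate 'start' and 'end' stages), so the handler above may try to download the file repeatedly. Below is a minimal sketch of one way to guard against that. The helper name makeOnceMatcher is my own and not part of CasperJS, and the block assumes a casper instance has already been created earlier in the script; outside CasperJS the wiring at the bottom is simply skipped.

```javascript
// Hypothetical helper (not a CasperJS API): returns a predicate that is true
// only the first time a resource whose URL contains urlFragment is seen.
function makeOnceMatcher(urlFragment) {
    var seen = {};
    return function (resource) {
        var url = resource && resource.url;
        if (typeof url !== 'string' || url.indexOf(urlFragment) === -1) {
            return false; // not the CSV export request
        }
        if (Object.prototype.hasOwnProperty.call(seen, url)) {
            return false; // this URL was already handled once
        }
        seen[url] = true;
        return true;
    };
}

// Wiring; assumes `casper` was created earlier, e.g. via require('casper').create().
if (typeof casper !== 'undefined') {
    var isNewCsv = makeOnceMatcher('publisherCSV/?startDate=');
    casper.on('resource.received', function (resource) {
        if (isNewCsv(resource)) {
            var fs = require('fs');
            this.download(resource.url, fs.workingDirectory + '/stats.csv');
        }
    });
}
```

Keeping the matching logic in a plain function also makes it easy to check without launching a browser session.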