2017-10-19

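Scraping a JavaScript object and converting it to JSON within R/rvest

I am scraping the following website: https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio

I am trying to get the currency exchange rate table into an R data frame via the rvest package, but the table itself is built from a JavaScript variable embedded in the HTML code.

I have located the relevant CSS selector, and this is what I have so far: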

library(rvest)  
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>% 
     read_html() %>% 
     html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') 

My output is now the following JavaScript script, returned as an XML nodeset:

<script> 
$(document).ready(function(){ 
    var valor = '{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"}, {"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"}, {"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, {"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, {"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, {"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}], "tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]}'; 
    if(valor != '{}'){ 
     var objJSON = eval("(" + valor + ")"); 
     var tabla="<tbody>"; 
     for (var i = 0; i < objJSON["tablaDolar"].length; i++) { 
      tabla+= "<tr>"; 
      tabla+= "<td>" + objJSON["tablaDolar"][i].nombreDolar + "</td>"; 
      tabla+= "<td>$" + objJSON["tablaDolar"][i].compra + "</td>"; 
      tabla+= "<td>$" + objJSON["tablaDolar"][i].venta + "</td>"; 
      tabla+= "</tr>"; 
     } 
     tabla+= "</tbody>"; 
     $("#tablaDolar").append(tabla); 
     var tabla2=""; 
     for (var i = 0; i < objJSON["tablaDivisas"].length; i++) { 
      tabla2+= "<tr>"; 
      tabla2+= "<td>" + objJSON["tablaDivisas"][i].nombreDivisas + "</td>"; 
      tabla2+= "<td>$" + objJSON["tablaDivisas"][i].compra + "</td>"; 
      tabla2+= "<td>$" + objJSON["tablaDivisas"][i].venta + "</td>"; 
      tabla2+= "</tr>"; 
     } 
     tabla2+= "</tbody>"; 
     $("#tablaDivisas").append(tabla2); 
    } 
    bmnIndicadoresResponsivoInstance.cloneResponsive(0); 
}); 
</script> 

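My question is: how do I strip away nearly all of the JavaScript functions and operators so that I keep only this data, and ultimately convert it into a JSON table like the one below?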

{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"}, 
{"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"}, 
{"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, 
{"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, 
{"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, 
{"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}], 
"tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]} 

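In other words, I need to extract the "valor" variable from the JS script using R.

For some reason, I am having trouble getting this done entirely within R (that is, without having to export the variable to an external .txt file and then take a substring).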

Answers


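You can do it like this:

library(rvest)  
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>% 
    read_html() %>% 
    html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') %>% 
    as_list() 

# Split the script text into lines (tolerating both \r\n and \n line endings) 
banorte_vec <- strsplit(banorte[[c(1,1)]],"\r?\n")[[1]] 
# Keep only the line that assigns the JSON string to `valor` 
valor <- grep("valor = ", banorte_vec, value = TRUE) 
# Strip the leading declaration, the trailing "';" and the opening quote 
valor <- gsub("^\\s*var valor = ","",valor) 
valor <- gsub("';\\s*$","",valor) 
valor <- gsub("^'","",valor) 

library(jsonlite) 
result <- fromJSON(valor) 
result 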

$tablaDivisas 
    nombreDivisas compra venta 
1    FRANCO SUIZO  18.60 19.45 
2 LIBRA ESTERLINA  24.20 25.15 
3     YEN JAPONES 0.1635 0.171 
4    CORONA SUECA   2.15  2.45 
5    DOLAR CANADA  14.50 15.35 
6            EURO  21.75 22.60 

$tablaDolar 
  nombreDolar compra venta 
1  VENTANILLA  17.73 19.15 
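
As a variation on the same idea, the `valor` string can also be pulled out with a single regular expression instead of splitting the script into lines. The following is only a sketch under some assumptions: `script_text` is an abbreviated stand-in for the real script text (so the example runs without hitting the site), and the JSON is assumed to contain no single quotes. It also converts the rates, which arrive as strings, to numbers.

```r
library(jsonlite)

# Abbreviated stand-in for the <script> text (assumption: same shape as the page)
script_text <- paste(
  "$(document).ready(function(){",
  "    var valor = '{\"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';",
  "    if(valor != '{}'){ }",
  "});",
  sep = "\n"
)

# Capture everything between the single quotes of the `valor` assignment
pattern <- "var valor = '([^']*)'"
match <- regmatches(script_text, regexec(pattern, script_text))[[1]]
valor_json <- match[2]  # element 1 is the full match, element 2 the capture group

rates <- fromJSON(valor_json)

# The rates are character strings in the JSON; convert them to numeric
dolar <- rates$tablaDolar
dolar$compra <- as.numeric(dolar$compra)
dolar$venta <- as.numeric(dolar$venta)
dolar
```

With the real page, `script_text` would be the text of the `<script>` node selected above, for example via `html_text()`.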

Definitely a more heavyweight answer, but it generalizes to other, gnarlier "JavaScript problems".

library(rvest) 
library(stringi) 
library(V8) 
library(tidyverse) 

banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>% 
     read_html() %>% 
     html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') 

We'll set up a JavaScript V8 context:

ctx <- v8() 

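Then:

  • get the <script> content
  • split it into lines
  • turn it into a plain character vector
  • take out the cruft (keep only the `var` line)
  • evaluate the JavaScript

It's not too bad: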

html_text(banorte) %>% 
    stri_split_lines() %>% 
    flatten_chr() %>% 
    keep(stri_detect_regex, "^\tvar") %>% 
    ctx$eval() 

Since the JavaScript variable holds a JSON string, we do the parsing in R rather than in V8:

jsonlite::fromJSON(ctx$get("valor")) 
## $tablaDivisas 
##     nombreDivisas compra venta 
## 1    FRANCO SUIZO  18.60 19.45 
## 2 LIBRA ESTERLINA  24.20 25.15 
## 3     YEN JAPONES 0.1635 0.171 
## 4    CORONA SUECA   2.15  2.45 
## 5    DOLAR CANADA  14.50 15.35 
## 6            EURO  21.75 22.60 
## 
## $tablaDolar 
##   nombreDolar compra venta 
## 1  VENTANILLA  17.73 19.15 

This approach generalizes better when the JavaScript contains other useful processing.
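
To illustrate what "other useful processing" could look like, here is a rough sketch that does the parsing and a small calculation inside V8 and only then hands the result back to R. The `valor` line is an abbreviated stand-in for the one kept from the page, and the names `dolar` and `spread` are hypothetical, introduced just for this example.

```r
library(V8)

ctx <- v8()

# Abbreviated stand-in for the `var valor = ...` line from the page script
ctx$eval("var valor = '{\"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';")

# Parse the JSON and derive an extra column in JavaScript itself.
# `dolar` and `spread` are illustrative names, not part of the original page.
ctx$eval("
  var dolar = JSON.parse(valor).tablaDolar.map(function(row) {
    return {
      nombre: row.nombreDolar,
      compra: parseFloat(row.compra),
      venta: parseFloat(row.venta),
      spread: parseFloat(row.venta) - parseFloat(row.compra)
    };
  });
")

# ctx$get() converts the JavaScript array of objects into an R data frame
dolar <- ctx$get("dolar")
dolar
```

Because the conversion to numbers happens in JavaScript, the columns come back to R as numeric rather than character, which is one practical difference from parsing the raw string with `jsonlite::fromJSON()`.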

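Note: Google Translate in my Chrome beta channel did not translate the site very well, but I think you are very close to violating the spirit of item 6 on the "Términos Legales" page. I can't say for sure until I can translate it. If and when I can, and it looks like you are, I will delete this answer.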