Unable to scrape a page with rvest

I want to scrape all the codes and their titles under every level of the hierarchy shown in the left-hand panel of this website, using the R package rvest.

URL: http://apps.who.int/classifications/icd10/browse/2016/en/

I started by trying this code:

library(rvest)

url <- "http://apps.who.int/classifications/icd10/browse/2016/en/"
src <- read_html(url)
ATC <- src %>% html_node("a.ygtvlabel") %>% html_text()

a.ygtvlabel is the class I saw when hovering over the text on the web page.

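But it just returns NA_character_. Looking at the page's HTML source, I can see that it does not directly contain these entries (e.g. parasitic diseases); it probably only holds an href for all of the content.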
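One way to confirm this diagnosis is to count how many nodes the selector matches in the static HTML. The sketch below does not contact the WHO site: `static_html` is a hypothetical stand-in for a page whose tree container is empty until a script fills it in, and the `tree` id is made up for illustration.

```r
library(rvest)

# Hypothetical stand-in for the static HTML the server returns: the tree
# container is empty because a script populates it later in the browser.
static_html <- '<html><body><div id="tree"></div></body></html>'
page <- read_html(static_html)

# html_nodes() returns every match, so a length of 0 means the selector
# matches nothing in the static source.
n_labels <- length(html_nodes(page, "a.ygtvlabel"))

# html_node() yields a missing node when nothing matches, and html_text()
# turns that into NA, which is the symptom described above.
first_label <- page %>% html_node("a.ygtvlabel") %>% html_text()

print(n_labels)
print(first_label)
```

If the same check against `read_html(url)` also reports zero matches, the labels are most likely loaded by a separate request rather than being part of the page source.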

How should I go about scraping a page like this? Please advise.

Source

2017-05-04 Meenakshi Vikram

Because using the actual API would be so terrible? https://cran.r-project.org/web/packages/WHO/index.html – hrbrmstr

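Thanks @hrbrmstr. The API actually gave me a new idea. Following your hint, I used the R package icd and got the main chapters and sub-chapters from the variables the package defines; I am looking specifically for ICD-10 codes. I could not get the lowest-level codes (for example, A00.0 Cholera due to Vibrio cholerae 01). I wonder whether combining the API with the package would work, and will explore further.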

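As with many pages of this type, this page makes background requests for JSON files that contain the data. You can discover this by opening the browser's debugging tools and looking at the network requests. Using the API mentioned in the comments is a better option.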

library(httr) 
library(jsonlite) 

## original url<-"http://apps.who.int/classifications/icd10/browse/2016/en/" 

json_url <- "http://apps.who.int/classifications/icd10/browse/2016/en/JsonGetRootConcepts?useHtml=false" 
json_data <- rawToChar(GET(json_url)$content) 

categories <- fromJSON(json_data) 
categories$label 
# [1] "I Certain infectious and parasitic diseases"                
# [2] "II Neoplasms"                       
# [3] "III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism" 
# [4] "IV Endocrine, nutritional and metabolic diseases"              

Source

2017-05-04 11:25:08 epi99

Thanks @epi99. Agreed, I got part of the data from the R package. Still, I will try your code if it helps me get everything on the page.

Is 'content(RESULTOFGET, as = "parsed")' not working on your system? – hrbrmstr
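For reference, a minimal sketch of reading the response through httr's content() rather than rawToChar(). The helper names labels_from_json and fetch_labels are hypothetical, and sample_json is a made-up two-item payload that only mirrors the label field shown in the answer's output. With as = "parsed", httr chooses a parser from the response's content type; the sketch uses as = "text" and calls fromJSON explicitly so it does not depend on what content type the server reports.

```r
library(httr)
library(jsonlite)

# Pull the "label" field out of a JSON array of objects.
labels_from_json <- function(json_text) {
  fromJSON(json_text)$label
}

# Fetch a JSON endpoint and return its labels (needs network access).
fetch_labels <- function(json_url) {
  resp <- GET(json_url)
  stop_for_status(resp)  # fail loudly on HTTP errors
  labels_from_json(content(resp, as = "text", encoding = "UTF-8"))
}

# Offline check with a hypothetical payload shaped like the answer's output.
sample_json <- '[{"label": "I Certain infectious and parasitic diseases"},
                 {"label": "II Neoplasms"}]'
print(labels_from_json(sample_json))
```

Against the live site, this would be called as fetch_labels(json_url) with the json_url from the answer above.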
