2014-05-02
5

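I am trying to load some publicly available NHS data using R and the XML package, but I keep getting the following error message; htmlParse seems unable to load the external entity: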

Error: failed to load external entity "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"

I can't figure out what might be causing this, despite having looked at several related questions.

Here is my code, which is very simple:

library("XML") 
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/" 
doc <- htmlParse(url) 

Edit: session info

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1

+0

It isn't a valid XML document: [W3 Validator](http://validator.w3.org/check?uri=http%3A%2F%2Fwww.england.nhs.uk%2Fstatistics%2Fstatistical-work-areas%2Fbed-availability-and-occupancy%2F&charset=%28detect+automatically%29&doctype=Inline&group=0&verbose=1). It should at least be XHTML, not HTML5. – CoDEmanX

+0

The code runs successfully for me on Ubuntu, and also on r-fiddle. Can you add your sessionInfo()? http://www.r-fiddle.org/#/fiddle?id=AfoyOSGm –

+0

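sessionInfo() added! I suspect I already have my answer, though. This is almost certainly caused by my workplace's proxy. I have run into this problem before (with QGIS) and never found a satisfactory solution. – Tumbledown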

Answers

5

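The XML package has some problems here. The issue is intermittent and is not related to the URL. I solved it by using the GET function from the httr package to fetch the HTML and then passing the result to htmlParse, as shown below: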

library("XML") 
library(httr) 
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/" 
doc <- htmlParse(rawToChar(GET(url)$content)) 
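
A slightly more defensive variant of the same idea might look like the sketch below. It checks the HTTP status before parsing and, since a workplace proxy was suspected in the comments, optionally routes the request through one. The helper name `fetch_html` and the proxy host and port shown in the usage comments are hypothetical placeholders, not values from the question; whether a proxy is actually the cause has not been confirmed.

```r
library(XML)
library(httr)

# Hypothetical helper: download a page with httr, then parse it with XML.
# proxy_host / proxy_port are assumptions; leave them NULL for a direct request.
fetch_html <- function(url, proxy_host = NULL, proxy_port = NULL) {
  resp <- if (is.null(proxy_host)) {
    GET(url)
  } else {
    GET(url, use_proxy(proxy_host, proxy_port))
  }
  # Raise an R error on 4xx/5xx responses instead of parsing an error page
  stop_for_status(resp)
  # Decode the body to a string and tell htmlParse it is text, not a file name
  txt <- content(resp, as = "text", encoding = "UTF-8")
  htmlParse(txt, asText = TRUE)
}

# Usage (commented out because it needs network access):
# url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
# doc <- fetch_html(url)                                # direct request
# doc <- fetch_html(url, "proxy.example.org", 8080)     # placeholder proxy
```

Passing `asText = TRUE` keeps htmlParse from treating the downloaded string as a file name or URL, so the download is handled entirely by httr.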

3

You can also use the rvest and xml2 packages:

library(rvest) # github version 
library(xml2) # github version 

url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/" 
doc <- read_html(url) 

doc %>% 
    html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>% 
    html_attr("href") 

## [1] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/" 
## [2] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/" 
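
To see what the `a[href^='...']` selector does without touching the network, the same pipeline can be tried on a small inline HTML string. This is only an illustrative sketch: the markup below is made up for the example and is not the real NHS page.

```r
library(rvest)

# Made-up HTML fragment: one link under the target prefix, one elsewhere
html <- '<html><body>
  <a href="http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/">Overnight</a>
  <a href="http://www.england.nhs.uk/statistics/other/">Other</a>
</body></html>'

doc <- read_html(html)

# href^= keeps only anchors whose href attribute starts with the given prefix
links <- doc %>%
    html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>%
    html_attr("href")

print(links)
```

Only the first link should be returned, because the second href does not begin with the prefix.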