2017-07-28

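How to resolve the error in label_eurostat(): "Dictionary information is missing"

I regularly download a dataset from Eurostat using the eurostat package for R and label it with the function label_eurostat(). The code below worked fine in the past, but since this week it has been giving me errors: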

> emprt <- get_eurostat("lfst_r_lfe2emprt", time_format = "num") 
> emprt <- filter(emprt, sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) 
> emprt <- dcast(emprt, geo ~ time) 
Using values as value column: use value.var to override. 
> emprt <- label_eurostat(emprt, lang = "de") 
Error in label_eurostat(emprt, lang = "de") : 
Dictionary information is missing 

I also tried specifying the dictionary explicitly, but then I get a different warning message:

> emprt <- label_eurostat(emprt, dic = "geo", lang = "de") 
Warning message: 
In label_eurostat(emprt, dic = "geo", lang = "de") : 
    All labels for geo were not found. 

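I am not sure whether this is the right dictionary to choose, but it is the only one I found in eurostat. I have also seen that other people have had problems with this function that produce an error like this: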

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : 
factor level [19] is duplicated 

But I am not sure whether this is related to my problem. I would be grateful for any hint!

Answer

You can use either of the following:

packageVersion("eurostat") 
# [1] ‘3.1.1’ 
library(eurostat) 
library(tidyverse) 
library(reshape2) 
# Option 1: reshape first, drop unused geo levels, then label only the geo column
get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>% 
    filter(sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) %>% 
    dcast(geo ~ time) %>% 
    droplevels %>% 
    mutate(geo = label_eurostat(geo, dic = "geo", lang = "de")) 

# Option 2: label the long-format data first, then reshape
get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>% 
    filter(sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) %>% 
    label_eurostat(lang = "de") %>% 
    dcast(geo ~ time) 

As for the warning: if you do not drop the unused levels of the geo factor, label_eurostat can assign duplicated labels. Consider, for example:

get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>% 
    pull(geo) %>% 
    levels %>% 
    grep(pattern = "^DE3", value = TRUE) 
# [1] "DE3" "DE30" 

If you now look at get_eurostat_dic("geo"), both DE3 and DE30 map to Berlin:

get_eurostat_dic("geo") %>% filter(grepl("^DE30?$", code_name)) 
# # A tibble: 2 x 2 
# code_name full_name 
#  <chr>  <chr> 
# 1  DE3 Berlin 
# 2  DE30 Berlin 
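
To see why dropping unused levels matters, here is a minimal base-R sketch that needs no download. The small geo factor is a hypothetical stand-in for the real column, not actual Eurostat output. It shows that subsetting a factor keeps all of its original levels, including DE3 and DE30, until droplevels() is called:

```r
# Hypothetical stand-in for the geo column (not real Eurostat output)
geo <- factor(c("AT", "DE", "DE3", "DE30", "FR"))

# Subsetting keeps every original level, even those no longer present
subset_geo <- geo[geo %in% c("AT", "DE", "FR")]
levels(subset_geo)
# [1] "AT"   "DE"   "DE3"  "DE30" "FR"

# droplevels() removes the unused levels, so DE3 and DE30 are gone
cleaned <- droplevels(subset_geo)
levels(cleaned)
# [1] "AT" "DE" "FR"
```

With the unused levels still attached, a labelling step would have to translate both DE3 and DE30, which is where the duplicated "Berlin" label can come from.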

Side note: you do not need reshape2::dcast if you have the tidyverse loaded; you can use select(geo, time, values) %>% spread(time, values) instead.
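
As a rough illustration of the tidyverse alternative to dcast, the sketch below reshapes a tiny long-format table. The data frame toy and its values (1 to 4) are made-up placeholders, not Eurostat figures; only the column names geo, time and values follow the layout used above.

```r
library(dplyr)
library(tidyr)

# Made-up placeholder data in the long layout (geo, time, values)
toy <- tibble(
  geo    = c("AT", "AT", "DE", "DE"),
  time   = c(2015, 2016, 2015, 2016),
  values = c(1, 2, 3, 4)
)

# One row per geo, one column per year
wide <- toy %>%
  select(geo, time, values) %>%
  spread(time, values)

wide
```

The result has the same shape as the output of dcast(geo ~ time): a geo column followed by one column per year.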
