2017-08-29

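If I pass a grouped data frame to a function and rename the grouping variable inside that function, the grouping of the original data frame changes to the new name. After the function returns (I do not return the modified data frame), the column names of the original data frame are unchanged, but its grouping now refers to a name that does not exist. Why does group_by() affect a data frame that is out of scope?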

# test scoping of group_by() which appears to change groups 
library(dplyr) 

muck_up_group<-function(mydf){ 
    mydf<-mydf %>% rename(UhOh=Species) 
} 

dont_muck_up_group<-function(mydf){ 
    mydf<-mydf %>% ungroup() 
    mydf<-mydf %>% rename(UhOh=Species) 
} 

data("iris") 
iris<-as_tibble(iris) %>% group_by(Species) 
iris 
# A tibble: 150 x 5 
# Groups: Species [3] 
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
#   <dbl>  <dbl>  <dbl>  <dbl> <fctr> 
# 1   5.1   3.5   1.4   0.2 setosa 

muck_up_group(iris) # original grouping changed to column name that doesn't exist 
iris 
# A tibble: 150 x 5 
# Groups: UhOh [3] 
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
#   <dbl>  <dbl>  <dbl>  <dbl> <fctr> 
# 1   5.1   3.5   1.4   0.2 setosa 

#restore original state 
iris<-as_tibble(iris) %>% group_by(Species) 
dont_muck_up_group(iris) # original grouping preserved 
iris 
# A tibble: 150 x 5 
# Groups: Species [3] 
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
#   <dbl>  <dbl>  <dbl>  <dbl> <fctr> 
# 1   5.1   3.5   1.4   0.2 setosa 

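I can see why renaming a grouping variable might be bad practice, but it is allowed. This looks like a case where an attribute of a variable is passed by reference while its contents are passed by value (as we normally understand it).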
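The mismatch described above (grouping names that no longer match any column) can be detected directly, and the rename can be done on an ungrouped copy. The following is only a minimal sketch: `groups_are_valid()` and `rename_species_safely()` are hypothetical helper names introduced here for illustration, not part of dplyr, and the sketch assumes a dplyr version that provides `group_vars()` (0.7.0 or later).

```r
library(dplyr)

# Hypothetical helper: TRUE when every grouping variable is an existing column.
groups_are_valid <- function(df) {
  all(group_vars(df) %in% names(df))
}

# Hypothetical wrapper: drop the grouping first, rename, then regroup
# by the new name, so the caller's data frame is never touched.
rename_species_safely <- function(df) {
  df %>%
    ungroup() %>%
    rename(UhOh = Species) %>%
    group_by(UhOh)
}

# Use the packaged dataset explicitly, in case `iris` was overwritten.
grouped_iris <- as_tibble(datasets::iris) %>% group_by(Species)
renamed <- rename_species_safely(grouped_iris)

groups_are_valid(grouped_iris)  # check the caller's copy after the call
group_vars(grouped_iris)        # grouping of the original
group_vars(renamed)             # grouping of the returned copy
```

Running `groups_are_valid()` on the original after each function call is a cheap way to confirm whether a given dplyr version shows the behaviour in question.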

> sessionInfo() 
R version 3.4.0 (2017-04-21) 
Platform: x86_64-w64-mingw32/x64 (64-bit) 
Running under: Windows 7 x64 (build 7601) Service Pack 1 

Matrix products: default 

locale: 
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C       
[5] LC_TIME=English_United States.1252  

attached base packages: 
[1] graphics grDevices utils  datasets stats  methods base  

other attached packages: 
[1] lubridate_1.6.0    bindrcpp_0.2     mFilter_0.1-3     
[4] ggrepel_0.6.5     reshape2_1.4.2    scales_0.4.1     
[7] purrr_0.2.3     readr_1.1.1     tidyr_0.7.0     
[10] tibble_1.3.4     tidyverse_1.1.1    knitr_1.17     
[13] Rblpapi_0.3.6     stringr_1.2.0     rvest_0.3.2     
[16] xml2_1.1.1     devtools_1.13.3    dplyr_0.7.2     
[19] plyr_1.8.4     ggplot2_2.2.1     PerformanceAnalytics_1.4.3541 
[22] xts_0.10-0     zoo_1.8-0      

loaded via a namespace (and not attached): 
[1] Rcpp_0.12.12  lattice_0.20-35 assertthat_0.2.0 rprojroot_1.2  digest_0.6.12  
[6] psych_1.7.5  R6_2.2.2   cellranger_1.1.0 backports_1.1.0 evaluate_0.10.1 
[11] httr_1.3.1   highr_0.6   rlang_0.1.2  curl_2.8.1   lazyeval_0.2.0  
[16] readxl_1.0.0  TTR_0.23-2   tidyquant_0.5.3 rmarkdown_1.6  labeling_0.3  
[21] foreign_0.8-67  munsell_0.4.3  broom_0.4.2  compiler_3.4.0  modelr_0.1.1  
[26] pkgconfig_2.0.1 base64enc_0.1-3 mnormt_1.5-5  htmltools_0.3.6 tidyselect_0.1.1 
[31] withr_2.0.0  Quandl_2.8.0  grid_3.4.0   nlme_3.1-131  jsonlite_1.5  
[36] gtable_0.2.0  magrittr_1.5  quantmod_0.4-10 stringi_1.1.5  RColorBrewer_1.1-2 
[41] tools_3.4.0  forcats_0.2.0  glue_1.1.1   hms_0.3   rsconnect_0.8.5 
[46] parallel_3.4.0  yaml_2.1.14  colorspace_1.3-2 memoise_1.1.0  bindr_0.1   
[51] haven_1.1.0  
> 

Is this a bug? Thanks.

+0

With dplyr_0.5.0, I cannot reproduce this. – rsmith54

+2

I have reproduced it with dplyr 0.7.2. I suggest you post the output of 'sessionInfo()'. – eipi10

+0

I have tried both. With dplyr_0.5.0 this cannot be reproduced, but with dplyr_0.7.2 it can. Bug? Feature? – coffeinjunky

Answers

1

See the comment by @aosmith above. The dplyr issue has been closed.
