2017-10-14 129 views
-2

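sum() in dplyr and aggregate: NA values

I have a data set with about 3000 rows. The data can be accessed via https://pastebin.com/i4dYCUQX

The problem: NA appears in the output, although there seem to be no NA values in the data. This is what happens when I try to sum the total value of each category in a column via dplyr or aggregate: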

example <- read.csv("https://pastebin.com/raw/i4dYCUQX", header=TRUE, sep=",") 
example 

# dplyr 
library(dplyr)
example %>% group_by(size) %>% summarize_at(vars(volume), funs(sum)) 

Out: 
# A tibble: 4 x 2 
     size volume 
     <fctr>  <int> 
1 Extra Large  NA 
2  Large  NA 
3  Medium 937581572 
4  Small  NA 

# aggregate 
aggregate(volume ~ size, data=example, FUN=sum) 

Out: 
     size volume 
1 Extra Large  NA 
2  Large  NA 
3  Medium 937581572 
4  Small  NA 

When I try to get the value via colSums, it seems to work:

# Colsums 
small <- example %>% filter(size == "Small") 
colSums(small["volume"], na.rm = FALSE, dims = 1) 

Out: 
volume 
3869267348 

Can anyone imagine what the problem might be?

+2

Well, I believe the _Warning messages_ are quite informative: '[...] integer overflow - use sum(as.numeric(.))' – Henrik

Answers

1

The first thing to note is that when I run your example, I get:

example <- read.csv("https://pastebin.com/raw/i4dYCUQX", header=TRUE, sep=",") 
# dplyr 
example %>% group_by(size) %>% summarize_at(vars(volume), funs(sum)) 
#> Warning in summarise_impl(.data, dots): integer overflow - use 
#> sum(as.numeric(.)) 

#> Warning in summarise_impl(.data, dots): integer overflow - use 
#> sum(as.numeric(.)) 

#> Warning in summarise_impl(.data, dots): integer overflow - use 
#> sum(as.numeric(.)) 
#> # A tibble: 4 × 2 
#>   size volume 
#>  <fctr>  <int> 
#> 1 Extra Large  NA 
#> 2  Large  NA 
#> 3  Medium 937581572 
#> 4  Small  NA 

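This clearly states that your sum is overflowing the integer type. If we do what the warning message suggests, we can convert the integers to numeric and then sum: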


example <- read.csv("https://pastebin.com/raw/i4dYCUQX", header=TRUE, sep=",") 
# dplyr 
example %>% group_by(size) %>% summarize_at(vars(volume), funs(sum(as.numeric(.)))) 
#> # A tibble: 4 × 2 
#>   size  volume 
#>  <fctr>  <dbl> 
#> 1 Extra Large 3609485056 
#> 2  Large 11435467097 
#> 3  Medium 937581572 
#> 4  Small 3869267348 

Here funs(sum) has been replaced by funs(sum(as.numeric(.))), which does the same thing, running sum on each group, but converts the values to numeric first.
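The same behaviour can be reproduced without the pastebin file. The following is a minimal sketch in base R; the `toy` data frame and its values are made up for illustration and are not taken from the question's data:

```r
# Hypothetical toy data standing in for the pastebin file (values are invented)
toy <- data.frame(
  size   = c("Small", "Small", "Medium", "Medium"),
  volume = c(2000000000L, 2000000000L, 400000000L, 500000000L)
)

# Summing as integer: the "Small" total exceeds .Machine$integer.max,
# so sum() returns NA for that group (the warning is suppressed here)
int_sums <- suppressWarnings(tapply(toy$volume, toy$size, sum))

# Converting to numeric (double) before summing avoids the overflow
dbl_sums <- tapply(toy$volume, toy$size, function(v) sum(as.numeric(v)))

print(int_sums)
print(dbl_sums)
```

Only the group whose total exceeds the integer range turns into NA, which matches the pattern in the question, where Medium was the only group that summed correctly.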

+0

Thanks, that helped! So integers have a limit? – Christopher
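On the limit asked about here: R stores integers as 32-bit signed values, so the largest one is 2^31 - 1. A short sketch that can be pasted into a console (the variable names are only illustrative):

```r
# Largest value an R integer can hold
int_max <- .Machine$integer.max
print(int_max)                      # 2147483647, i.e. 2^31 - 1

# Integer arithmetic past the limit gives NA (normally with a warning)
overflowed <- suppressWarnings(int_max + 1L)
print(overflowed)

# The same arithmetic on a double works
as_double <- as.numeric(int_max) + 1
print(as_double)                    # 2147483648
```

The Small total from the question, 3869267348, is above that limit, which is consistent with the NA it produced. Doubles represent whole numbers exactly up to 2^53, so they are a safe choice for totals of this size.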

1

It is because the value is an integer, not numeric:

example$volume <- as.numeric(example$volume) 

aggregate(volume ~ size, data=example, FUN=sum) 

     size  volume 
1 Extra Large 3609485056 
2  Large 11435467097 
3  Medium 937581572 
4  Small 3869267348 

For details, see here:

What is integer overflow in R and how can it happen?
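This also seems to explain why the colSums call in the question returned a number: colSums returns a double result even for integer input, so it does not hit the integer limit. A small sketch with made-up values (the `toy` data frame is hypothetical):

```r
# Hypothetical integer column whose total exceeds the integer range
toy <- data.frame(volume = c(2000000000L, 2000000000L))

# Check how the column is stored
col_type <- typeof(toy$volume)
print(col_type)                     # "integer"

# sum() stays in integer arithmetic and overflows to NA
int_total <- suppressWarnings(sum(toy$volume))

# colSums() accumulates in double, so the total comes through
cs <- colSums(toy["volume"])
print(typeof(cs))                   # "double"
print(cs)
```

Checking `typeof()` or `str()` on the column is a quick way to tell in advance whether a sum could overflow.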

+0

Thanks, that helped too (as.numeric())! – Christopher