2016-05-14 74 views
5

我使用的光栅包,以降低大栅格的分辨率,使用功能总这样更快的功能,以降低光栅R的分辨率

require(raster)  
x <- matrix(rpois(1000000, 2),1000) 

a <-raster(x) 
plot(a) 

agg.fun <- function(x,...) 
    if(sum(x)==0){ 
     return(NA) 
    } else { 
     which.max(table(x)) 
    } 

a1<-aggregate(a,fact=10,fun=agg.fun) 
plot(a1) 

的光栅图像我有聚集是多少更大的34000x34000,所以我想知道是否有更快的方法来实现agg.fun函数。

+2

请注意,'which.max(table(x))'返回具有最大重复值的索引,而不是值。在你的情况下,大多数情况下,索引将与值重合,但要确保具有应该使用'as.numeric(names(which.max(table(x))))'...的值... – digEmAll

+0

就性能而言...嗯,我想你应该诉诸一些Rcpp代码块来获得一些东西... – digEmAll

回答

3

您可以使用gdalUtils::gdalwarp。对我来说,它的效率比@JosephWood的fasterAgg.Fun效率更低,但对于约瑟夫更大的例子来说,它的速度要快得多。它要求光栅存在于磁盘上,所以如果光栅在内存中,则将写入时间写入下面。

下面我使用了fasterAgg.Fun的修改,它返回最频繁的,而不是它在块中的索引。

library(raster) 
x <- matrix(rpois(10^8, 2), 10000) 
a <- raster(x) 

fasterAgg.Fun <- function(x,...) { 
    myRle.Alt <- function (x1) { 
     n1 <- length(x1) 
     y1 <- x1[-1L] != x1[-n1] 
     i <- c(which(y1), n1) 
     x1[i][which.max(diff(c(0L, i)))] 
    } 

    if (sum(x)==0) { 
     return(NA) 
    } else { 
     myRle.Alt(sort(x, method="quick")) 
    } 
} 
system.time(a2 <- aggregate(a, fact=10, fun=fasterAgg.Fun)) 

## user system elapsed 
## 67.42 8.82 76.38 

library(gdalUtils) 
writeRaster(a, f <- tempfile(fileext='.tif'), datatype='INT1U') 
system.time(a3 <- gdalwarp(f, f2 <- tempfile(fileext='.tif'), r='mode', 
          multi=TRUE, tr=res(a)*10, output_Raster=TRUE)) 

## user system elapsed 
## 0.00 0.00 2.93 

注意,有在当有关系的模式的定义略有不同:gdalwarp选择最高值,而功能传递给上述aggregate(经由which.max的行为)选择最低的(例如,见which.max(table(c(1, 1, 2, 2, 3, 4))))。

此外,将栅格数据存储为整数很重要(如果适用)。例如,如果数据以float形式存储(默认为writeRaster),则上述操作在我的系统上需要约14秒。有关可用类型,请参阅?dataType

3

试试这个:

fasterAgg.Fun <- function(x,...) { 
    myRle.Alt <- function (x1) { 
     n1 <- length(x1) 
     y1 <- x1[-1L] != x1[-n1] 
     i <- c(which(y1), n1) 
     which.max(diff(c(0L, i))) 
    } 

    if (sum(x)==0) { 
     return(NA) 
    } else { 
     myRle.Alt(sort(x, method="quick")) 
    } 
} 

library(rbenchmark) 
benchmark(FasterAgg=aggregate(a, fact=10, fun=fasterAgg.Fun), 
      AggFun=aggregate(a, fact=10, fun=agg.fun), 
      replications=10, 
      columns = c("test", "replications", "elapsed", "relative"), 
      order = "relative") 
     test replications elapsed relative 
1 FasterAgg   10 12.896 1.000 
2 AggFun   10 30.454 2.362 

对于较大的测试对象,我们有:

x <- matrix(rpois(10^8,2),10000) 
a <- raster(x) 
system.time(a2 <- aggregate(a, fact=10, fun=fasterAgg.Fun)) 
    user system elapsed 
111.271 22.225 133.943 

system.time(a1 <- aggregate(a, fact=10, fun=agg.fun)) 
    user system elapsed 
282.170 24.327 308.112 

如果你想实际值@digEmAll在上面的评论说,简单地改变返回值在myRle.Altwhich.max(diff(c(0L, i)))x1[i][which.max(diff(c(0L, i)))]

3

只是为了好玩我还创建了一个RCPP功能(比@JosephWood快不了多少):

########### original function 
#(modified to return most frequent value instead of index) 
agg.fun <- function(x,...){ 
    if(sum(x)==0){ 
     return(NA) 
    } else { 
     as.integer(names(which.max(table(x)))) 
    } 
} 
########### @JosephWood function 
fasterAgg.Fun <- function(x,...) { 
    myRle.Alt <- function (x1) { 
     n1 <- length(x1) 
     y1 <- x1[-1L] != x1[-n1] 
     i <- c(which(y1), n1) 
     x1[i][which.max(diff(c(0L, i)))] 
    } 

    if (sum(x)==0) { 
     return(NA) 
    } else { 
     myRle.Alt(sort(x, method="quick")) 
    } 
} 
########### Rcpp function 
library(Rcpp) 
library(inline) 

aggrRcpp <- cxxfunction(signature(values='integer'), ' 
       Rcpp::IntegerVector v(clone(values)); 
       std::sort(v.begin(),v.end()); 
       int n = v.size(); 
       double sum = 0; 
       int currentValue = 0, currentCount = 0, maxValue = 0, maxCount = 0; 
       for(int i=0; i < n; i++) { 
        int value = v[i]; 
        sum += value; 
        if(i==0 || currentValue != value){ 
         if(currentCount > maxCount){ 
         maxCount = currentCount; 
         maxValue = currentValue; 
         } 
         currentValue = value; 
         currentCount = 0; 
        }else{ 
         currentCount++; 
        } 
       } 
       if(sum == 0){ 
        return Rcpp::IntegerVector::create(NA_INTEGER); 
       } 
       if(currentCount > maxCount){ 
        maxCount = currentCount; 
        maxValue = currentValue; 
       } 
       return wrap(maxValue) ; 
     ', plugin="Rcpp", verbose=FALSE, 
     includes='') 
# wrap it to support "..." argument 
aggrRcppW <- function(x,...)aggrRcpp(x); 

基准:

require(raster) 
set.seed(123) 
x <- matrix(rpois(10^8, 2), 10000) 
a <- raster(x) 

system.time(a1<-aggregate(a,fact=100,fun=agg.fun)) 
# user system elapsed 
# 35.13 0.44 35.87 
system.time(a2<-aggregate(a,fact=100,fun=fasterAgg.Fun)) 
# user system elapsed 
# 8.20 0.34 8.59 
system.time(a3<-aggregate(a,fact=100,fun=aggrRcppW)) 
# user system elapsed 
# 5.77 0.39 6.22 

########### all equal ? 
all(TRUE,all.equal(a1,a2),all.equal(a2,a3)) 
# > [1] TRUE 
0

如果你的目标是聚集,难道你不想要的max功能?

library(raster)  
x <- matrix(rpois(1000000, 2),1000) 
a <- aggregate(a,fact=10,fun=max) 
+0

我认为OP正在寻找最频繁发生的值,而不是最大的值。 –