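Cluster R script not reading RData datasets correctly

I'm trying to run this job on our cluster, and I keep getting an "object of type 'closure' is not subsettable" error. Basically, the job runs the function do_1() on a bunch of nodes. The closure object I'm subsetting is called "data", so I think the error arises because the RData file isn't being read on each node. (Calling each of these individual datasets "data" is probably not best practice, so that part is my mistake.)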
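I've stripped the script down to the barest bones I can, and it is shown below. It still produces the same error when I submit the job. I think there is something I don't understand about reading in the individual datasets on each node. Maybe I'm not specifying some argument in the call to load(), or maybe the "data" dataset isn't in the right namespace or something. I'm not sure. Any ideas would be appreciated.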
library(parallel)
library(Rmpi)
np <- mpi.universe.size()
cl <- makeCluster(np, type = "MPI")
allFiles <- list.files("/bigtmp/trb5me/rdata_files/")
allFiles <- sapply(allFiles, function(string) paste("/bigtmp/trb5me/rdata_files/", string, sep = ""))
run_one_day <- function(daynum){
  # do we want to subset days to not the first hour?
  train <- data[[daynum]] * 10000
  train
}
clusterExport(cl = cl, "run_one_day")
do_1 <- function(path_to_file){
  if(!require(xts)){
    install.packages("xts")
    library(xts)
  }
  # load data
  load(file=path_to_file)
  # extract the symbol name so we can save the results later
  symbolName <- strsplit(path_to_file, "/")[[1]][5]
  symbolName <- strsplit(symbolName, ".", fixed = TRUE)[[1]][1]
  # get the results
  # there is also a function called data...so in this case its length will be 1
  mySequence <- 1:(length(data)-1)
  myResults <- lapply(mySequence, run_one_day) #this is where the problem is!
  # save the results
  path_dest <- paste("/bigtmp/trb5me/mod1_results/", symbolName, ".RData", sep = "")
  save(myResults, file = path_dest)
  # remove everything from memory
  rm(list=ls())
}
parLapply(cl, allFiles, do_1)
# turn off all the cluster stuff
stopCluster(cl)
mpi.exit()
Which function is the error coming from? Try including options(error = traceback).
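One way to find out which call fails on a worker is to return the error as data instead of letting it abort the whole run. The following is only a rough sketch: `with_error_info` and `subset_a_function` are made-up names for illustration, and the toy function simply reproduces the same kind of failure by subsetting a function.

```r
# Hypothetical helper: wrap a function so that an error is returned as a
# list (message + failing call) rather than stopping the computation.
with_error_info <- function(f) {
  function(...) {
    tryCatch(
      f(...),
      error = function(e) {
        list(
          message = conditionMessage(e),
          call = paste(deparse(conditionCall(e)), collapse = " ")
        )
      }
    )
  }
}

# Toy stand-in that fails the same way as the question: `mean` is a
# function (a closure), so it cannot be subset with [[ ]].
subset_a_function <- function(i) mean[[i]]

info <- with_error_info(subset_a_function)(1)
print(info$message)
print(info$call)
```

On the cluster, the same wrapper could presumably be applied as parLapply(cl, allFiles, with_error_info(do_1)), so that each element of the result is either NULL-like output from do_1 or a list describing the failing call.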
Wait, I see it now; never mind.
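For anyone landing here with the same error, a likely explanation (the thread itself does not state the resolution) is scoping rather than file reading. load() puts `data` into the frame of the do_1() call by default, but run_one_day() was defined at top level, so it looks `data` up from the global environment and finds the function utils::data instead. The sketch below reproduces this without a cluster and shows one possible variant that passes the loaded object explicitly. The temporary file, the toy list, and the names `do_1_broken`, `do_1_fixed`, and `run_one_day_fixed` are assumptions for illustration, and seq_along() is used instead of the original index range to keep the example short.

```r
# Hypothetical stand-in for one of the .RData files: a list named `data`.
tmp_file <- tempfile(fileext = ".RData")
data <- list(1, 2, 3)
save(data, file = tmp_file)
rm(data)  # from here on, `data` resolves to the function utils::data

# Same shape as the question: the helper reads `data` as a free variable.
run_one_day <- function(daynum) data[[daynum]] * 10000

do_1_broken <- function(path_to_file) {
  load(file = path_to_file)             # `data` lands in this call's frame
  lapply(seq_along(data), run_one_day)  # the helper cannot see that frame
}

# Variant: load into a private environment and pass the object along.
run_one_day_fixed <- function(daynum, dat) dat[[daynum]] * 10000

do_1_fixed <- function(path_to_file) {
  env <- new.env()
  load(file = path_to_file, envir = env)
  dat <- get("data", envir = env)
  lapply(seq_along(dat), run_one_day_fixed, dat = dat)
}

# The broken version raises the subsetting error; capture its message.
broken <- tryCatch(do_1_broken(tmp_file), error = function(e) conditionMessage(e))
fixed <- do_1_fixed(tmp_file)
print(broken)
str(fixed)
```

Passing the object as an argument also avoids relying on the name `data` at all, which sidesteps the clash with the built-in function. With the MPI cluster, the fixed helper would still need to be exported to the workers via clusterExport(), as run_one_day is in the question.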