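Speeding up matrix formatting

I need to convert a large matrix into a specific format for use with libsvm. Each line must start with a label (1 or -1), followed by 0:ROW_NUMBER, and then the values of that row as 1:value_at_row_number_1st_column and so on for the remaining columns.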
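To make the target format concrete, here is a tiny illustration on toy data. The names `m`, `y`, and `fmt_row` are hypothetical and used only for this example; the values are chosen so the printed text is easy to read.

```r
# Toy 2 x 3 matrix (filled column by column) and one label per row
m <- matrix(c(0.5, 0.25, 1, 0.75, 2, 0.125), nrow = 2)
y <- c(1, -1)

# Format one row: label, then 0:ROW_NUMBER, then column:value pairs
fmt_row <- function(i) {
  paste(y[i],
        paste0("0:", i),
        paste0(seq_len(ncol(m)), ":", m[i, ], collapse = " "))
}

out <- vapply(seq_len(nrow(m)), fmt_row, character(1))
writeLines(out)
# 1 0:1 1:0.5 2:1 3:2
# -1 0:2 1:0.25 2:0.75 3:0.125
```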
The simple method below is too slow:
require(microbenchmark)
nR = 100; nC = 500
kMat = matrix(runif(nR*nC), nrow=nR)
yLab = sample(c(1, -1), nR, replace = TRUE)

# Simple method: build one line per row and append it to a growing vector
met1 = function() {
  lines = c()
  for(ix in 1:nrow(kMat)) {
    lines = c(lines,
              paste(yLab[ix],
                    paste0("0:", ix),
                    paste0(1:ncol(kMat), ":", kMat[ix, ], collapse=" ")))
  }
  lines
}
I also came up with a version that is about 50% faster (though it is ugly):
# Sprintf
met2 = function() {
  fmt = c("%i", "0:%i", paste0(1:ncol(kMat), ":%f"))
  # Local copy: label and row number are prepended as extra columns
  kMat = cbind(yLab, 1:nrow(kMat), kMat)
  # Unfortunately sprintf cannot handle more than 100 arguments,
  # so split the columns into chunks of at most 99 values each
  splts = lapply(seq(1, length(fmt), 99L),
                 function(ix) {
                   r = ix:min(ncol(kMat), ix+98L)
                   list(range = r, fmt = list(paste(fmt[r], collapse = " ")))
                 })
  lines = sapply(1:nrow(kMat),
                 function(ix) {
                   # Join the formatted chunks of one row; the "" init
                   # value leaves a leading space on each line
                   Reduce(function(a, b) sprintf("%s %s", a, b),
                          sapply(splts,
                                 function(s){
                                   do.call(sprintf, c(s$fmt, kMat[ix, s$range]))
                                 }),
                          "")
                 })
  lines
}

print(microbenchmark(met1(), met2()))
Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval
 met1() 85.83051 88.00289 92.01948 88.61834 90.31918 175.3362   100
 met2() 44.81729 45.61020 56.12835 54.75313 56.65249 108.7218   100
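One caveat when comparing the two timings: the methods do not emit identical text. met1 relies on R's default number-to-string conversion, while met2 uses "%f", which always prints six decimal places. A minimal check on a made-up value (`x` is a hypothetical name, not from my data):

```r
x <- 0.123456789

# "%f" rounds to a fixed six decimal places
via_sprintf <- sprintf("%f", x)   # "0.123457"

# paste0 uses the default conversion (up to 15 significant digits)
via_paste <- paste0(x)            # "0.123456789"

print(c(via_sprintf, via_paste))
```

Whether six decimals are enough depends on the downstream use, so this may or may not matter.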
Is there a faster (or tidier) way to produce this format?
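Is 90 milliseconds too slow? – Roland
This is just a test sample; I will be working on much larger sets, and I will also repeat the operation many times. – jMathew
I'm not optimistic that you can do much better in R. You may need to switch to another language. – Roland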
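One direction that might be worth benchmarking before leaving R is to build every column:value token in a single vectorized paste0 call, rather than formatting row by row. The sketch below is only an illustration under that assumption, not a measured improvement; `met3` and its arguments are hypothetical names, and the data is a smaller stand-in for the matrix in the question.

```r
set.seed(1)
nR <- 20; nC <- 30
kMat <- matrix(runif(nR * nC), nrow = nR)
yLab <- sample(c(1, -1), nR, replace = TRUE)

# Hypothetical vectorized variant (not one of the methods timed above)
met3 <- function(kMat, yLab) {
  # col(kMat) holds the column index of every cell, in the same
  # column-major order as the values, so one paste0 call builds all tokens
  tokens <- matrix(paste0(col(kMat), ":", kMat), nrow = nrow(kMat))
  # Collapse each row of tokens into a single space-separated string
  body <- apply(tokens, 1, paste, collapse = " ")
  # Prepend the label and the 0:ROW_NUMBER field to each line
  paste(yLab, paste0("0:", seq_len(nrow(kMat))), body)
}

res <- met3(kMat, yLab)
print(substr(res[1], 1, 40))
```

Because it uses the same default number conversion as met1, the output should match met1 line for line, which makes it straightforward to add as a third entry to the microbenchmark call.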