2014-02-23

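Converting an OR-type subset from data.frame to data.table

I am trying to convert a subset operation from data.frame to data.table to improve the performance of my code, but I am completely new to data.table. What is the data.table equivalent of this kind of subset expression?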

for(ii in 1:nplayer) 
    { 
    subgame<-subset(game, game$playerA == player[ii] | game$playerB == player[ii]) 
    players[ii,4]<-nrow(subgame) 
    } 

I have already defined a new data.table, gameDT, like this:

gameDT <- data.table(game)
setkey(gameDT, playerA, playerB)

Output of dput:

>dput(game[1:2,]) 
    structure(list(country = c("New Zealand", "Australia"), tournament = c("WTA Auckland 2012", 
    "WTA Brisbane 2012"), date = c("2011-12-31 00:00:00", "2011-12-30 00:15:00" 
    ), playerA = c("Schoofs B.", "Lucic M."), playerB = c("Puig M.", 
    "Tsurenko L."), resultA = c(1L, 1L), resultB = c(2L, 2L), oddA = c("1.8", 
    "2.17"), oddB = c("1.9", "1.57"), N = c(4L, 3L), Weight = c(1, 
    0.973608997871031)), .Names = c("country", "tournament", "date", 
    "playerA", "playerB", "resultA", "resultB", "oddA", "oddB", "N", 
    "Weight"), row.names = 1:2, class = "data.frame") 

Can you dput the dataset or a subset of it (e.g. dput(game[1:20,]))?


The subset syntax in 'data.table' is simply 'dt[playerA == "a" | playerB == "a"]'.
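A minimal sketch of how the syntax from the comment above might be applied to the loop in the question. The toy table and the name n_games are made up for illustration; only the column names playerA and playerB come from the question, and the data.table package is assumed to be installed.

```r
library(data.table)

# Toy data standing in for the asker's `game` table (made-up values)
gameDT <- data.table(
  playerA = c("a", "b", "a", "c"),
  playerB = c("b", "c", "c", "a")
)
player <- union(gameDT$playerA, gameDT$playerB)

# For each player, count the rows where they appear on either side.
# .N is the number of rows left after the filter in `i`.
n_games <- sapply(player, function(p) gameDT[playerA == p | playerB == p, .N])
n_games
```

The result is a named integer vector with one count per player, which could then be copied into the fourth column of players as in the original loop.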

Answer


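You might consider using lapply if this is not just an exercise in learning data.table.

I think the example below is equivalent to what you are trying to do, and you can see a pretty good speed-up from using lapply: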

set.seed(123) 
library(microbenchmark) 

game = data.frame(runif(50), playerA = sample(letters[1:5], 50, replace = TRUE), playerB = sample(letters[1:5], 50, replace = TRUE))

player <- union(game$playerA, game$playerB) 
nplayer <- length(player) 
players <- matrix(player, nrow = nplayer, ncol = 2) 

op <- microbenchmark(
    # Count matches directly from the logical vector, without building a subset
    LAPPLY = {
        counts <- lapply(1:nplayer,
            function(i) sum(game$playerA == player[i] | game$playerB == player[i]))
        names(counts) <- player
    },
    # Original approach: subset the data frame, then count its rows
    ORIG = {
        for (ii in 1:nplayer) {
            subgame <- subset(game, game$playerA == player[ii] | game$playerB == player[ii])
            players[ii, 2] <- nrow(subgame)
        }
    },
    times = 1000)

op 

#Unit: microseconds
#   expr     min       lq   median        uq       max neval
# LAPPLY 236.493 251.9985  259.095  269.3205  8323.701  1000
#   ORIG 938.194 981.9060 1002.880 1036.6705 61095.935  1000

unlist(counts) 

#  a  c  d  b  e
# 19 17 20 20 15

players 

#     [,1] [,2]
#[1,] "a"  "19"
#[2,] "c"  "17"
#[3,] "d"  "20"
#[4,] "b"  "20"
#[5,] "e"  "15"
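As a possible variation on the lapply call above, vapply returns a named integer vector directly, so no unlist step is needed and the counts stay numeric instead of being coerced to strings in a character matrix. This is only a sketch on made-up toy data, not a benchmarked replacement.

```r
# Toy data (made-up values) with the same column layout as in the answer
game <- data.frame(
  playerA = c("a", "b", "a", "c"),
  playerB = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)
player <- union(game$playerA, game$playerB)

# vapply checks that every result is a single integer and returns a vector;
# because `player` is a character vector, the result is named by player
counts <- vapply(
  player,
  function(p) sum(game$playerA == p | game$playerB == p),
  integer(1)
)
counts
```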

Thanks, but I am learning data.table. – emanuele


Could you explain in more detail what this statement means: 'names(counts) <- player' – emanuele


It names the elements of the list 'counts'. In this case, given the player order in the output above, it is the same as 'names(counts) <- c("a", "c", "d", "b", "e")'.
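A small illustration of what that assignment does, using a shortened two-element list. The values 19 and 17 are borrowed from the output above purely as example data.

```r
# A small unnamed list, standing in for the `counts` list from the answer
counts <- list(19L, 17L)
player <- c("a", "c")

names(counts)            # NULL: the list has no names yet
names(counts) <- player  # attach one name per element, in order
counts$a                 # elements can now be accessed by name
unlist(counts)           # flattens to a named integer vector
```

Without the names, unlist(counts) would return the bare numbers with no indication of which player each count belongs to.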
