2016-08-19

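Problem with predictor variables in the BradleyTerry2 package

I want to use the BradleyTerry2 package in R 3.3.1 to include contest-specific variables in my data analysis (I also tried R 2.11.1 with an older version of BradleyTerry2 for comparison). The problem I am facing is that my predictor variables are not taken into account properly. The example below illustrates the problem using the CEMS data.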

library(BradleyTerry2)
data(CEMS)

# Baseline model on the unmodified CEMS data
CEMS.BTmodel_01 <- BTm(outcome = cbind(win1.adj, win2.adj),
                       player1 = school1,
                       player2 = school2,
                       formula = ~ .. + WOR[student] * LAT[..],
                       refcat = "Stockholm",
                       data = CEMS)
summary(CEMS.BTmodel_01)

With this model we get AIC = 5837.4 and an estimate of 0.85771 for the interaction LAT[..] * WOR[student].

Now, suppose I add a new school (Toulouse, LAT = 1) at the top of the list:

Toulouse <- c(1,0,0,0,0,0,0)
Barcelona <- c(0,1,0,0,0,0,0)
London <- c(0,0,1,0,0,0,0)
Milano <- c(0,0,0,1,0,0,0)
Paris <- c(0,0,0,0,1,0,0)
St.Gallen <- c(0,0,0,0,0,1,0)
Stockholm <- c(0,0,0,0,0,0,1)
LAT <- c(1,1,0,1,1,0,0)
schools <- data.frame(Toulouse, Barcelona, London, Milano, Paris, St.Gallen, Stockholm, LAT)
rownames(schools) <- c("Toulouse", "Barcelona", "London", "Milano", "Paris", "St.Gallen", "Stockholm")
CEMS$schools <- schools

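I expected to get the same results from the analysis, because the new school does not appear in the comparison data. Instead I get AIC = 5855.8 and an estimate of 0.13199 for the interaction LAT[..] * WOR[student].

Playing around with the data, it looks as though the names of my predictor rows (here, the school names) are not taken into account when they are matched with my comparison data (here, the paired comparisons from European students). Instead, only their order matters.

What am I doing wrong?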

Answer

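The rows of CEMS$schools should match the levels of the school1 and school2 factors (the rownames of CEMS$schools are not actually used in the code; the first row should match the first level, and so on). So you need to update the levels of school1 and school2: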

CEMS$preferences <-
    within(CEMS$preferences, {
        school1 <- factor(school1, rownames(CEMS$schools))
        school2 <- factor(school2, rownames(CEMS$schools))
    })

CEMS.BTmodel_02 <- BTm(outcome = cbind(win1.adj, win2.adj),
                       player1 = school1,
                       player2 = school2,
                       formula = ~ .. + WOR[student] * LAT[..],
                       refcat = "Stockholm",
                       data = CEMS)

Now the two models are the same, as expected:

> CEMS.BTmodel_01
Bradley Terry model fit by glm.fit

Call: BTm(outcome = cbind(win1.adj, win2.adj), player1 = school1, player2 = school2,
    formula = ~.. + WOR[student] * LAT[..], refcat = "Stockholm",
    data = CEMS)

Coefficients [contrasts: ..=contr.treatment ]:
            ..Barcelona                 ..London                 ..Milano
                 0.5044                   1.6037                   0.3538
                ..Paris              ..St.Gallen          WOR[student]yes
                 0.8741                   0.5268                       NA
                LAT[..]  WOR[student]yes:LAT[..]
                     NA                   0.8577
Degrees of Freedom: 4454 Total (i.e. Null); 4448 Residual
  (91 observations deleted due to missingness)
Null Deviance:     5499
Residual Deviance: 4912    AIC: 5837

> CEMS.BTmodel_02
Bradley Terry model fit by glm.fit

Call: BTm(outcome = cbind(win1.adj, win2.adj), player1 = school1, player2 = school2,
    formula = ~.. + WOR[student] * LAT[..], refcat = "Stockholm",
    data = CEMS)

Coefficients [contrasts: ..=contr.treatment ]:
             ..Toulouse              ..Barcelona                 ..London
                     NA                   0.5044                   1.6037
               ..Milano                  ..Paris              ..St.Gallen
                 0.3538                   0.8741                   0.5268
        WOR[student]yes                  LAT[..]  WOR[student]yes:LAT[..]
                     NA                       NA                   0.8577
Degrees of Freedom: 4454 Total (i.e. Null); 4448 Residual
  (91 observations deleted due to missingness)
Null Deviance:     5499
Residual Deviance: 4912    AIC: 5837
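
To see why the row order matters rather than the row names, here is a minimal base-R sketch on toy data. It does not call BradleyTerry2; it only assumes, as described above, that covariate rows are looked up by the position of the factor level. The objects `school`, `covars`, `by_position`, `by_name` and `fixed` are made up for illustration and are not part of CEMS.

```r
# Toy data (hypothetical, not part of CEMS): the factor has no "Toulouse" level,
# but the covariate table has an extra "Toulouse" row at the top.
school <- factor(c("London", "Paris", "London"),
                 levels = c("Barcelona", "London", "Paris"))
covars <- data.frame(LAT = c(1, 1, 0, 1),
                     row.names = c("Toulouse", "Barcelona", "London", "Paris"))

# Lookup by level position: the integer codes of the factor index the rows,
# so every school silently picks up the covariate of the row above it.
by_position <- covars$LAT[as.integer(school)]

# Lookup by name: what was actually intended.
by_name <- covars[as.character(school), "LAT"]

# Re-creating the factor with the row names as levels makes both lookups agree.
fixed <- covars$LAT[as.integer(factor(school, levels = rownames(covars)))]

print(by_position)
print(by_name)
print(fixed)
```

The positional lookup disagrees with the name-based one until the factor levels are reset to the row names, which is the same mismatch that changed the estimates after Toulouse was added.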

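Great, it works very well, and the results are much better now. I also realized that the same approach has to be applied to the other covariate matrices (the "students" matrix in the CEMS example).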
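
One way to apply the same fix to any covariate table is a small helper like the sketch below. `align_levels` is a hypothetical function, not part of BradleyTerry2, and the `students` and `student` objects are toy data. It assumes the row names of the covariate table carry the same labels as the index factor.

```r
# Hypothetical helper (not part of BradleyTerry2): reset the levels of an index
# factor so that level i corresponds to row i of the covariate table.
align_levels <- function(index, covariates) {
  labels <- unique(as.character(index[!is.na(index)]))
  unmatched <- setdiff(labels, rownames(covariates))
  if (length(unmatched) > 0) {
    stop("No covariate row for: ", paste(unmatched, collapse = ", "))
  }
  factor(index, levels = rownames(covariates))
}

# Toy data: the covariate rows are deliberately in a different order
# from the default (alphabetical) factor levels.
students <- data.frame(WOR = c("no", "yes", "no"),
                       row.names = c("s3", "s1", "s2"),
                       stringsAsFactors = FALSE)
student <- factor(c("s1", "s2", "s1", "s3"))

aligned <- align_levels(student, students)
print(levels(aligned))
print(students$WOR[as.integer(aligned)])
```

For the CEMS data, and only if the row names of CEMS$students really are the student labels, the analogous call would be `CEMS$preferences$student <- align_levels(CEMS$preferences$student, CEMS$students)`.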