Coxph predictions don't match the coefficients

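Good afternoon,

I can post reproducible code, and certainly will if everyone agrees that something is wrong, but for now I think my question is simple enough that someone can point me in the right direction.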

I am working with a data set that looks like this:

   created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    36     0
2                  true    36     0
3                  true     0     1
4                  true    28     0
5                  true     9     0
6                  true     0     1
7                  true    13     0
8                  true    19     0
9                  true     9     0
10                 true    16     0

I fitted a Cox regression model like this:

fit_train = coxph(Surv(time = t,event = c) ~ created_as_free_user ,data = teste) 
summary(fit_train)

and got:

Call: 
coxph(formula = Surv(time = t, event = c) ~ created_as_free_user, 
    data = teste) 

    n= 9000, number of events= 1233 

                            coef exp(coef) se(coef)      z Pr(>|z|)
created_as_free_usertrue -0.7205    0.4865   0.1628 -4.426 9.59e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                         exp(coef) exp(-coef) lower .95 upper .95
created_as_free_usertrue    0.4865      2.055    0.3536    0.6693

Concordance= 0.511 (se = 0.002) 
Rsquare= 0.002 (max possible= 0.908) 
Likelihood ratio test= 15.81 on 1 df, p=7e-05 
Wald test   = 19.59 on 1 df, p=9.589e-06 
Score (logrank) test = 20.45 on 1 df, p=6.109e-06

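So far so good. Next step: predict the outcome for new data. I understand that predict.coxph can give me different types of predictions (or at least I think I understand). Let's use type = "lp":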

head(predict(fit_train,validacao,type = "lp"),n=20)

and get:

 1   2   3   4   5   6   7   8   9   10 
-0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 
     11   12   13   14   15   16   17   18   19   20 
-0.01208854 -0.01208854 0.70842049 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854

OK. But when I look at the data I am trying to predict for:

# A tibble: 9,000 × 3
   created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    20     0
2                  true    12     0
3                  true     0     1
4                  true    10     0
5                  true    51     0
6                  true    36     0
7                  true    44     0
8                  true     0     1
9                  true    27     0
10                 true     6     0
# ... with 8,990 more rows

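This confuses me...

Isn't type = "lp" supposed to give you the linear predictor? For the data I am trying to predict for, since the created_as_free_user variable equals true, am I wrong to expect the type = "lp" prediction to be exactly -0.7205 (the coefficient of the model above)? Where does -0.01208854 come from? I suspect it is some kind of scaling, but I could not find the answer online.

My final goal is the h(t) given by predict with type = "expected", but I am not comfortable using it because it relies on this value that I don't fully understand.

Thank you very much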

Source

2017-03-16 Rafael Meirelles

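The Details section of ?predict.coxph says:

The Cox model is a relative risk model; predictions of type "linear predictor", "risk", and "terms" are all relative to the sample from which they came. By default, the reference value for each of these is the mean covariate within strata.

To illustrate what this means, we can look at a simple example. Some fake data: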

test1 <- list(time=c(4,3,1,1,1), 
      status=c(1,1,1,0,0), 
      x=c(0,2,1,1,0))

We fit the model and look at the predictions:

fit <- coxph(Surv(time, status) ~ x, test1) 
predict(fit, type = "lp") 
# [1] -0.6976630 1.0464945 0.1744157 0.1744157 -0.6976630

The predictions are the same as:

(test1$x - mean(test1$x)) * coef(fit) 
# [1] -0.6976630 1.0464945 0.1744157 0.1744157 -0.6976630
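The centering can also be checked against the fitted object itself. The sketch below assumes the survival package is installed and uses fit$means, the reference values stored by coxph, instead of recomputing mean(x) by hand; the names lp_centered and lp_raw are only illustrative.

```r
library(survival)

# Same toy data as in the example, as a data frame
test1 <- data.frame(time   = c(4, 3, 1, 1, 1),
                    status = c(1, 1, 1, 0, 0),
                    x      = c(0, 2, 1, 1, 0))

fit <- coxph(Surv(time, status) ~ x, data = test1)

# What predict() reports: linear predictor relative to the reference value
lp_centered <- unname(predict(fit, type = "lp"))

# Uncentered linear predictor, x * beta
lp_raw <- test1$x * unname(coef(fit))

# The two differ only by the constant fit$means * beta
all.equal(lp_centered, lp_raw - unname(fit$means * coef(fit)))

# Hazard ratios between subjects do not depend on the centering
exp(lp_centered[2] - lp_centered[1])
exp(lp_raw[2] - lp_raw[1])
```

Because the shift is a constant, only differences between linear predictors are interpretable as log hazard ratios. One caveat, stated as far as I know rather than as a guarantee: newer releases of survival have a nocenter argument in coxph that by default leaves columns containing only -1, 0 or 1 uncentered, so a 0/1 factor dummy like the one in the question may give different numbers than the 2017 output shown there.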

(Using this logic and some arithmetic, we can back out from your results that 8849 of the 9000 observations have the value "true" for your created_as_free_user variable.)
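That arithmetic can be sketched as follows. It uses only numbers printed in the question, and it assumes that the single different prediction (row 13) belongs to a user with created_as_free_user equal to "false" and that the dummy variable was centered at its training-sample mean; the variable names are illustrative.

```r
# The two distinct values in the question's predict(..., type = "lp") output
lp_true  <- -0.01208854   # rows where created_as_free_user is "true"
lp_false <-  0.70842049   # row 13, assumed to be a "false" user

# Centering cancels in the difference, which recovers the coefficient
beta <- lp_true - lp_false      # about -0.7205, matching summary(fit_train)

# lp_true = (1 - p_true) * beta, so solve for the share of "true" rows
p_true <- 1 - lp_true / beta

n <- 9000                       # n reported by summary(fit_train)
round(p_true * n)               # 8849
```

So -0.01208854 is the coefficient multiplied by the distance between a "true" user (1) and the sample mean of the dummy (about 0.983), not the coefficient itself.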

Source

2017-03-16 20:18:14
