Actually, ROCR is overkill for this task. ROCR's `performance` function returns the performance metric at every score present in its input, so in principle you could do the following:
library(ROCR)
set.seed(123)
N <- 1000
POSITIVE_CASE <- 'case A'
NEGATIVE_CASE <- 'case B'
CUTOFF <- 0.456
scores <- rnorm(n=N)
labels <- ifelse(runif(N) > 0.5, POSITIVE_CASE, NEGATIVE_CASE)
pred <- prediction(scores, labels)
perf <- performance(pred, 'sens', 'spec')
At this point `perf` contains a lot of useful information:
> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "Specificity"
  ..@ y.name      : chr "Sensitivity"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:1001] 1 1 0.998 0.996 0.996 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:1001] 0 0.00202 0.00202 0.00202 0.00405 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:1001] Inf 3.24 2.69 2.68 2.58 ...
Now you can search `perf@alpha.values` for your cutoff and read off the corresponding sensitivity and specificity values. If your exact cutoff does not appear in `perf@alpha.values`, you will have to do some interpolation:
ix <- which.min(abs(perf@alpha.values[[1]] - CUTOFF)) # good enough in our case
sensitivity <- perf@y.values[[1]][ix] # note the order of arguments to `performance` and of x and y in `perf`
specificity <- perf@x.values[[1]][ix]
This gives you:
> sensitivity
[1] 0.3319838
> specificity
[1] 0.6956522
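If the nearest-cutoff approximation above is too coarse, one option is linear interpolation with `stats::approx()`. The following is a sketch on small toy vectors standing in for `perf@alpha.values[[1]]` and `perf@y.values[[1]]`; the values are purely illustrative:

```r
# Toy stand-ins for the cutoff and sensitivity vectors from `perf`
# (cutoffs decrease along the curve while sensitivity increases).
cutoffs <- c(2.0, 1.0, 0.5, 0.0)
sens    <- c(0.1, 0.3, 0.5, 0.9)

# approx() sorts by x internally, so the decreasing cutoff order is fine.
target <- 0.75
sensitivity <- approx(cutoffs, sens, xout = target)$y
sensitivity  # halfway between 0.3 (at cutoff 1.0) and 0.5 (at 0.5) -> 0.4
```

The same call with `perf@x.values[[1]]` as the second argument would interpolate specificity.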
But there is a simpler and faster way: just convert your label strings to a logical vector and compute the metrics directly:
binary.labels <- labels == POSITIVE_CASE
tp <- sum((scores > CUTOFF) & binary.labels)   # true positives
sensitivity <- tp / sum(binary.labels)
tn <- sum((scores <= CUTOFF) & (!binary.labels)) # true negatives
specificity <- tn / sum(!binary.labels)
which gives you:
> sensitivity
[1] 0.3319838
> specificity
[1] 0.6956522
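For reuse, the direct computation can be wrapped in a small helper. This is a hypothetical function (not part of the original answer); it assumes `labels` contains exactly two classes:

```r
# Hypothetical helper wrapping the direct computation above.
sens_spec <- function(scores, labels, positive, cutoff) {
  is.pos <- labels == positive
  tp <- sum(scores >  cutoff &  is.pos)  # true positives at this cutoff
  tn <- sum(scores <= cutoff & !is.pos)  # true negatives at this cutoff
  c(sensitivity = tp / sum(is.pos),
    specificity = tn / sum(!is.pos))
}
```

Called as `sens_spec(scores, labels, POSITIVE_CASE, CUTOFF)` with the data generated above, it reproduces the same sensitivity and specificity values.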