2014-01-24 68 views

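Group data by threshold?

Say I have a response variable that rises and falls over time. Every time the response variable goes above a threshold, we have a new "trial". That is, if I add a column Threshold that is TRUE whenever the response is above a certain value, each contiguous block of data points where Threshold is TRUE constitutes a new trial.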

Time <- seq(1, 10, by = 0.5) 
Response <- abs(sin(Time)) 
Threshold <- Response > 0.6 
data <- data.frame(Time, Response, Threshold) 

Given Time, Response, and Threshold, how might I add a factor Trial that takes a new value for each group of TRUE Threshold values? Something like this:

Time Response Threshold Trial 
1 1.0 0.84147098  TRUE A 
2 1.5 0.99749499  TRUE A 
3 2.0 0.90929743  TRUE A 
4 2.5 0.59847214  FALSE NA 
5 3.0 0.14112001  FALSE NA 
6 3.5 0.35078323  FALSE NA 
7 4.0 0.75680250  TRUE B 
8 4.5 0.97753012  TRUE B 
9 5.0 0.95892427  TRUE B 
10 5.5 0.70554033  TRUE B 
11 6.0 0.27941550  FALSE NA 
12 6.5 0.21511999  FALSE NA 
13 7.0 0.65698660  TRUE C 
14 7.5 0.93799998  TRUE C 
15 8.0 0.98935825  TRUE C 
16 8.5 0.79848711  TRUE C 
17 9.0 0.41211849  FALSE NA 
18 9.5 0.07515112  FALSE NA 
19 10.0 0.54402111  FALSE NA 

Answers

data$Trial <- factor(
    ifelse(data$Threshold, cumsum(!data$Threshold), NA), labels = c("A", "B", "C") 
) 

## Time Response Threshold Trial 
## 1 1.0 0.84147098  TRUE  A 
## 2 1.5 0.99749499  TRUE  A 
## 3 2.0 0.90929743  TRUE  A 
## 4 2.5 0.59847214  FALSE <NA> 
## 5 3.0 0.14112001  FALSE <NA> 
## 6 3.5 0.35078323  FALSE <NA> 
## 7 4.0 0.75680250  TRUE  B 
## 8 4.5 0.97753012  TRUE  B 
## 9 5.0 0.95892427  TRUE  B 
## 10 5.5 0.70554033  TRUE  B 
## 11 6.0 0.27941550  FALSE <NA> 
## 12 6.5 0.21511999  FALSE <NA> 
## 13 7.0 0.65698660  TRUE  C 
## 14 7.5 0.93799998  TRUE  C 
## 15 8.0 0.98935825  TRUE  C 
## 16 8.5 0.79848711  TRUE  C 
## 17 9.0 0.41211849  FALSE <NA> 
## 18 9.5 0.07515112  FALSE <NA> 
## 19 10.0 0.54402111  FALSE <NA> 
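
The answer above hard-codes three labels, so it only works when the data happen to contain exactly three trials. A minimal sketch of the same cumsum idea with the labels derived from the data follows; the names run_id and n_trials are introduced here for illustration and are not part of the original answer. It relies on LETTERS, so it assumes at most 26 trials.

```r
# Rebuild the example vectors from the question.
Time <- seq(1, 10, by = 0.5)
Response <- abs(sin(Time))
Threshold <- Response > 0.6

# cumsum(!Threshold) only increases on FALSE rows, so it is constant
# within each run of TRUE rows and differs between runs.
run_id <- ifelse(Threshold, cumsum(!Threshold), NA)

# Count the distinct runs instead of hard-coding c("A", "B", "C").
n_trials <- length(unique(na.omit(run_id)))

# factor() sorts the distinct ids, so labels are assigned in time order.
Trial <- factor(run_id, labels = LETTERS[seq_len(n_trials)])
```

Adding the result to the data frame is then just data$Trial <- Trial.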

Another possibility, using rle:

r <- with(data, rle(Threshold)) 
len <- with(r, lengths[values]) 
n <- length(len) 

trial <- rep(x = LETTERS[1:n], times = len) 

data$Trial[data$Threshold] <- trial 

data 
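
To see why this works, it may help to look at what rle returns for the example: one entry per run, with the run's length and its value. The sketch below is a variant rather than the answer's own code: it relabels the runs and expands them back with inverse.rle. The name trial_labels is hypothetical, and LETTERS again limits it to 26 trials.

```r
# Same threshold vector as in the question.
Threshold <- abs(sin(seq(1, 10, by = 0.5))) > 0.6

r <- rle(Threshold)
# For this data:
#   r$lengths: 3 3 4 2 4 3
#   r$values:  TRUE FALSE TRUE FALSE TRUE FALSE

# One label per run: a letter for each TRUE run, NA for each FALSE run.
trial_labels <- rep(NA_character_, length(r$values))
trial_labels[r$values] <- LETTERS[seq_len(sum(r$values))]

# Swap the run values for the labels and expand back to full length.
r$values <- trial_labels
Trial <- inverse.rle(r)
```

Here Trial is a character vector with one element per row; wrap it in factor() if a factor is needed.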

+1. This is faster than Jake's answer, especially as the data get larger. It can be optimized further. See here: https://gist.github.com/mrdwab/8601445 – A5C1D2H2I1M1N2O1R2T1

@AnandaMahto, thanks for your comment and the suggested improvements! – Henrik