2013-07-16

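Give each id the same column value in R: I want every row of each unique ID (names) to get the same first.date, namely the first.date from that ID's rows where fruit == 'apple'.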

This is what I have:

   names      dates  fruit first.date
1   john 2010-07-01   kiwi       <NA>
2   john 2010-09-01  apple 2010-09-01
3   john 2010-11-01 banana       <NA>
4   john 2010-12-01 orange       <NA>
5   john 2011-01-01  apple 2010-09-01
6   mary 2010-05-01 orange       <NA>
7   mary 2010-07-01  apple 2010-07-01
8   mary 2010-07-01 orange       <NA>
9   mary 2010-09-01  apple 2010-07-01
10  mary 2010-11-01  apple 2010-07-01

This is what I want:

   names      dates  fruit first.date
1   john 2010-07-01   kiwi 2010-09-01
2   john 2010-09-01  apple 2010-09-01
3   john 2010-11-01 banana 2010-09-01
4   john 2010-12-01 orange 2010-09-01
5   john 2011-01-01  apple 2010-09-01
6   mary 2010-05-01 orange 2010-07-01
7   mary 2010-07-01  apple 2010-07-01
8   mary 2010-07-01 orange 2010-07-01
9   mary 2010-09-01  apple 2010-07-01
10  mary 2010-11-01  apple 2010-07-01

Here is my disastrous attempt:

getdates$first.date[is.na]<-getdates[getdates$first.date & getdates$fruit=='apple',] 

Thanks in advance.

Reproducible data frame:

names <- c("john", "john", "john", "john", "john", "mary", "mary", "mary", "mary", "mary")
dates <- as.Date(c("2010-07-01", "2010-09-01", "2010-11-01", "2010-12-01", "2011-01-01", "2010-05-01", "2010-07-01", "2010-07-01", "2010-09-01", "2010-11-01"))
fruit <- c("kiwi", "apple", "banana", "orange", "apple", "orange", "apple", "orange", "apple", "apple")
first.date <- as.Date(c(NA, "2010-09-01", NA, NA, "2010-09-01", NA, "2010-07-01", NA, "2010-07-01", "2010-07-01"))
getdates <- data.frame(names, dates, fruit, first.date)

Please format your question properly. Nobody can make sense of it! – asb

Apologies - my bad. – user2363642

Much better! Now let me take a look. :D – asb

Answer

It isn't clear what you want to do when a given name has several apple entries in first.date; this will simply take the first one:

library(data.table) 
dt = data.table(getdates) 

dt[, first.date := first.date[fruit == 'apple'][1], by = names] 
dt 
#     names      dates  fruit first.date
#  1:  john 2010-07-01   kiwi 2010-09-01
#  2:  john 2010-09-01  apple 2010-09-01
#  3:  john 2010-11-01 banana 2010-09-01
#  4:  john 2010-12-01 orange 2010-09-01
#  5:  john 2011-01-01  apple 2010-09-01
#  6:  mary 2010-05-01 orange 2010-07-01
#  7:  mary 2010-07-01  apple 2010-07-01
#  8:  mary 2010-07-01 orange 2010-07-01
#  9:  mary 2010-09-01  apple 2010-07-01
# 10:  mary 2010-11-01  apple 2010-07-01

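Hi eddi - the first.date value is the first date on which a person got an apple, and the duplicates make sure this first date isn't overwritten by the later dates on which they got an apple. Your code seems to work well, since it does what I wanted: fill a column with the first apple date for each ID. Thanks! – user2363642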
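Since first.date is meant to be the earliest date on which each person got an apple, it could also be derived straight from dates, so the result depends neither on the row order (as [1] does) nor on the pre-filled first.date column. A minimal base-R sketch, assuming the question's data (rebuilt here so the block runs on its own); is_apple and apple_first are illustrative names, not from the thread:

```r
# Rebuild the question's example data (character columns, not factors)
getdates <- data.frame(
  names = rep(c("john", "mary"), each = 5),
  dates = as.Date(c("2010-07-01", "2010-09-01", "2010-11-01", "2010-12-01", "2011-01-01",
                    "2010-05-01", "2010-07-01", "2010-07-01", "2010-09-01", "2010-11-01")),
  fruit = c("kiwi", "apple", "banana", "orange", "apple",
            "orange", "apple", "orange", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Earliest apple date per name, computed on the numeric day counts
is_apple <- getdates$fruit == "apple"
apple_first <- tapply(as.numeric(getdates$dates[is_apple]),
                      getdates$names[is_apple], min)

# Look up each row's name (as.character() guards against factor columns);
# a name with no apple rows would get NA. tapply() drops the Date class,
# so the day counts are converted back to Date here.
getdates$first.date <- as.Date(as.vector(apple_first[as.character(getdates$names)]),
                               origin = "1970-01-01")
print(getdates)
```

Using min() instead of taking the first match means the rows do not need to be sorted by date beforehand.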

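If there are many groups and many entries per group, this *may* perform poorly because of the vector scan within each group. Perhaps this is better: `DT <- data.table(getdates); setkey(DT, names, fruit); dd <- DT[J(unique(names), "apple"), mult = "first"]$dates; DT[, first.date := dd[.GRP], by = names]`. That is, if the OP doesn't mind the rows being reordered. – Arun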
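Along the same lines, a base-R sketch that avoids per-group scans and leaves the row order untouched: match() finds, for every row, the position of the first apple row with the same name, in one vectorized call. It follows the same "first apple row" rule as [1] and mult = "first", but it is a swapped-in technique, not Arun's data.table code; apples and idx are illustrative names:

```r
# Rebuild the question's example data (character columns, not factors)
getdates <- data.frame(
  names = rep(c("john", "mary"), each = 5),
  dates = as.Date(c("2010-07-01", "2010-09-01", "2010-11-01", "2010-12-01", "2011-01-01",
                    "2010-05-01", "2010-07-01", "2010-07-01", "2010-09-01", "2010-11-01")),
  fruit = c("kiwi", "apple", "banana", "orange", "apple",
            "orange", "apple", "orange", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Apple rows only, kept in their original row order
apples <- getdates[getdates$fruit == "apple", ]

# match() returns the position of the first match, so each name maps to its
# first apple row; names with no apple rows get NA
idx <- match(getdates$names, apples$names)

# Indexing a Date vector keeps the Date class; getdates is not reordered
getdates$first.date <- apples$dates[idx]
print(getdates)
```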

Hi Arun, I will have about 6000 individuals, each with at least 18 rows, so your approach may well pay off. I'll try it and report back. Thanks – user2363642