2015-07-01

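Remove row-specific items from a list

I want to create the last column ('desired_result') from the first three columns ('group', 'animal' and 'full'). Below is code for a reproducible example.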

library(data.table)
# 'desired_result' is the target column the question asks how to compute
data = data.table(group = c(1,1,1,2,2,2), animal = c('cat', 'dog', 'pig', 'giraffe', 'lion', 'tiger'), desired_result = c('dog, pig', 'cat, pig', 'cat, dog', 'lion, tiger', 'giraffe, tiger', 'giraffe, lion'))
# 'full' is a list column: every row holds all animals of its group
data[, full := list(list(animal)), by = 'group']
# Reorder the columns
data = data[, .(group, animal, full, desired_result)]

data 
   group  animal               full desired_result
1:     1     cat        cat,dog,pig       dog, pig
2:     1     dog        cat,dog,pig       cat, pig
3:     1     pig        cat,dog,pig       cat, dog
4:     2 giraffe giraffe,lion,tiger    lion, tiger
5:     2    lion giraffe,lion,tiger giraffe, tiger
6:     2   tiger giraffe,lion,tiger  giraffe, lion

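Basically, I want to modify 'full' so that it no longer contains the corresponding 'animal'. I have tried various lapply commands on both list and character versions of these columns, but I could not figure it out.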

Answers


Here is one possible approach:

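# For each (group, animal) pair: take the group's animals, drop this row's
# animal by position, and collapse the rest into a "a, b" string
data[, desired_result := {
     temp <- unique(unlist(full))
     toString(temp[-match(animal, temp)])
     }, by = .(group, animal)]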
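To see what temp[-match(animal, temp)] does in isolation, here is a small base-R sketch on a hard-coded vector (the values are assumed, copied from group 1 of the example). match() returns the position of the first occurrence, and the negative index drops that one element. One caveat: if the animal is not in the vector, match() returns NA and indexing with -NA gives a single NA rather than the whole vector, whereas setdiff() would return everything.

```r
# Animals of one group (values assumed, taken from group 1 of the example)
temp <- c("cat", "dog", "pig")

# match() finds the position of "dog"; the negative index drops that position
without_dog <- temp[-match("dog", temp)]
print(toString(without_dog))

# Edge case: an animal that is absent yields NA instead of the full vector
missing_case <- temp[-match("cow", temp)]
print(missing_case)

# setdiff() returns the whole vector when there is nothing to remove
print(setdiff(temp, "cow"))
```

In the thread's data every animal belongs to its own group, so the NA case cannot occur there; it only matters if the lookup vector can be missing the item.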
data 
#    group  animal               full desired_result
# 1:     1     cat        cat,dog,pig       dog, pig
# 2:     1     dog        cat,dog,pig       cat, pig
# 3:     1     pig        cat,dog,pig       cat, dog
# 4:     2 giraffe giraffe,lion,tiger    lion, tiger
# 5:     2    lion giraffe,lion,tiger giraffe, tiger
# 6:     2   tiger giraffe,lion,tiger  giraffe, lion

Another option:

data[, desired := .(Map(setdiff, list(animal), as.list(animal))), by = group]

# or, if starting from the existing list column 'full':
data[, desired := .(Map(setdiff, full, animal))]

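(Recycling is what makes the first version work: list(animal) has length 1, so Map() reuses it for every element of as.list(animal).)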
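The recycling in the first version can be checked outside data.table with a minimal base-R sketch, assuming a single group's animals as a plain character vector. Map() is mapply() with SIMPLIFY = FALSE, so the length-1 list(animal) is repeated against each of the three single-animal elements.

```r
# One group's animals (values assumed, taken from group 1 of the example)
animal <- c("cat", "dog", "pig")

# list(animal) has length 1 and as.list(animal) has length 3, so Map()
# recycles the whole group vector into every call of setdiff()
res <- Map(setdiff, list(animal), as.list(animal))
str(res)
```

The result is a plain list with one character vector per row, which is exactly what .() wraps into a list column in the data.table call.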


With dplyr: library(dplyr); data %>% mutate(desired = Map(setdiff, full, animal)) –


This returns a list rather than a character vector (the OP's desired output is a character vector). –


I read the OP as not caring whether they get a list or a string, and the conversion is trivial. – eddi
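As a sketch of how small that conversion is, assuming a list column shaped like the output of Map(setdiff, ...) above: vapply() with toString() collapses each element into one "a, b" string, and strsplit() on ", " turns it back into a list.

```r
# A list column like the one produced by Map(setdiff, ...) (values assumed)
desired_list <- list(c("dog", "pig"), c("cat", "pig"), c("cat", "dog"))

# List -> character: one comma-separated string per row
desired_chr <- vapply(desired_list, toString, character(1))
print(desired_chr)

# Character -> list: split each string back into its items
back <- strsplit(desired_chr, ", ", fixed = TRUE)
```

In the data.table setting this would presumably be something like data[, desired := vapply(desired, toString, character(1))]; vapply() is used rather than sapply() so that every result is guaranteed to be a single string.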


I found a way as well!

By converting 'animal' to a list, I can use mapply.

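# Turn 'animal' into a list of one-element character vectors
data$animal = strsplit(data$animal, ' ')
# For each row, keep the entries of 'full' that differ from that row's animal
data$check = mapply(function(x, y) {list(x[x != y]) }, data$full, data$animal)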

data 
   group  animal               full desired_result         check
1:     1     cat        cat,dog,pig       dog, pig       dog,pig
2:     1     dog        cat,dog,pig       cat, pig       cat,pig
3:     1     pig        cat,dog,pig       cat, dog       cat,dog
4:     2 giraffe giraffe,lion,tiger    lion, tiger    lion,tiger
5:     2    lion giraffe,lion,tiger giraffe, tiger giraffe,tiger
6:     2   tiger giraffe,lion,tiger  giraffe, lion  giraffe,lion

Your approach returns a list rather than a character vector (which is what your desired output shows). –


Well, it would have to be converted and cleaned up if necessary. – DataBandit
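If a character column is wanted straight away, one variant of the mapply idea (a sketch, not from the thread) is to return toString(...) from the function instead of list(...). Every call then yields a single string, so mapply() simplifies the result to a character vector. Because mapply() already walks a character vector element by element, the strsplit() step is not needed for this variant. The inputs below are assumed, hard-coded from group 1.

```r
# Inputs shaped like data$full (a list) and the original character 'animal'
full   <- list(c("cat", "dog", "pig"), c("cat", "dog", "pig"), c("cat", "dog", "pig"))
animal <- c("cat", "dog", "pig")

# Each call returns one string, so mapply() simplifies to a character vector;
# 'full' comes first and has no names, so the result is unnamed
check_chr <- mapply(function(x, y) toString(x[x != y]), full, animal)
print(check_chr)
```

Unlike the match()-based answer, x[x != y] removes every copy of the animal, which only makes a difference if a group lists the same animal twice.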