2016-07-16

Filtering a data frame on multiple conditions

I need to update a spreadsheet with 1000 rows.

I have two datasets:

DF

CompanyID1  TMC1 
ABC company  QBT 
BCD company  G W TMC 
jb hi fi  QBT 
ABC company  GW TMC 
FB Company  AMEX 
LL company  AMEX 
j k    QBT 
k. l company TP oil 
1 to 1 lts  TP oil 
2 in 1 pty ltd. AMEX 

DF2

DRA CompanyID2   TMC2 Status 
11 2 in 1 pty ltd.  AMEX sent 
12 1 to 1 lts   TP oil produce 
13 BCD company   ACE  sent 
14 k. l company  TP oil sent 
15 jb hi fi    QBT produce 
16 ABC company   QBT sent 
17 j k     QBT sent 
18 FB Company   AMEX sent 
19 facebook pty   QBT sent 
20 2 in 1 pty ltd.  AMEX produce 

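What I am trying to achieve: first look up each df2$CompanyID2 value in df$CompanyID1. If there is a match and its df$TMC1 matches df2$TMC2, then, if df2$Status == 'sent', create a new column that returns the df2$DRA value; if df2$Status == 'produce', df$new should contain 'delete'.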

For example, "ABC company" from df2$CompanyID2 exists in df1$CompanyID1. ABC company's df$TMC1 matches df2$TMC2, and df2$Status == 'sent'. Therefore df$new <- 16.
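A minimal base-R sketch of this rule for the ABC company example. It assumes the columns are plain character vectors (not factors) and uses only a few rows retyped from the tables above; the names hit and new_value are illustrative.

```r
# A few rows retyped from the tables above, as character columns
df  <- data.frame(CompanyID1 = c("ABC company", "BCD company", "ABC company"),
                  TMC1       = c("QBT", "G W TMC", "GW TMC"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DRA        = c(13L, 16L),
                  CompanyID2 = c("BCD company", "ABC company"),
                  TMC2       = c("ACE", "QBT"),
                  Status     = c("sent", "sent"),
                  stringsAsFactors = FALSE)

# Rows of df2 whose company AND TMC both match the first ABC company row of df
hit <- df2[df2$CompanyID2 == df$CompanyID1[1] & df2$TMC2 == df$TMC1[1], ]

# 'sent' gives the DRA number, 'produce' gives "delete"
new_value <- ifelse(hit$Status == "sent", as.character(hit$DRA), "delete")
new_value  # "16"
```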

I would be very grateful for your help. It would save a lot of time that I could spend on other production work. Thanks

dput(DF1)

structure(list(Company.ID1 = structure(c(3L, 4L, 7L, 3L, 5L, 
9L, 6L, 8L, 1L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.", 
"ABC company", "BCD company", "FB Company", "j k ", "jb hi fi", 
"k. l company", "LL company"), class = "factor"), TMC1 = structure(c(4L, 
2L, 4L, 3L, 1L, 1L, 4L, 5L, 5L, 1L), .Label = c("AMEX", "G W TMC", 
"GW TMC", "QBT", "TP oil"), class = "factor")), .Names = c("Company.ID1", 
"TMC1"), class = "data.frame", row.names = c(NA, -10L)) 

dput(DF2)

structure(list(DRA = 11:20, Company.ID2 = structure(c(2L, 1L, 
4L, 9L, 8L, 3L, 7L, 6L, 5L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.", 
"ABC company", "BCD company", "facebook pty", "FB Company", "j k ", 
"jb hi fi", "k. l company"), class = "factor"), TMC2 = structure(c(2L, 
4L, 1L, 4L, 3L, 3L, 3L, 2L, 3L, 2L), .Label = c("ACE", "AMEX", 
"QBT", "TP oil"), class = "factor"), Status = structure(c(2L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("produce", "sent" 
), class = "factor")), .Names = c("DRA", "Company.ID2", "TMC2", 
"Status"), class = "data.frame", row.names = c(NA, -10L)) 

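for (i in seq_len(nrow(df1))) {
  # Compares row i of df1 only with row i of df2; factors converted to character first
  same_co  <- as.character(df1$Company.ID1[i]) == as.character(df2$Company.ID2[i])
  same_tmc <- as.character(df1$TMC1[i]) == as.character(df2$TMC2[i])
  if (same_co && same_tmc && df2$Status[i] == 'sent') {
    df1$new[i] <- 'sent'
  } else {
    df1$new[i] <- 'delete'
  }
}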
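In the dput data, the company and TMC columns are factors, and their level sets differ between DF1 and DF2 (for example, only DF2 has "facebook pty"). Comparing such factors directly with == stops with an error instead of returning FALSE. A small sketch with two illustrative one-element factors a and b:

```r
# Two factors holding the same label but with different level sets
a <- factor("ABC company", levels = c("ABC company", "BCD company"))
b <- factor("ABC company", levels = c("ABC company", "facebook pty"))

# Direct comparison raises "level sets of factors are different"
res <- tryCatch(a == b, error = function(e) conditionMessage(e))
res

# Comparing the underlying labels works
same <- as.character(a) == as.character(b)
same  # TRUE
```

Converting with as.character() (or reading the data with stringsAsFactors = FALSE) avoids the error.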

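However, there may be more than one company with the same name in df1$Company.ID1 and df2$Company.ID2, and they can also be in different rows, so comparing row i of one table with row i of the other does not work.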
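The duplicate-row problem can be seen with which(), which returns every matching row index regardless of position. This sketch retypes three df2 rows from the question as character columns; "2 in 1 pty ltd." appears in rows 1 and 3 with different statuses.

```r
# Three df2 rows from the question; "2 in 1 pty ltd." appears twice
df2 <- data.frame(DRA        = c(11L, 12L, 20L),
                  CompanyID2 = c("2 in 1 pty ltd.", "1 to 1 lts", "2 in 1 pty ltd."),
                  TMC2       = c("AMEX", "TP oil", "AMEX"),
                  Status     = c("sent", "produce", "produce"),
                  stringsAsFactors = FALSE)

# Every df2 row for this company, wherever it sits in the table
rows <- which(df2$CompanyID2 == "2 in 1 pty ltd.")
rows              # 1 3
df2$Status[rows]  # "sent" "produce"
```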

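The logic I expect is:

  1. Match company X's name in df1$Company.ID1 against df2$Company.ID2.
  2. If it matches, check whether company X's df1$TMC1 matches df2$TMC2.
  3. If 1 and 2 are both true, check whether company X's status in df2 is df2$Status == 'sent'.
  4. If that is TRUE, create a new column df1$new, take the DRA number from df2$DRA, and store it for company X.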
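The four steps can be sketched as a per-row lookup. lookup_dra is a hypothetical helper, the columns are assumed to be character, and the tie-breaking rule is an assumption the question does not state: if a company/TMC pair has several df2 rows, any 'sent' row wins and its first DRA is used; "delete" is returned only when all matches are 'produce'; NA means no match at all.

```r
# Illustrative rows retyped from the question, as character columns
df1 <- data.frame(Company.ID1 = c("ABC company", "jb hi fi", "LL company", "2 in 1 pty ltd."),
                  TMC1        = c("QBT", "QBT", "AMEX", "AMEX"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DRA         = c(16L, 15L, 11L, 20L),
                  Company.ID2 = c("ABC company", "jb hi fi", "2 in 1 pty ltd.", "2 in 1 pty ltd."),
                  TMC2        = c("QBT", "QBT", "AMEX", "AMEX"),
                  Status      = c("sent", "produce", "sent", "produce"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: applies steps 1-4 to one company/TMC pair
lookup_dra <- function(company, tmc) {
  hits <- df2[df2$Company.ID2 == company & df2$TMC2 == tmc, ]  # steps 1 and 2
  if (nrow(hits) == 0) return(NA_character_)                 # no match in df2
  sent <- hits$DRA[hits$Status == "sent"]                    # step 3
  if (length(sent) > 0) return(as.character(sent[1]))        # step 4
  "delete"                                                   # only 'produce' rows
}

df1$new <- mapply(lookup_dra, df1$Company.ID1, df1$TMC1, USE.NAMES = FALSE)
df1$new  # "16" "delete" NA "11"
```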

Thanks

Answers

Here is an approach using merge and ifelse:

#Merge data on ID and TMC columns 
m <- merge(df2, df, by.x=c("CompanyID2", "TMC2"), 
     by.y=c("CompanyID1", "TMC1")) 

#If "sent" use DRA, if not "delete" 
m$Output <- ifelse(m$Status == "sent", as.character(m$DRA), "delete") 

#Remove unnecessary columns 
m[-(3:4)] 
# CompanyID2 TMC2 Output 
# 1  ABC QBT  16 
# 2  BCD ACE  13 
# 3   jb QBT delete 
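merge() performs an inner join by default, so rows of df with no company/TMC match in df2 (such as LL company in the question's data) are dropped. If those rows should be kept with an NA output, a variant with all.y = TRUE (df is the second argument here) can be sketched on a few illustrative rows:

```r
df  <- data.frame(CompanyID1 = c("ABC company", "jb hi fi", "LL company"),
                  TMC1       = c("QBT", "QBT", "AMEX"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DRA        = c(16L, 15L),
                  CompanyID2 = c("ABC company", "jb hi fi"),
                  TMC2       = c("QBT", "QBT"),
                  Status     = c("sent", "produce"),
                  stringsAsFactors = FALSE)

# all.y = TRUE keeps every df row; unmatched rows get NA for DRA and Status
m <- merge(df2, df, by.x = c("CompanyID2", "TMC2"),
           by.y = c("CompanyID1", "TMC1"), all.y = TRUE)

# NA status stays NA in the output, so unmatched rows are easy to spot
m$Output <- ifelse(m$Status == "sent", as.character(m$DRA), "delete")
m[c("CompanyID2", "TMC2", "Output")]
```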

@pierre lafortune Thanks – Chemjong

We can use dplyr:

library(dplyr) 
inner_join(df2, df1, by = c("CompanyID2" = "CompanyID1", "TMC2" = "TMC1")) %>% 
     mutate(Output = ifelse(Status == "sent", DRA, "delete")) 

Another option, using sqldf:

library(sqldf) 
res <- sqldf("select df2.CompanyID2,df2.TMC2, df2.Status, df2.DRA as output 
       from df1 
       join df2 on df1.CompanyID1=df2.CompanyID2 and df1.TMC1=df2.TMC2") 

res[res$Status=="produce",]$output <- "delete" 

# CompanyID2 TMC2 Status output 
# 1  ABC company QBT sent  16 
# 2  jb hi fi QBT produce delete 
# 3  FB Company AMEX sent  18 
# 4   j k  QBT sent  17 
# 5 k. l company TP oil sent  14 
# 6  1 to 1 lts TP oil produce delete 
# 7 2 in 1 pty ltd. AMEX sent  11 
# 8 2 in 1 pty ltd. AMEX produce delete 

Or this variation of the last line: res[res$Status == "produce", "output"] <- "delete"
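A small check of that variant on a toy res table (values illustrative, not the full sqldf result). Selecting the rows and the column in one [ call avoids the res[rows, ]$output form, and assigning a string converts the integer output column to character.

```r
res <- data.frame(CompanyID2 = c("ABC company", "jb hi fi"),
                  Status     = c("sent", "produce"),
                  output     = c(16L, 15L),
                  stringsAsFactors = FALSE)

# Row filter and column name in a single [<- call
res[res$Status == "produce", "output"] <- "delete"
res$output  # "16" "delete"
```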