2013-08-02 309 views
45

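Nested ifelse statement

I am still learning how to translate SAS code into R, and I am getting warnings. I need to understand where I am making mistakes. What I want to do is create a variable that summarizes and differentiates three statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables:

  • nationality ID: idnat (french, foreigner),

If idnat is french, then:

  • birthplace ID: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp into a new variable named idnat2:

  • status: k (mainland, overseas, foreigner)

All of these variables use the "character" type.

The expected result, in column idnat2: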

idnat  idbp idnat2 
1 french mainland mainland 
2 french colony overseas 
3 french overseas overseas 
4 foreign foreign foreign 

Here is my SAS code, which I want to translate into R:

if idnat = "french" then do; 
    if idbp in ("overseas","colony") then idnat2 = "overseas"; 
    else idnat2 = "mainland"; 
end; 
else idnat2 = "foreigner"; 
run; 

Here is my attempt in R:

if(idnat=="french"){ 
    idnat2 <- "mainland" 
} else if(idbp=="overseas"|idbp=="colony"){ 
    idnat2 <- "overseas" 
} else { 
    idnat2 <- "foreigner" 
} 

I receive this warning:

Warning message: 
In if (idnat=="french") { : 
    the condition has length > 1 and only the first element will be used 

I was advised to use a "nested ifelse" instead because it is easier, but I get even more warnings:

idnat2 <- ifelse (idnat=="french", "mainland", 
     ifelse (idbp=="overseas"|idbp=="colony", "overseas") 
    ) 
      else (idnat2 <- "foreigner") 

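According to the warning message, the length is greater than 1, so only what is between the first brackets is taken into account. Sorry, but I don't understand what this length has to do with it. Does anybody know where I am going wrong?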

+3

You shouldn't mix `ifelse` and `else`. – Roland

+0

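@Roland You are right, thank you for your advice. I have just added the result I want, in column idnat2, in case that makes it clearer. @KarlForner Thank you, that is exactly what I want to do with simple examples, but I am really struggling with R. I tried to do the same thing in SPSS and it was simpler. – balour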

+0

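My point is that SO is not a substitute for learning a language. There are plenty of books, tutorials... You should post here when you are stuck and have already used all the other resources. Best. –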

Answers

10

Try something like the following:

# some sample data 
idnat <- sample(c("french","foreigner"),100,TRUE) 
idbp <- rep(NA,100) 
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE) 

# recoding 
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland", 
       ifelse(idbp %in% c("overseas","colony"),"overseas", 
        "foreigner")) 
cbind(idnat,idbp,out) # check result 

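Your confusion comes from how SAS and R handle if-else constructs. In R, if and else are not vectorized, meaning that they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logical values (i.e., if(c("french","foreigner")=="french") does not work), so R gives you the warning you are receiving.

By contrast, ifelse is vectorized, so it can take your vectors (aka input variables) and test the logical condition on each of their elements, as you are used to in SAS. An alternative would be to build a loop using if and else statements (as you have started to do here), but the vectorized ifelse approach will be more efficient and generally involves less code.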
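To make the contrast concrete, here is a minimal sketch that recodes the same data twice: once with an explicit loop around if/else, and once with nested ifelse(). The sample vectors and the names idnat2_loop and idnat2_vec are made up for illustration; only the recoding rule comes from the question.

```r
# Hypothetical sample vectors mirroring the question's table
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# if() expects a single TRUE/FALSE, so a loop tests one element at a time
idnat2_loop <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] == "french") {
    if (idbp[i] %in% c("overseas", "colony")) {
      idnat2_loop[i] <- "overseas"
    } else {
      idnat2_loop[i] <- "mainland"
    }
  } else {
    idnat2_loop[i] <- "foreigner"
  }
}

# ifelse() evaluates the condition for every element at once
idnat2_vec <- ifelse(idnat == "french",
                     ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
                     "foreigner")

identical(idnat2_loop, idnat2_vec)  # TRUE: both approaches give the same vector
```

The loop mirrors the SAS data step most directly, which may help when reading the two side by side; the ifelse() version is the one you would normally keep.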

+0

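Hello. So IF and ELSE in R are not vectorized, which is why I got the warning about length > 1 and why only the first TRUE argument was recorded. I will try your hint about IFELSE, although Tomas Greif's approach looks more efficient as well. – balour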

77

If you use any spreadsheet application, there is a basic function if() with the syntax:

if(<condition>, <yes>, <no>) 

The syntax is exactly the same for ifelse() in R:

ifelse(<condition>, <yes>, <no>) 

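The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison between formulas in a spreadsheet application and in R, for an example where we want to test whether a > b and return 1 if so and 0 otherwise.

In a spreadsheet: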

  A B C 
1 3 1 =if(A1 > B1, 1, 0) 
2 2 2 =if(A2 > B2, 1, 0) 
3 1 3 =if(A3 > B3, 1, 0) 

In R:

> a <- 3:1; b <- 1:3 
> ifelse(a > b, 1, 0) 
[1] 1 0 0 

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>)) 

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>) 

ifelse(<condition>, 
     ifelse(<condition>, <yes>, <no>), 
     ifelse(<condition>, <yes>, <no>) 
    ) 

ifelse(<condition>, <yes>, 
     ifelse(<condition>, <yes>, 
       ifelse(<condition>, <yes>, <no>) 
      ) 
     ) 

To calculate the column idnat2 you can do:

df <- read.table(header=TRUE, text=" 
idnat idbp idnat2 
french mainland mainland 
french colony overseas 
french overseas overseas 
foreign foreign foreign" 
) 

with(df, 
    ifelse(idnat=="french", 
     ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign") 
    ) 

R Documentation

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

> # What is first condition really testing? 
> with(df, idnat=="french") 
[1] TRUE TRUE TRUE FALSE 
> # This is result of vectorized function - equality of all elements in idnat and 
> # string "french" is tested. 
> # Vector of logical values is returned (has the same length as idnat) 
> df$idnat2 <- with(df, 
+ if(idnat=="french"){ 
+ idnat2 <- "xxx" 
+ } 
+ ) 
Warning message: 
In if (idnat == "french") { : 
    the condition has length > 1 and only the first element will be used 
> # Note that the first element of comparison is TRUE and that's why we get: 
> df 
    idnat  idbp idnat2 
1 french mainland xxx 
2 french colony xxx 
3 french overseas xxx 
4 foreign foreign xxx 
> # There is really logic in it, you have to get used to it 
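One caveat worth knowing when reproducing the session above: from R 4.2.0 onwards, a condition of length greater than one in if() is an error rather than a warning. The following sketch, with a made-up idnat vector and the hypothetical names cond and outcome, captures whichever of the two your R version signals, and shows how all() or any() reduce the vector to the single value that if() expects.

```r
# Hypothetical sample vector mirroring the question's data
idnat <- c("french", "french", "french", "foreign")

cond <- idnat == "french"
length(cond)  # one logical value per element, not a single TRUE/FALSE

# Capture whatever R signals: a warning in older versions, an error from 4.2.0 on
outcome <- tryCatch(
  {
    if (cond) "xxx"
    "no condition signalled"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome

# Reducing the vector to one value makes if() valid again
if (all(cond)) "everyone is french" else "not everyone is french"
if (any(cond)) "at least one is french"
```

Note that all() and any() answer a question about the whole vector; they do not recode it element by element, which is what ifelse() is for.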

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) { 
    if(x=="french") { 
    "french" 
    } else{ 
    "not really french" 
    } 
} 

apply(array(df[["idnat"]]),MARGIN=1, FUN=test) 
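As a variation on the same idea, a scalar function built on if() can also be applied element by element with vapply(). This is only a sketch under the assumption that idnat is a character column; the small data frame and the name res are invented for the example.

```r
# Hypothetical data frame with a character column, as in the question
df <- data.frame(idnat = c("french", "french", "french", "foreign"),
                 stringsAsFactors = FALSE)

# Scalar function: works on one value at a time, so if() is fine here
test <- function(x) {
  if (x == "french") "french" else "not really french"
}

# vapply() calls test() once per element and checks that each result is one string
res <- vapply(df$idnat, test, FUN.VALUE = character(1), USE.NAMES = FALSE)
res
# [1] "french" "french" "french" "not really french"
```

Compared with apply(array(...)), vapply() states the expected return type up front, so a function that accidentally returns something other than a single string fails loudly.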

If you are familiar with SQL, you can also use the CASE statement in the sqldf package.

6

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas") 
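A quick self-contained check of this one-liner, assuming (as in the question's sample table) that idbp already contains "foreign" for foreigners, so that only "colony" needs recoding:

```r
# Hypothetical birthplace vector matching the question's four sample rows
idbp <- c("mainland", "colony", "overseas", "foreign")

# replace() returns a modified copy; idbp itself is left unchanged
idnat2 <- replace(idbp, idbp == "colony", "overseas")
idnat2
# [1] "mainland" "overseas" "overseas" "foreign"
```

If idbp were missing or coded differently for foreigners, this shortcut would not produce the "foreign" status on its own, and the nested ifelse() on both columns would be the safer choice.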
+1

More or less the same: `df$idnat2 <- df$idbp; df$idnat2[df$idbp == 'colony'] <- 'overseas'` – Jaap

1

With data.table, the solution is:

DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign", 
     ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland"))] 

ifelse is vectorized; if-else is not. Here, DT is:

idnat  idbp 
1 french mainland 
2 french colony 
3 french overseas 
4 foreign foreign 

This gives:

idnat  idbp idnat2 
1: french mainland mainland 
2: french colony overseas 
3: french overseas overseas 
4: foreign foreign foreign 
+0

A better way is: `DT[, idnat2 := idbp][idbp %in% c('colony', 'overseas'), idnat2 := 'overseas']` – Jaap

+2

Even better: `DT[, idnat2 := idbp][idbp == 'colony', idnat2 := 'overseas']` – Jaap

+0

Another `data.table` option is a join with a lookup table: `DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]` – Uwe

3

Using the SQL CASE statement with the dplyr and sqldf packages:

Data

df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", 
"french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 
2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", 
"idbp"), class = "data.frame", row.names = c(NA, -4L)) 

sqldf

library(sqldf) 
sqldf("SELECT idnat, idbp, 
     CASE 
      WHEN idbp IN ('colony', 'overseas') THEN 'overseas' 
      ELSE idbp 
     END AS idnat2 
     FROM df") 

dplyr

library(dplyr) 
df %>% 
  mutate(idnat2 = case_when(idbp == "mainland" ~ "mainland", 
                            idbp %in% c("colony", "overseas") ~ "overseas", 
                            TRUE ~ "foreign")) 

Output

idnat  idbp idnat2 
1 french mainland mainland 
2 french colony overseas 
3 french overseas overseas 
4 foreign foreign foreign 
5

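If the data set contains many rows, it may be more efficient to join with a lookup table using data.table instead of nested ifelse().

Given the lookup table below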

lookup 
 idnat  idbp idnat2 
1: french mainland mainland 
2: french colony overseas 
3: french overseas overseas 
4: foreign foreign foreign 

and a sample data set

library(data.table) 
n_row <- 10L 
set.seed(1L) 
DT <- data.table(idnat = "french", 
       idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE)) 
DT[idbp == "foreign", idnat := "foreign"][] 
 idnat  idbp 
1: french colony 
2: french colony 
3: french overseas 
4: foreign foreign 
5: french mainland 
6: foreign foreign 
7: foreign foreign 
8: french overseas 
9: french overseas 
10: french mainland 

we can then do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][] 
 idnat  idbp idnat2 
1: french colony overseas 
2: french colony overseas 
3: french overseas overseas 
4: foreign foreign foreign 
5: french mainland mainland 
6: foreign foreign foreign 
7: foreign foreign foreign 
8: french overseas overseas 
9: french overseas overseas 
10: french mainland mainland
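For readers without data.table installed, the same lookup idea can be sketched in base R with match(). This is a swapped-in technique, not the update join shown above; the data frame dat and the key_dat and key_lookup helpers are hypothetical names, and the lookup rows are copied from the table printed earlier in this answer.

```r
# Lookup table with the same rows as the one printed above
lookup <- data.frame(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Hypothetical sample data to recode
dat <- data.frame(
  idnat = c("french", "foreign", "french", "french"),
  idbp  = c("colony", "foreign", "mainland", "overseas"),
  stringsAsFactors = FALSE
)

# Build one key per row from both columns, then find each key's row in the lookup
key_dat    <- paste(dat$idnat, dat$idbp, sep = "|")
key_lookup <- paste(lookup$idnat, lookup$idbp, sep = "|")
dat$idnat2 <- lookup$idnat2[match(key_dat, key_lookup)]
dat
```

Combinations that do not appear in the lookup table come back as NA, which makes unexpected values easy to spot.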