2013-02-01

How can I uniquely identify groups of observations defined by several variables? I have a sample data frame "z" as follows:

deaths sex race smokes pyears 
10 Female White 0 1410 
14 Male White 1 1974 
14 Female Black 0 1974 
16 Male Black 1 2256 
17 Male Black 0 2397 
18 Female NA 1 2538 
19 NA Black 0 2679 
20 Female White 1 2820 
20 Female Black 0 2820 
21 Male Black 1 2961 

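I'd like to create a new variable "group" that combines the variables race and sex, so that it uniquely identifies each group of observations in data frame "z". The expected output is: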

group 
    1 
    2 
    3 
    4 
    4 
    6 
    5 
    1 
    3 
    4 

How would I code this in R?


You're probably looking for 'interaction()'. – joran

Answers

This is the sort of thing I was thinking of:

dat <- read.table(text = "deaths sex race smokes pyears 
10 Female White 0 1410 
14 Male White 1 1974 
14 Female Black 0 1974 
16 Male Black 1 2256 
17 Male Black 0 2397 
18 Female NA 1 2538 
19 NA Black 0 2679 
20 Female White 1 2820 
20 Female Black 0 2820 
21 Male Black 1 2961",header = TRUE,sep = "") 

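# exclude = NULL keeps NA as a factor level of its own instead of dropping it
dat$sex <- factor(dat$sex, exclude = NULL)
dat$race <- factor(dat$race, exclude = NULL)

# one level per sex/race combination
with(dat, interaction(sex, race))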

[1] Female.White Male.White Female.Black Male.Black Male.Black Female.NA NA.Black  Female.White Female.Black 
[10] Male.Black 
Levels: Female.Black Male.Black NA.Black Female.White Male.White NA.White Female.NA Male.NA NA.NA 

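It looks like you want to include the NAs rather than drop them, hence the explicit factor calls. You can obviously convert the resulting factor to integers with as.integer, but the actual numbers are unlikely to come out in the order you specified, because R orders the levels alphabetically rather than by how they appear in the data frame.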
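A minimal sketch of the as.integer() conversion, assuming a recent R version and the same sample data (the names g, key and dat$group are illustrative). It adds drop = TRUE so that combinations that never occur (such as Male.NA) do not leave gaps in the codes, then renumbers the groups in order of first appearance with the match(x, unique(x)) idiom instead of level order.

```r
# Sample data from the question; read.table() reads the literal NA as a missing value
dat <- read.table(text = "deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961", header = TRUE)

# exclude = NULL keeps NA as a level of its own
dat$sex  <- factor(dat$sex,  exclude = NULL)
dat$race <- factor(dat$race, exclude = NULL)

# drop = TRUE removes combinations that never occur (e.g. Male.NA, NA.NA)
g <- with(dat, interaction(sex, race, drop = TRUE))
as.integer(g)  # codes follow level order, not row order

# Number the groups in order of first appearance instead of level order
key <- as.character(g)
dat$group <- match(key, unique(key))
dat$group
# [1] 1 2 3 4 4 5 6 1 3 4
```

First-appearance order is close to the expected output but gives 5 to Female.NA and 6 to NA.Black, the reverse of what the question asks for; reproducing that exact numbering requires spelling out the level order by hand.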


@joran: Awesome. Thanks a lot!! – Metrics

You could use:

dat <- read.table(text="deaths sex race smokes pyears 
10 Female White 0 1410 
14 Male White 1 1974 
14 Female Black 0 1974 
16 Male Black 1 2256 
17 Male Black 0 2397 
18 Female NA 1 2538 
19 NA Black 0 2679 
20 Female White 1 2820 
20 Female Black 0 2820 
21 Male Black 1 2961", header=TRUE) 

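library(qdap)
# paste2() pastes the sex and race columns together (default sep = ".");
# handle.na = FALSE keeps missing values as the string "NA" instead of returning NA
factor(paste2(dat[, 2:3], handle.na = FALSE))

# for numeric codes:
as.numeric(factor(paste2(dat[, 2:3], handle.na = FALSE)))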

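But as Joran pointed out, the numbers you expect are not the ones R will assign. You'll need to pass levels inside factor to order the levels the way you want.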
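A minimal base-R sketch of supplying levels to factor(), assuming you would rather avoid the qdap dependency: paste() builds the same "sex.race" keys (it writes a missing value as the string "NA"), and the hypothetical vector wanted lists the levels in the order that reproduces the question's numbering.

```r
# Sample data from the question
dat <- read.table(text = "deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961", header = TRUE)

# paste() turns a missing value into "NA", so rows with NA still get a key
key <- paste(dat$sex, dat$race, sep = ".")

# Hypothetical level order chosen to mirror the expected output in the question
wanted <- c("Female.White", "Male.White", "Female.Black",
            "Male.Black", "NA.Black", "Female.NA")
dat$group <- as.integer(factor(key, levels = wanted))
dat$group
# [1] 1 2 3 4 4 6 5 1 3 4
```

Any combination missing from wanted would come out as NA, so the vector has to be extended if new sex/race groups appear in the data.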


Thanks, Tyler, for the alternative solution! – Metrics