Having difficulty subsetting in R

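I am trying to subset a dataset according to the following requirements:

1. ethnicity is xyz
2. education is a bachelor's degree or higher, i.e. Bachelor's Degree or Graduate Degree
3. I then want to look at the income of the people who meet the requirements above. The brackets are values such as $30,000 - $39,999 or $100,000 - $124,999.
4. Finally, as my end result, I want to see the subset obtained in item 3 (above) alongside the column that says whether these people are religious. In the dataset, the values are religious and not religious.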

So the result would look something like this:

income    religious 
$30,000 - $39,999  not religious 
$50,000 - $59,999   religious 
    ....     .... 
    ....     ....

Keep in mind that the rows listed are those that satisfy conditions 1 and 2.

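Please keep in mind that I am new to programming. I have been trying to figure this out for a long time and have dug through a lot of posts, but I can't seem to get anything to work. How can I solve this? Someone please help.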

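So as not to take away from the clarity of the post, I will put what I have tried below (but feel free to ignore it, as it is probably garbage).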

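I have tried many variations of the following just to get to step 3, but all of them failed miserably, and I am about to smash my head on the keyboard: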

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I have also tried:

race <- df$ethnicity == "xyz" 
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree") 
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me to step 3, but I have no idea how to get from there to step 4.

Source

2015-10-04 AlanH

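You are nearly there; as income is a vector rather than a data frame, you don't need the trailing comma. That is, you can use `df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"))]`. Note that if ethnicity or education has missing values, you may want to account for them in your subset statement. (If you want to create a subsetted data frame, then don't include `df$income` at the start; just use `df` and keep the comma this time, ... so `sub_df <- df[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]`) – user20650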

@user20650 So how do I then get the corresponding `religious` column? – AlanH

I'm a bit unclear on what you want ... it might just be `table(sub_df$income, sub_df$religious)`, or do you want the full columns, `sub_df[c("income", "religious")]`? – user20650

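Changed my comment to an answer, as it was getting a bit too long.

First, your code: you are nearly there. Because income is a vector rather than a data frame, you don't need the trailing comma. That is, you can use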

df$income[which(df$ethnicity == "xyz" &
     df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
# note no comma before the closing square bracket

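If you want to create a subsetted data frame, then don't include df$income at the start; just use df and keep the comma this time. This subsets the rows but keeps all the columns.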

sub_df <- df[which(df$ethnicity == "xyz" &
     df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]
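As a side note on why `which()` is used here, the following is a minimal sketch on a made-up data frame (the values are invented for illustration; only the column names come from the question). It assumes one row has a missing ethnicity, and compares plain logical indexing with indexing through `which()`.

```r
# Hypothetical toy data: column names follow the question, values are invented.
df <- data.frame(
  ethnicity = c("xyz", "xyz", NA, "abc"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999",
                "$100,000 - $124,999", "$30,000 - $39,999"),
  stringsAsFactors = FALSE
)

# Logical condition; it is NA wherever ethnicity is NA and education matches.
keep <- df$ethnicity == "xyz" &
  df$education %in% c("Bachelor's Degree", "Graduate Degree")

# Plain logical indexing turns an NA in `keep` into an extra all-NA row.
rows_logical <- nrow(df[keep, ])

# which() returns only the positions that are TRUE, so the NA row is dropped.
rows_which <- nrow(df[which(keep), ])

print(c(logical = rows_logical, which = rows_which))
```

If the data has no missing values in these columns, both forms should return the same rows.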

To then look at the income levels in the subsetted data, you can use table

table(sub_df$income)

You can use table again to check the number of observations in each income bracket by religious status.

table(sub_df$income, sub_df$religious)

If you just want to select the income and religious columns, you can also do this with [

sub_df[c("religious", "income")]
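To see the steps above working together, here is a small end-to-end sketch. The data frame is entirely made up for illustration (only the column names and category labels come from the question), so treat it as a pattern to adapt rather than as output from the real dataset.

```r
# Hypothetical toy data: column names follow the question, values are invented.
df <- data.frame(
  ethnicity = c("xyz", "xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "Graduate Degree", "High School",
                "Graduate Degree", "Bachelor's Degree"),
  income    = c("$30,000 - $39,999", "$100,000 - $124,999",
                "$30,000 - $39,999", "$50,000 - $59,999",
                "$30,000 - $39,999"),
  religious = c("not religious", "religious", "religious",
                "religious", "religious"),
  stringsAsFactors = FALSE
)

# Conditions 1 and 2: keep matching rows, keep all columns.
sub_df <- df[which(df$ethnicity == "xyz" &
                     df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]

# Step 4 as a listing: one row per person, income next to religious status.
result <- sub_df[c("income", "religious")]

# Step 4 as counts: income brackets in rows, religious status in columns.
counts <- table(sub_df$income, sub_df$religious)

print(result)
print(counts)
```

The listing matches the layout sketched in the question, while the table gives a more compact summary when the subset is large.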

Source

2015-10-04 22:57:57 user20650

Thank you so much. This took me such a long time :( – AlanH

You're very welcome. The [R tag info](http://stackoverflow.com/tags/r/info) has some very useful links – user20650

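library(dplyr)

# Keep the rows that meet conditions 1 and 2, then summarise income per religious group.
# Note: min() and max() compare income as text unless it is an ordered factor.
df %>%
    filter(ethnicity == "xyz" &
     education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
    group_by(religious) %>%
    summarize(lower_bound = min(income),
      upper_bound = max(income))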
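One caveat with taking the minimum and maximum of income brackets: if the column is stored as text, the comparison is alphabetical rather than by dollar amount, and if it is an unordered factor, `min()` raises an error. A possible workaround, sketched below in base R, is to convert the column to an ordered factor. The `brackets` vector is an assumption for illustration; the real dataset presumably has more levels, which would need to be listed in ascending order.

```r
# Assumed bracket labels, listed from lowest to highest (illustrative only).
brackets <- c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999")

# Made-up income values stored as plain text.
income_chr <- c("$100,000 - $124,999", "$30,000 - $39,999", "$50,000 - $59,999")

# An ordered factor carries the bracket order, so min() and max() follow it.
income_ord <- factor(income_chr, levels = brackets, ordered = TRUE)

lowest  <- as.character(min(income_ord))
highest <- as.character(max(income_ord))

print(c(lowest = lowest, highest = highest))
```

After a conversion like this, a summary such as the one above would report bounds by bracket order instead of by text order.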

Source

2015-10-04 22:22:10 bramtayl
