Dropping variables with fewer than two factor levels

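The variables in my data frame contain character observations (I'm not sure that is the right way to put it; essentially, the columns are listed as "chr" when I pull up the structure with str()).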

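I'd like to convert everything to factors first and then check the factor levels. Once they are factors, I only want to keep working with the variables in the data frame that have two or more levels.

Here is my idea. I know for loops are something of a taboo in R, but I'm very new and using one makes sense to me.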

x = as.character(c("Not Sampled", "Not Sampled", "Y", "N")) 
y = as.character(c("Not Sampled", "Not Sampled", "Not Sampled", "Not Sampled")) 
z = as.character(c("Y", "N", "Not Sampled", "Y")) 
df = data.frame(x, y, z) 

for i in df: 
    df$Response = as.factor(df[,i]) #create new variable in dataframe 
    df$Response = [email protected][sapply .... #where I think I can separate out the variables I want and the variables I don't want 

    m1 = lm(response ~ 1) #next part where I want only the selected variables

I know the solution is probably far more involved, but this is my attempt as someone just starting out.

Source

2016-03-15 userfriendly

library(dplyr) 

df <- df %>% lapply(factor) %>% data.frame() 
df[ , sapply(df, n_distinct) >= 2]

Source

2016-03-15 19:57:20

Wow, this is an awesome tip, thank you! – userfriendly

You don't need dplyr for this lapply approach. (If you do want to use dplyr, you can use 'mutate_each'.) –
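As a side note on the comment above: in recent dplyr releases 'mutate_each' is deprecated. A minimal sketch of the same idea, assuming dplyr 1.0.0 or later (where `across()` and `where()` are available), might look like this. The variable name `kept` is only illustrative.

```r
library(dplyr)

# Example data from the question, kept as character columns
df <- data.frame(
  x = c("Not Sampled", "Not Sampled", "Y", "N"),
  y = c("Not Sampled", "Not Sampled", "Not Sampled", "Not Sampled"),
  z = c("Y", "N", "Not Sampled", "Y"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor, then keep the columns
# that have two or more distinct values
kept <- df %>%
  mutate(across(everything(), factor)) %>%
  select(where(~ n_distinct(.x) >= 2))

names(kept)
# "x" "z"
```

Because `select()` always returns a data frame, this form does not collapse to a vector when only one column qualifies.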

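By default, data.frame converts strings to factors, so the extra conversion is unnecessary in this case (this was the default before R 4.0.0; from R 4.0.0 on, pass stringsAsFactors = TRUE to get the same behavior). lapply is the better choice here, because sapply will try to simplify the return value to a matrix when all the results have the same length.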

df = data.frame(x, y, z, stringsAsFactors = TRUE)  # explicit, needed since R 4.0.0

## Already factors, use sapply(df, typeof) to see underlying representation 
sapply(df, class) 
#        x        y        z 
# "factor" "factor" "factor" 

## These are the columns with two or more levels 
lengths(lapply(df, levels)) >= 2 
#     x     y     z 
#  TRUE FALSE  TRUE 

## Extract only those columns 
df[lengths(lapply(df, levels)) >= 2]
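
The remark about sapply simplifying its result can be checked with a small sketch. It reuses the question's data but drops y, so that every remaining column has the same number of levels; the names `same_len`, `res_sapply` and `res_lapply` are only illustrative.

```r
df <- data.frame(
  x = c("Not Sampled", "Not Sampled", "Y", "N"),
  y = c("Not Sampled", "Not Sampled", "Not Sampled", "Not Sampled"),
  z = c("Y", "N", "Not Sampled", "Y"),
  stringsAsFactors = TRUE
)

# Keep only x and z: both have exactly three levels
same_len <- df[c("x", "z")]

res_sapply <- sapply(same_len, levels)  # equal lengths, so simplified to a matrix
res_lapply <- lapply(same_len, levels)  # always a list, one element per column

is.matrix(res_sapply)  # TRUE
is.list(res_lapply)    # TRUE

# On the matrix, lengths() counts single cells (six values of 1);
# on the list, it counts the levels of each column
length(lengths(res_sapply))
lengths(res_lapply)
```

With the matrix result, the logical index no longer has one entry per column, which is why the lapply version is the safer one for selecting columns.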

Source

2016-03-15 20:02:37 jenesaisquoi

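This looks like it will help me. I tried to copy and paste it to test it, but I'm not sure whether "lengths" is a separate function or a typo for the base function "length". I'm 99% sure it's the latter, but I wanted to clarify for posterity's sake. – userfriendly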
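
For the record, `lengths()` is a real base R function (added in R 3.2.0) rather than a typo, so on older R versions the call fails with a "could not find function" error. A minimal sketch of the difference, using a hypothetical list `lvls` that mimics the output of lapply(df, levels):

```r
# Hypothetical stand-in for lapply(df, levels) on the question's data
lvls <- list(
  x = c("N", "Not Sampled", "Y"),
  y = "Not Sampled",
  z = c("N", "Not Sampled", "Y")
)

length(lvls)   # number of list elements: 3
lengths(lvls)  # length of each element: x = 3, y = 1, z = 3

# On R versions without lengths(), sapply(lvls, length) gives the same counts
sapply(lvls, length)
```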

df[, sapply(df, function(x) length(levels(x)) >= 2)]
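
One thing to watch with the `df[, ...]` form: if only a single column qualifies, R drops the result to a plain vector. The sketch below uses a reduced version of the question's data (only x and y, an assumption made to trigger the case) and shows two ways to keep a data frame.

```r
df <- data.frame(
  x = c("Not Sampled", "Not Sampled", "Y", "N"),
  y = c("Not Sampled", "Not Sampled", "Not Sampled", "Not Sampled"),
  stringsAsFactors = TRUE
)

# Logical index: TRUE for x (three levels), FALSE for y (one level)
keep <- sapply(df, function(col) length(levels(col)) >= 2)

class(df[, keep])                # "factor": the single column became a vector
class(df[, keep, drop = FALSE])  # "data.frame"
class(df[keep])                  # "data.frame": list-style indexing never drops
```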

Source

2016-03-15 20:07:25 TheRimalaya
