2017-06-12 111 views

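Rename levels of a factor based on matching values within a data frame subset

I am trying to assign names to the blank levels of the factor lepsp wherever values within a subset match a condition. An example of the data: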

df<- 
    plantfam       lepfam       lepsp        lepcn
    Asteraceae     Geometridae  Eois sp      green/spikes
    Asteraceae     Erebidae     Anoba sp     green/nospikes
    Asteraceae     Erebidae                  green/nospikes
    Melastomaceae  Noctuidae    Balsinae sp
    Poaceae        Erebidae     Deinopa sp   black/orangespots
    Poaceae        Erebidae                  black/orangespots
    Poaceae        Erebidae     Cocytia sp   black/yellowspots
    Poaceae                                  black/yellowspots

Here is the code for the data frame:

df<-data.frame(plantfam= c("Asteraceae","Asteraceae","Asteraceae", 
"Melastomaceae","Poaceae","Poaceae","Poaceae","Poaceae"), lepfam= 
c("Geometridae", "Erebidae","Erebidae", 
"Noctuidae","Erebidae","Erebidae","Erebidae",""), lepsp= c("Eois sp", 
"Anoba sp", "", "Balsinae sp", "Deinopa sp", "", "Cocytia sp", ""), 
lepcn= c("green/spikes","green/nospikes", "green/nospikes","", 
"black/orangespots", "black/orangespots", "black/yellowspots", 
"black/yellowspots")) 

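If lepsp is blank but its lepcn matches the lepcn of another lepsp feeding on the same plantfam, the blank lepsp should be given the name of the lepsp that matches these conditions. Thus, each subset of lepfam feeding on the same plantfam with the same lepcn will be assigned the same name.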

output<- 
    plantfam       lepfam       lepsp        lepcn
    Asteraceae     Geometridae  Eois sp      green/spikes
    Asteraceae     Erebidae     Anoba sp     green/nospikes
    Asteraceae     Erebidae     Anoba sp     green/nospikes
    Melastomaceae  Noctuidae    Balsinae sp
    Poaceae        Erebidae     Deinopa sp   black/orangespots
    Poaceae        Erebidae     Deinopa sp   black/orangespots
    Poaceae        Erebidae     Cocytia sp   black/yellowspots
    Poaceae                     Cocytia sp   black/yellowspots

I have tried variations of the following (which has the benefit of checking the combinations) without success: https://stackoverflow.com/a/44479195/8061255

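Could you provide a sample of the dataset so that we can produce a reproducible solution? –

I was under the impression that the above was an example of the dataset. What else can I provide that would help? Thank you for your time. – Danielle

I have added code for the example data frame, which may be what you were asking for. Thanks again for your help. – Danielle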

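Answer

A straightforward renaming in base R. In essence, you build a separate list of the distinct plantfam/lepfam/lepcn combinations and merge it back into the original dataset:

Read in the data and make sure it is in the expected format: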

df<- read.csv(text = 
'plantfam,lepfam,lepsp,lepcn 
Asteraceae,Geometridae,Eois sp,green/spikes 
Asteraceae,Erebidae,Anoba sp,green/nospikes 
Asteraceae,Erebidae,NA,green/nospikes 
Melastomaceae,Noctuidae,Balsinae sp,NA 
Poaceae,Erebidae,Deinopa sp,black/orangespots 
Poaceae,Erebidae,NA,black/orangespots 
Poaceae,Erebidae,NA,black/yellowspots') 

# assumes blanks are read in as NA 
# if the blanks are actually empty strings "", turn those into NA first 

# make sure every column is character, not factor 
# (lapply keeps the data frame structure; apply() would go through a matrix) 
df[] <- lapply(df, as.character) 
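The question's data.frame() code stores blanks as empty strings rather than NA, so the conversion mentioned in the comment above would be needed there. A minimal sketch on a small hypothetical data frame (the name `blanks` and its values are illustrative, not from the original post):

```r
# hypothetical two-row example with empty strings standing in for blanks
blanks <- data.frame(lepsp = c("Eois sp", ""),
                     lepcn = c("", "green/spikes"),
                     stringsAsFactors = FALSE)

# blanks == "" is a logical matrix; every TRUE cell is replaced by NA
blanks[blanks == ""] <- NA

blanks
```

The same assignment, applied to a character data frame such as df, should make na.omit() treat the blank cells as missing.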

Solution:

# get a unique list of all combinations that don't have missing data 
dflookup <- unique(na.omit(df)) 

# inspect combinations to be renamed, there should be no duplicate plantfam/lepfam/lepcn combinations 
dflookup 

# use the lookup to merge in all known names 
# (note: merge() sorts the result by the key columns by default) 
newdf <- merge(df, dflookup, by = c('plantfam','lepfam','lepcn'), all.x = TRUE, suffixes = c('old','new')) 

# use original lepsp when new lepsp is NA 
newdf$lepsp <- ifelse(is.na(newdf$lepspnew),newdf$lepspold,newdf$lepspnew) 

# remove unneeded columns 
newdf$lepspold <- newdf$lepspnew <- NULL 

# turn back into factors if desired 
# (lapply converts column by column; apply() would return a character matrix) 
newdf[] <- lapply(newdf, as.factor) 
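The merge above keys on plantfam, lepfam and lepcn, whereas the expected output in the question also fills the last row, whose lepfam is blank. One possible variant, sketched here under the assumption that a match on plantfam and lepcn alone is what is wanted, uses match() on the question's own data with empty strings left as they are. The names `known`, `key`, `idx` and `fill` are illustrative, and the sketch assumes each plantfam/lepcn pair maps to a single name (match() takes the first one it finds).

```r
# the question's data, kept as character so blanks stay as ""
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
             "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp = c("Eois sp", "Anoba sp", "", "Balsinae sp",
            "Deinopa sp", "", "Cocytia sp", ""),
  lepcn = c("green/spikes", "green/nospikes", "green/nospikes", "",
            "black/orangespots", "black/orangespots", "black/yellowspots",
            "black/yellowspots"),
  stringsAsFactors = FALSE
)

# lookup table: rows where both the name and the description are known
known <- unique(df[nzchar(df$lepsp) & nzchar(df$lepcn),
                   c("plantfam", "lepcn", "lepsp")])

# build a combined plantfam/lepcn key for matching
key <- function(d) paste(d$plantfam, d$lepcn, sep = "\r")

# position of each row's key in the lookup (NA when there is no match)
idx <- match(key(df), key(known))

# fill only rows whose lepsp is blank and that have a match
fill <- !nzchar(df$lepsp) & !is.na(idx)
df$lepsp[fill] <- known$lepsp[idx[fill]]

df
```

Unlike merge(), this keeps the original row order. Rows with a blank lepcn are never used as a lookup source, so a blank lepsp with a blank lepcn is left untouched.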