2016-01-12 136 views
0

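R partial string matching - exclusion

I have a list of emails that I want to clean up. Specifically, if the '@' character is not in a particular email, I want to remove that email, so that an input such as 'mywebsite.com' would be removed.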

My code is as follows:

email_clean <- function(email, invalid = NA){ 
    email <- trimws(email)               # Removes whitespace 
    email[(nchar(email) %in% c(1,2)) ] <- invalid         # Removes emails with 1 or 2 character length 
    bad_email <- c("\\@no.com", "\\@na.com","\\@none.com","\\@email.com",   # List of bad emails - modify to the 
        "\\@noemail.com", "\\@test.com")         # specifications of the request 

    pattern = paste0("(?i)\\b",paste0(bad_email,collapse="\\b|\\b"),"\\b")   # Sets emails matching a bad domain to `invalid` 
    email <-gsub(pattern, invalid, sapply(email,as.character)) 
    unname(email) 
    } 

    ## Define vector of cleaned emails from original csv column 
    Cleaned_Email <- email_clean(my_data$Email) 


    ## Binds cleaned emails to csv data 
    my_data<-cbind(my_data,Cleaned_Email) 

Thanks!

+2

What is your question? – nrussell

Answers

3
email_clean <- function(email, invalid = NA){ 
    email <- trimws(email)               # Removes whitespace 
    email[(nchar(email) %in% c(1,2)) ] <- invalid         # Removes emails with 1 or 2 character length 
    email[!grepl("@", email)] <- invalid # <------------------ New line added here ------------ 
    bad_email <- c("\\@no.com", "\\@na.com","\\@none.com","\\@email.com",   # List of bad emails - modify to the 
        "\\@noemail.com", "\\@test.com")         # specifications of the request 

    pattern = paste0("(?i)\\b",paste0(bad_email,collapse="\\b|\\b"),"\\b")   # Sets emails matching a bad domain to `invalid` 
    email <-gsub(pattern, invalid, sapply(email,as.character)) 
    unname(email) 
    } 
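
To see what the added line does in isolation, here is a minimal sketch on a made-up vector. The sample addresses are assumptions for illustration and are not from the original data. `grepl()` returns one TRUE/FALSE per element, so negating it selects the entries that lack an '@'.

```r
# Hypothetical sample input; these addresses are invented for illustration
emails <- c("alice@example.org", "mywebsite.com", "bob@example.net", NA)

# grepl() returns a logical vector: TRUE where "@" occurs, FALSE otherwise
# (an NA input counts as "no match", so it yields FALSE)
has_at <- grepl("@", emails, fixed = TRUE)

# Overwrite every entry lacking "@" with NA, keeping the vector length intact
emails[!has_at] <- NA
print(emails)
```

Because the vector keeps its length, the result can still be bound back onto the data frame as a new column, which is what the question's `cbind()` step relies on.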
+0

That's great, thank you Pierre! – Maddie

0

Try this to exclude any rows of my_data that have no '@' symbol in the Email column:

my_data <- my_data[grep('@', my_data$Email), ] 
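
As a rough illustration of the row-filtering approach, the sketch below builds a small stand-in for `my_data` (the `Name` column and all values are assumptions, not the asker's data) and uses `grepl()` as a logical row mask, which behaves the same as `grep()` here.

```r
# Hypothetical data frame standing in for my_data
my_data <- data.frame(
  Name  = c("Alice", "Site", "Bob"),
  Email = c("alice@example.org", "mywebsite.com", "bob@example.net"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Email contains "@"
kept <- my_data[grepl("@", my_data$Email, fixed = TRUE), ]
print(kept)
```

Note the design difference from the first answer: this drops whole rows, whereas `email_clean()` keeps every row and marks the bad entries as NA.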
+1

I don't think grep works here, because I'm technically searching a vector of emails, unless I'm missing something. – Maddie

+0

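You can still use grep: Email[grep('@', Email)]. grep simply returns a vector of the indices at which a match occurred. You can subset a data frame or a vector based on the returned vector. – Gopala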
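
The point about indices can be sketched on a plain vector. The values below are invented for illustration. The last lines show one caveat worth knowing when excluding matches: negating an empty `grep()` result drops everything, while negating `grepl()` does not.

```r
# Hypothetical vector of emails
Email <- c("alice@example.org", "mywebsite.com", "bob@example.net")

idx  <- grep("@", Email, fixed = TRUE)    # integer positions of the matches
mask <- grepl("@", Email, fixed = TRUE)   # logical, one value per element

by_index <- Email[idx]    # subset by position
by_mask  <- Email[mask]   # subset by logical mask, same result

# Caveat: when nothing matches, grep() returns integer(0),
# and negative indexing with it selects nothing at all
no_hits     <- grep("#", Email, fixed = TRUE)
dropped_all <- Email[-no_hits]                          # character(0)
kept_all    <- Email[!grepl("#", Email, fixed = TRUE)]  # full vector
```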