dplyr/dt: summarise columns if not blank/NA and paste?
My data is:
Name          House  Street      Apt  City    Postal  Phone
DUMA PAUL     2030   GREEN ROAD       DESERT  Z0K2K1  999-577-3789
DUNN S               GREEN ROAD       DESERT  Z0K2K1  999-577-3256
FERGUSON BOB         GREEN ROAD       DESERT  Z0K2K1  999-577-3771
FITSCHEN A    3989   GREEN ROAD       DESERT  Z0K2K1  999-577-3557
BLACK CARY    2079   GREEN ROAD       DESERT  Z0K2K1  999-577-3779
BLACK RUTH    2079   GREEN ROAD       DESERT  Z0K2K1  999-577-3779
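I want to compare the rows (the names are dynamic; the data is sorted by House). If the House # is equal, I want to concatenate the two respective phone numbers with " OR ", concatenate the names with "AND", and delete the row that is not the concatenated one.
I am using: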
data <- data %>%
  group_by(House, Street, Apt, City, Postal) %>%
  summarise(Name = first(paste(Name, collapse = ", AND ")),
            Phone = paste(unique(Phone), collapse = " OR ")) %>%
  ungroup() %>%
  arrange(Street, desc(House)) %>%
  select(colnames(dataset)) %>%
  filter(!Phone %in% dnc$`Home Phone`)
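Problem: with the dplyr code above, rows are also concatenated when House is NA (or blank; I left my NAs blank) and Apt is NA (or ""), which I don't want. So, using the code above, I get: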
Name                        House  Street      Apt  City    Postal  Phone
DUNN S, AND FERGUSON BOB           GREEN ROAD       DESERT  Z0K2K1  9995773256 OR 9995773772
DUMAS PAUL                  2030   GREEN ROAD       DESERT  Z0K2K1  9995773789
BLACK CARY, AND BLACK RUTH  2079   GREEN ROAD       DESERT  Z0K2K1  9995773779
FITSCHEN A                  3989   GREEN ROAD       DESERT  Z0K2K1  9995773556
In the output above, note that DUNN S and FERGUSON BOB are now combined. I don't want that.
dput (sorry if it's not helpful):
list(structure(list(X__1 = c(NA, NA, NA, NA, NA, NA), Name = c("DUMAS PAUL",
"DUNN S", "FERGUSON BOB", "FITSCHEN A", "BLACK CARY", "BLACK RUTH"
), House = c("2030", NA, NA, "3989", "2079", "2079"), Street = c("GREEN ROAD",
"GREEN ROAD", "GREEN ROAD", "GREEN ROAD", "GREEN ROAD", "GREEN ROAD"
), Apt = c(NA, NA, NA, NA, NA, NA), City = c("DESERT", "DESERT",
"DESERT", "DESERT", "DESERT", "DESERT"), Prov = c("ZK", "ZK",
"ZK", "ZK", "ZK", "ZK"), Postal = c("Z0K2K1", "Z0K2K1", "Z0K2K1",
"Z0K2K1", "Z0K2K1", "Z0K2K1"), Phone = c("999-577-3789", "999-577-3256",
"999-577-3772", "999-577-3556", "999-577-3779", "999-577-3779"
), `Last Appear Date` = c(NA, NA, NA, NA, NA, NA)), .Names = c("X__1",
"Name", "House", "Street", "Apt", "City", "Prov", "Postal", "Phone",
"Last Appear Date"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L)))
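For reference, here is a rough sketch of one direction that might avoid the unwanted merge. The likely cause is that group_by() treats all NA values in House as one group, so DUNN S and FERGUSON BOB end up together. The sketch adds a helper column, grp_id (a hypothetical name, not in my real data), that gives each row with a missing or blank House its own group. It assumes dplyr 1.0.0 or later for the .groups argument, rebuilds only the columns used here from the dput, and leaves out the dnc filter and the select(colnames(dataset)) step because those objects are not shown.

```r
library(dplyr)

# Sample rows copied from the dput above (only the columns used here).
data <- data.frame(
  Name   = c("DUMAS PAUL", "DUNN S", "FERGUSON BOB",
             "FITSCHEN A", "BLACK CARY", "BLACK RUTH"),
  House  = c("2030", NA, NA, "3989", "2079", "2079"),
  Street = "GREEN ROAD",
  Apt    = NA_character_,
  City   = "DESERT",
  Postal = "Z0K2K1",
  Phone  = c("999-577-3789", "999-577-3256", "999-577-3772",
             "999-577-3556", "999-577-3779", "999-577-3779"),
  stringsAsFactors = FALSE
)

result <- data %>%
  # grp_id is a hypothetical helper: rows with a missing or blank House get
  # their own row number, so they never share a group; all other rows get 0.
  mutate(grp_id = if_else(is.na(House) | House == "", row_number(), 0L)) %>%
  group_by(grp_id, House, Street, Apt, City, Postal) %>%
  summarise(
    Name  = paste(Name, collapse = ", AND "),
    Phone = paste(unique(Phone), collapse = " OR "),
    .groups = "drop"
  ) %>%
  # Drop the helper column and restore the original column order.
  select(Name, House, Street, Apt, City, Postal, Phone) %>%
  arrange(Street, desc(House))

print(result)
```

With this, the two 2079 rows should still be combined, while the rows without a House number stay separate. Whether blank Apt values need the same treatment depends on the real data, so this is only a starting point.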
Thanks