2017-08-09

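Custom abbreviation of a string vector in R

I am trying to create custom abbreviations for the strings in the vector degree_abrev, which is extracted from the data frame gss.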

This is what I came up with, but I would like to see whether anyone has a "prettier" way to do it:

degree_abrev <- gsub("Lt High School", "LtHS", gss$degree) 
degree_abrev <- gsub("High School", "HS", degree_abrev) 
degree_abrev <- gsub("Junior College", "JC", degree_abrev) 
degree_abrev <- gsub("Bachelor", "B", degree_abrev) 
degree_abrev <- gsub("Graduate", "G", degree_abrev) 

I would put these in a table and match/merge on it rather than using regular expressions (assuming that is possible). – Frank
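The table-based approach suggested in the comment above might look something like the following sketch. It uses only base R; the `degree` vector is a made-up stand-in for `gss$degree`, and the names `lookup` and `abbrev` are illustrative, not from the original post.

```r
# Hypothetical sample standing in for gss$degree (assumption, not real data)
degree <- c("Bachelor", "High School", "Lt High School", "Graduate",
            "Junior College", "High School", NA)

# Lookup table: one row per full label and its abbreviation
lookup <- data.frame(
  degree = c("Lt High School", "High School", "Junior College",
             "Bachelor", "Graduate"),
  abbrev = c("LtHS", "HS", "JC", "B", "G"),
  stringsAsFactors = FALSE
)

# match() gives, for each element of degree, its row in the lookup table;
# values not found in the table (including NA) come back as NA
degree_abrev <- lookup$abbrev[match(degree, lookup$degree)]
degree_abrev
```

Because this matches whole strings rather than substrings, the order of the entries does not matter, unlike the chained gsub() calls, where "Lt High School" has to be replaced before "High School".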

Answers


The plyr package has a mapvalues function that does this. I am sure there are other ways to do it as well.

> library(plyr)

> degree_abbrev <- c("Lt High School", "High School", "Junior College",
+                    "Bachelor", "Graduate")

> degree_abbrev
[1] "Lt High School" "High School"    "Junior College" "Bachelor"       "Graduate"

> degree_abbrev <- mapvalues(degree_abbrev,
+     from = c("Lt High School", "High School", "Junior College", "Bachelor", "Graduate"),
+     to   = c("LtHS", "HS", "JC", "B", "G"))

> degree_abbrev
[1] "LtHS" "HS"   "JC"   "B"    "G"

Its successor, dplyr, has recode.


That's great, because recode is what I actually ended up using. – jesusgarciab
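For reference, the recode approach mentioned in these comments could be sketched as below. This assumes the dplyr package is installed; the `degree` vector is an illustrative stand-in for `gss$degree`, not the poster's actual code.

```r
library(dplyr)

# Hypothetical sample standing in for gss$degree (assumption)
degree <- c("Lt High School", "High School", "Junior College",
            "Bachelor", "Graduate")

# Each argument maps an old value (name) to its replacement (value);
# character values with no matching name are left unchanged
degree_abrev <- recode(
  degree,
  "Lt High School" = "LtHS",
  "High School"    = "HS",
  "Junior College" = "JC",
  "Bachelor"       = "B",
  "Graduate"       = "G"
)
degree_abrev
```

As with mapvalues, recode compares whole values, so "High School" cannot accidentally match inside "Lt High School".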


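I don't know whether this is any prettier, but I prefer using sapply. It builds each abbreviation from the first letter of every word, so "Lt High School" becomes "LHS" rather than "LtHS".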

degree_abrev <- c("Lt High School", "High School", "Junior College", "Bachelor", "Graduate") 

sapply(strsplit(degree_abrev, " "), function(x){paste(substring(x, 1, 1), collapse = "")}) 
[1] "LHS" "HS" "JC" "B" "G"