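Suppose I have the following data frames, whose column names are structured as shown below. I want to extract elements from those strings.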
df1 = data.frame(Date=c(rnorm(5)),
"United States) New York (NY" = c(rnorm(5)),
"United States) Chicago (Illinois" = c(rnorm(5)),
"United States) Denver (Colorado" = c(rnorm(5)),
"United States) Seattle (Washington" = c(rnorm(5)),
"United States) Minneapolis (Minnesota" = c(rnorm(5)), check.names=FALSE)
df1
df2 = data.frame(Date=c(rnorm(5)),
"New York (New York, United States)" = c(rnorm(5)),
"Phoenix (Arizona, United States)" = c(rnorm(5)),
"Chicago (Illinois, United States)" = c(rnorm(5)),
"Los Angeles (California, United States)" = c(rnorm(5)), check.names=FALSE)
df2
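As you can see, each column simply represents a city, but the structure of the column names is unmanageable. I'd like to know if anyone can help me figure out how to extract the city name from each column-name string.
I could prepare a dictionary of cities and do string matching, but I have no idea how to go about that. I also think there is a way to do this with str_split, but I haven't figured it out yet.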
sapply(str_split(names(df1),")"), 2)
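One likely reason the attempt above fails is that the second argument of sapply must be a function (for example `[`), not a bare index. A minimal sketch of the split idea follows, using base R's strsplit with fixed = TRUE in place of str_split so that the parentheses are matched literally. The helper name extract_city_split and the vectors nm1 and nm2 are hypothetical, and the sketch assumes the city always sits after ") " (when present) and before " (".

```r
# Hypothetical helper: keep the piece after ") " (if any),
# then keep the piece before " (".
extract_city_split <- function(x) {
  # Split on the literal ") " and take the last piece of each name.
  after_paren <- vapply(strsplit(x, ") ", fixed = TRUE),
                        function(p) p[length(p)], character(1))
  # Split on the literal " (" and take the first piece.
  vapply(strsplit(after_paren, " (", fixed = TRUE),
         function(p) p[1], character(1))
}

# Sample names copied from the two layouts in the question.
nm1 <- c("Date", "United States) New York (NY",
         "United States) Chicago (Illinois")
nm2 <- c("Date", "New York (New York, United States)",
         "Phoenix (Arizona, United States)")

extract_city_split(nm1)  # "Date" "New York" "Chicago"
extract_city_split(nm2)  # "Date" "New York" "Phoenix"
```

Names without either delimiter, such as Date, pass through unchanged, so the same helper can be applied to names(df1) and names(df2).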
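Of course, I'm sure there is also a gsub solution, but I'm fairly helpless when it comes to regular expressions.
Ultimately, I just want the actual city names as the column names.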
New York, Chicago, Denver, Seattle, Minneapolis
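For the regular-expression route, here is one possible sketch, not a definitive answer: two sub calls, the first dropping everything up to and including ") ", the second dropping everything from " (" onward. The helper name extract_city_regex and the small data frame demo are hypothetical, and the patterns assume the two column-name layouts shown in the question.

```r
# Hypothetical helper built on base R's sub().
extract_city_regex <- function(x) {
  x <- sub("^.*\\) ", "", x)  # drop a leading "Country) " prefix, if present
  sub(" \\(.*$", "", x)       # drop the trailing " (State ..." part, if present
}

# Small demo frame mixing both layouts (assumed for illustration only).
demo <- data.frame(Date = 1:2,
                   "United States) Denver (Colorado" = 3:4,
                   "Los Angeles (California, United States)" = 5:6,
                   check.names = FALSE)

names(demo) <- extract_city_regex(names(demo))
names(demo)  # "Date" "Denver" "Los Angeles"
```

Because the first pattern needs a space after the closing parenthesis, names in the df2 layout, which end in ")", are only touched by the second substitution.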
You may want to add 'check.names = FALSE' to those example data frame calls. – 2014-10-27 23:36:43
Yes, good call. – ATMA 2014-10-27 23:42:04