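This ended up taking quite a few steps. You could probably do it in fewer, but this is what I got to work. I'm also assuming your data starts out in a data frame with one address per row.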
dat = data.frame(Addresses = c("1626 Aviation Way, Albuquerque, NM 30906, USA",
"1626 Aviation Way, Augusta, GA 30906, USA",
"325 Main St, Stratford, CT 06615, USA",
"4205 Bessie Coleman Blvd, Tampa, FL 33607, USA"), stringsAsFactors = FALSE)
> dat
Addresses
1 1626 Aviation Way, Albuquerque, NM 30906, USA
2 1626 Aviation Way, Augusta, GA 30906, USA
3 325 Main St, Stratford, CT 06615, USA
4 4205 Bessie Coleman Blvd, Tampa, FL 33607, USA
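Now we need to split on the commas to get started, and then separate the state from the ZIP code. I'll also trim the extra whitespace left over after splitting on the commas.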
dat2 = sapply(dat$Addresses, strsplit, ",")
dat2 = lapply(dat2, trimws)
> dat2
$`1626 Aviation Way, Albuquerque, NM 30906, USA`
[1] "1626 Aviation Way" "Albuquerque" "NM 30906" "USA"
$`1626 Aviation Way, Augusta, GA 30906, USA`
[1] "1626 Aviation Way" "Augusta" "GA 30906" "USA"
$`325 Main St, Stratford, CT 06615, USA`
[1] "325 Main St" "Stratford" "CT 06615" "USA"
$`4205 Bessie Coleman Blvd, Tampa, FL 33607, USA`
[1] "4205 Bessie Coleman Blvd" "Tampa" "FL 33607" "USA"
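The split and trim steps can probably be collapsed into one. Here is a minimal sketch. It assumes every comma is followed only by optional whitespace and passes the regular expression ",\\s*" as the split pattern, so the split itself consumes the spaces. Unlike the sapply() version above, the result is an unnamed list.

```r
dat = data.frame(Addresses = c("1626 Aviation Way, Albuquerque, NM 30906, USA",
                               "1626 Aviation Way, Augusta, GA 30906, USA",
                               "325 Main St, Stratford, CT 06615, USA",
                               "4205 Bessie Coleman Blvd, Tampa, FL 33607, USA"),
                 stringsAsFactors = FALSE)

# strsplit() is already vectorised over its input, so no sapply() is needed;
# the regex ",\\s*" splits on a comma plus any whitespace that follows it
parts = strsplit(dat$Addresses, ",\\s*")
parts[[1]]
# [1] "1626 Aviation Way" "Albuquerque"       "NM 30906"          "USA"
```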
Now we need to put it back into a data frame.
dat2 = data.frame(matrix(unlist(dat2), ncol = 4, byrow = TRUE), stringsAsFactors = FALSE)
> dat2
X1 X2 X3 X4
1 1626 Aviation Way Albuquerque NM 30906 USA
2 1626 Aviation Way Augusta GA 30906 USA
3 325 Main St Stratford CT 06615 USA
4 4205 Bessie Coleman Blvd Tampa FL 33607 USA
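Another way to rebuild the data frame is to stack the split pieces with do.call(rbind, ...), which binds each character vector as one row of a character matrix. This sketch relies on the same four-fields-per-address assumption as the matrix() version. The column names used here (Street, City, StateZip, Country) are illustrative choices, not part of the original answer.

```r
addresses = c("1626 Aviation Way, Albuquerque, NM 30906, USA",
              "1626 Aviation Way, Augusta, GA 30906, USA",
              "325 Main St, Stratford, CT 06615, USA",
              "4205 Bessie Coleman Blvd, Tampa, FL 33607, USA")

# one character vector per address
parts = strsplit(addresses, ",\\s*")

# rbind() each vector as a row; this assumes all vectors have the same length,
# because otherwise rbind() recycles the shorter ones and the fields misalign
m = do.call(rbind, parts)
dat2 = data.frame(m, stringsAsFactors = FALSE)
colnames(dat2) = c("Street", "City", "StateZip", "Country")
dat2
```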
Next, we can split X3 into state and ZIP, and then drop that column.
dat2$State = sapply(dat2$X3, function(x) strsplit(x, " ")[[1]][1])
dat2$Zip = sapply(dat2$X3, function(x) strsplit(x, " ")[[1]][2])
dat2 = dat2[, -3]
> dat2
X1 X2 X4 State Zip
1 1626 Aviation Way Albuquerque USA NM 30906
2 1626 Aviation Way Augusta USA GA 30906
3 325 Main St Stratford USA CT 06615
4 4205 Bessie Coleman Blvd Tampa USA FL 33607
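The state/ZIP split can also be done with a regular expression. This minimal sketch assumes the field is always a two-letter state code, a space, and a five-digit ZIP, and it uses sub() with capture groups. The ZIP deliberately stays a character column, because converting it to a number would turn "06615" into 6615.

```r
state_zip = c("NM 30906", "GA 30906", "CT 06615", "FL 33607")

# capture group 1 = two capital letters, group 2 = five digits
pattern = "^([A-Z]{2}) ([0-9]{5})$"
State = sub(pattern, "\\1", state_zip)
Zip   = sub(pattern, "\\2", state_zip)

# sub() returns non-matching values unchanged, so flag them explicitly
bad = !grepl(pattern, state_zip)

data.frame(State, Zip, stringsAsFactors = FALSE)
```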
Finally, we set the column names and we're done.
colnames(dat2) = c("Street", "City", "Country", "State", "Zip")
> dat2
Street City Country State Zip
1 1626 Aviation Way Albuquerque USA NM 30906
2 1626 Aviation Way Augusta USA GA 30906
3 325 Main St Stratford USA CT 06615
4 4205 Bessie Coleman Blvd Tampa USA FL 33607
Have a look at 'strsplit' or 'regexpr'. – ekstroem
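Following the 'regexpr' suggestion, the whole address can be parsed in one pass with regexec(), a relative of regexpr() that also reports capture groups, combined with regmatches(). This is only a sketch and assumes the exact layout "street, city, ST 12345, country". Addresses that don't fit produce an empty match, and the code turns those into NA rows.

```r
addresses = c("1626 Aviation Way, Albuquerque, NM 30906, USA",
              "1626 Aviation Way, Augusta, GA 30906, USA",
              "325 Main St, Stratford, CT 06615, USA",
              "4205 Bessie Coleman Blvd, Tampa, FL 33607, USA")

# five capture groups: street, city, state, zip, country
pattern = "^([^,]+),\\s*([^,]+),\\s*([A-Z]{2})\\s+([0-9]{5}),\\s*([^,]+)$"

# for each address: the full match followed by the five groups,
# or character(0) if the address does not match at all
m = regmatches(addresses, regexec(pattern, addresses))

# drop the full match (element 1); non-matching addresses become a row of NA
rows = lapply(m, function(x) if (length(x) == 6) x[-1] else rep(NA_character_, 5))

parsed = data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(parsed) = c("Street", "City", "State", "Zip", "Country")
parsed
```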
Or, if you're working with a data frame, you can use the 'separate()' function from 'tidyr'. –
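I tried doing this <- strsplit($Adress, ","), but I didn't get the right result. This is the error I get when I try to write it into a data frame: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 4, 5 – Kaushik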
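data.frame() raises that error when it receives vectors of different lengths. This is likely what happens when the list from strsplit() goes straight into data.frame() and one address contains an extra comma (a suite number, for example). Each list element becomes a column, so columns of length 4 and 5 clash. The sketch below shows one way to find and handle such rows. The "Suite 200" address and the choice to pad with NA are hypothetical illustrations, not taken from the question.

```r
addresses = c("1626 Aviation Way, Augusta, GA 30906, USA",
              "100 Main St, Suite 200, Stratford, CT 06615, USA")  # hypothetical extra comma

parts = strsplit(addresses, ",\\s*")

# data.frame(parts) fails here: each list element becomes a column,
# and the columns have 4 and 5 entries respectively
err = tryCatch(data.frame(parts), error = function(e) conditionMessage(e))

# lengths() shows how many fields each address produced
n_fields = lengths(parts)
which(n_fields != 4)   # addresses that need manual attention

# one option: pad every vector to the longest length with NA, then bind rows;
# this avoids the error but does not realign the shifted fields
max_n = max(n_fields)
padded = lapply(parts, function(x) c(x, rep(NA_character_, max_n - length(x))))
dat2 = data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
dat2
```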