2017-10-12

Split a column by delimiter from right to left in R

I'm working with a dataset in which one column (Place) contains a location description.

library(tidyverse)

example <- tibble(Datum = c("October 1st 2017", 
          "October 2st 2017", 
          "October 3rd 2017"), 
      Place = c("Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria", 
         "Abu Kamal, Deir Ezzor Governorate, Syria", 
         "شارع القطار al Qitar [train] street, al-Tawassiya area, north of Raqqah city centre, Raqqah governorate, Syria")) 

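I want to split the Place column on the comma delimiter, and I would prefer a solution that uses the tidyverse package. Because the values of Place contain different numbers of pieces, I want to split from right to left, so that the country (Syria) ends up in the last column of the data frame.

Oh, and for bonus points: which RegEx would remove the Arabic characters?

Thanks in advance.

Edit: I found my answer. To remove the Arabic characters (thanks @G5W):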

gsub("[\u0600-\u06FF]", "", airstrikes_okt_clean$Plek) 

And to split into columns the tidyr way:

airstrikes_okt_clean <- separate(example, 
          Place, 
          into = c("detail", 
             "detail2", 
             "City_or_village", 
             "District", 
             "Country"), 
          sep = ",", 
          fill = "left") 
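
One detail of the call above is that splitting on a bare "," leaves a leading space on every piece after the first. As a minimal sketch (the two-row data frame and the name split_places are made up for illustration, and the column names are simply reused from above), the separator can be a regular expression that also consumes the whitespace after each comma:

```r
library(tidyr)

# Illustrative data only: two of the Place values from the question.
example <- data.frame(
  Place = c("Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
            "Abu Kamal, Deir Ezzor Governorate, Syria"),
  stringsAsFactors = FALSE
)

# sep is treated as a regular expression, so ",\\s*" swallows the space after
# each comma; fill = "left" pads short rows with NA on the left-hand side.
split_places <- separate(example,
                         Place,
                         into = c("detail", "detail2", "City_or_village",
                                  "District", "Country"),
                         sep = ",\\s*",
                         fill = "left")

split_places$Country
```

Because the padding goes on the left, the last column holds the country for every row regardless of how many pieces the original string had.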

Answers

1

Just split the strings on the commas and reverse the pieces.

lapply(strsplit(example$Place, ","), rev)
[[1]] 
[1] " Syria"       " Deir Ezzor Governorate"  
[3] " 20km south east of Deir Ezzor" "Tabiyyah Jazeera village"  

[[2]] 
[1] " Syria"     " Deir Ezzor Governorate" 
[3] "Abu Kamal"    

[[3]] 
[1] " Syria"        " Raqqah governorate"     
[3] " north of Raqqah city centre"  " al-Tawassiya area"     
[5] "شارع القطار al Qitar [train] street" 
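
If a rectangular result is wanted without tidyr, the same idea can be extended in base R. This is only a sketch: split_right is a hypothetical helper, not part of the answer above. It pads the shorter rows with NA on the left so that the last column always holds the final piece.

```r
# Hypothetical helper: split on a separator and right-align the pieces.
split_right <- function(x, sep = ",") {
  # Split each string and strip the whitespace around every piece.
  parts <- lapply(strsplit(x, sep, fixed = TRUE), trimws)
  width <- max(lengths(parts))
  # Pad on the left with NA so every row has the same number of pieces.
  padded <- lapply(parts, function(p) c(rep(NA_character_, width - length(p)), p))
  do.call(rbind, padded)
}

place <- c("Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
           "Abu Kamal, Deir Ezzor Governorate, Syria")

m <- split_right(place)
m[, ncol(m)]  # the last column holds the country for every row
```

The result is a character matrix; as.data.frame(m) would turn it into a data frame if named columns are needed.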

To get rid of the Arabic characters before splitting, try

gsub("[\u0600-\u06FF]", "", example$Place)
[1] "Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria"    
[2] "Abu Kamal, Deir Ezzor Governorate, Syria"                
[3] " al Qitar [train] street, al-Tawassiya area, north of Raqqah city centre, Raqqah governorate, Syria" 
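
Removing the Arabic letters leaves behind the spaces that surrounded them. A possible follow-up, sketched here on a shortened example string (the names place and cleaned are made up), is to collapse repeated whitespace and trim the ends. Note that the range U+0600 to U+06FF covers only the basic Arabic block, so text that uses the supplementary Arabic blocks would need a wider range.

```r
# The Arabic words are written with \u escapes so the snippet does not depend
# on the encoding of the source file.
place <- "\u0634\u0627\u0631\u0639 \u0627\u0644\u0642\u0637\u0627\u0631 al Qitar [train] street, al-Tawassiya area"

no_arabic <- gsub("[\u0600-\u06FF]", "", place)     # drop the Arabic letters
cleaned   <- trimws(gsub("\\s+", " ", no_arabic))   # collapse and trim leftover spaces

cleaned
```

This assumes a UTF-8 session; behaviour in other locales may differ.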
+0

Is there also a solution without lapply() but with tidyr? Maybe with the separate function? – Tdebeus

+0

@Tdebeus There may be, but I'm not a tidyr guy. – G5W

0

Here is a one-liner.

sapply(strsplit(example$Place, ","), function(x) trimws(x[length(x)])) 

This returns the string after the last comma, whether that is Syria or anything else.
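
For comparison, a regex-only variant is sketched below on two made-up example strings. It relies on ".*" being greedy, so the match runs up to the last comma. The second form is the strsplit() approach rewritten with vapply(), which guarantees a character vector as the result.

```r
place <- c("Abu Kamal, Deir Ezzor Governorate, Syria",
           "north of Raqqah city centre, Raqqah governorate, Syria")

# Greedy ".*," consumes everything up to the last comma;
# "\\s*" removes the whitespace that follows it.
country <- sub(".*,\\s*", "", place)

# Same result via strsplit(), taking the last piece of each split.
country2 <- vapply(strsplit(place, ",", fixed = TRUE),
                   function(x) trimws(x[length(x)]),
                   character(1))

identical(country, country2)
```

A string without any comma is returned unchanged by both variants.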