R string cleaning

I am working with some very messy strings, as shown below:

Value 
------------------- 
25 
32.12 . (05- 
33.90 , 
46.70 , 
() 26.60 
27.2 
23.24 . (12- 
36.52 , 
27.1814404432133 [ 
29.73 . (22- 
31.8058003525076 [ 
35.40 , 
38.44 . 
46.14 , 
29.26 [ 
25.44 .

I am not sure how to clean them efficiently so that they look like this:

Value 
------------------- 
25 
32.12 
33.90 
46.70 
26.60 
27.2 
23.24 
36.52 
27.1814404432133 
29.73 
31.8058003525076 
35.40 
38.44 
46.14 
29.26 
25.44

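I tried the sub function, sub(" .*", '', Value), to capture the space, but it did not work for everything before it, so I am looking for suggestions or hints on how to clean up these strings.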

Value <- c(" 25 \n", " 32.12 . (05-", "33.90 ,\n", "46.70 ,\n", "() 26.60 ", 
      " 27.2 ", " 23.24 . (12-", "36.52 ,\n", " 27.1814404432133\n\n[", 
      " 29.73 . (22-", " 31.8058003525076\n\n[", "35.40 ,\n", " 38.44 .\n", 
      "46.14 ,\n", " 29.26\n\n[", " 25.44 .\n") 
df <- data.frame(Value)

Source

2017-08-16 Jill Sellum

You can extract the first number using

Value <- c(" 25 \n", " 32.12 . (05-", "33.90 ,\n", "46.70 ,\n", "() 26.60 ", 
      " 27.2 ", " 23.24 . (12-", "36.52 ,\n", " 27.1814404432133\n\n[", 
      " 29.73 . (22-", " 31.8058003525076\n\n[", "35.40 ,\n", " 38.44 .\n", 
      "46.14 ,\n", " 29.26\n\n[", " 25.44 .\n") 
df <- data.frame(Value) 
df$Value <- sub(".*?(\\d[0-9.]*).*", "\\1", df$Value)

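See the R demo online.

Details

.*? - any 0+ characters, as few as possible
(\\d[0-9.]*) - Group 1, capturing a digit (\\d) followed by 0+ digits or . characters
.* - any 0+ characters up to the end of the string.

The sub function performs a single replacement, and the \1 backreference holds the value captured into Group 1.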
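As a rough illustration of the breakdown above, here is a minimal sketch that runs the same pattern on three of the strings from the question. The variable names x and cleaned are only for this example.

```r
# Illustrative inputs copied from the question's Value vector
x <- c(" 32.12 . (05-", "() 26.60 ", " 27.1814404432133\n\n[")

# .*? lazily skips to the first digit, Group 1 captures the number,
# and the trailing .* consumes the rest, so the whole string is
# replaced by the captured group
cleaned <- sub(".*?(\\d[0-9.]*).*", "\\1", x)
print(cleaned)

# The result is still a character vector; convert it when numbers are needed
print(as.numeric(cleaned))
```

Note that this relies on R's default regex engine, as in the answer; with perl = TRUE the dot would not match the newlines in the third string.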

If you want to make sure you only extract the pattern digit(s), optionally followed by . and more digit(s), you can use

df$Value <- sub(".*?(\\d+(?:\\.\\d+)?).*", "\\1", df$Value)

See this R demo.
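To see where the two patterns differ, the sketch below applies both to a few inputs. The first two strings are made up for illustration and do not come from the question; the names loose and strict are likewise only labels for this example.

```r
# "v 1.2.3" and "12. (" are hypothetical inputs; the last one is from the question
x <- c("v 1.2.3", "12. (", " 32.12 . (05-")

# Pattern from the first snippet: a digit, then any run of digits or periods
loose <- sub(".*?(\\d[0-9.]*).*", "\\1", x)

# Pattern from the second snippet: digits, then at most one ".digits" part
strict <- sub(".*?(\\d+(?:\\.\\d+)?).*", "\\1", x)

print(data.frame(input = x, loose = loose, strict = strict))
```

The looser pattern can keep a trailing period or several dotted parts, which matters if the result is later passed to as.numeric; on the question's data both patterns should give the same values.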

Source

2017-08-16 05:06:51

Ah, I had tried the sub(".*?(\\[0-9]*).*", "\\1", df$Value) option, but I was missing some arguments; now I know what was missing. Thanks.

You can try this:

library("stringr") 

str_extract(df$Value, "(\\d|\\.)+")
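One thing to be aware of with "(\\d|\\.)+" is that it accepts a period as the first character. The sketch below uses base R's regexpr/regmatches as a stand-in for str_extract (both return the first match), so it runs without stringr. The helper first_match and the input ". 25" are hypothetical and only serve the illustration.

```r
# ". 25" is a made-up input; the other two come from the question
x <- c(" 32.12 . (05-", ". 25", "() 26.60 ")

# Hypothetical helper: first match of a pattern in each string
# (every string here contains a match, so the length is preserved)
first_match <- function(pattern, x) regmatches(x, regexpr(pattern, x))

any_digit_or_dot <- first_match("(\\d|\\.)+", x)   # may start with a period
digit_first      <- first_match("\\d[0-9.]*", x)   # must start with a digit

print(any_digit_or_dot)
print(digit_first)
```

On the question's data no stray period comes before a number, so the difference does not show up there.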

Source

2017-08-16 05:09:35

Thanks Josh, this also solves the problem.

We can use regmatches/regexpr from base R

as.numeric(regmatches(df$Value, regexpr("[0-9][0-9.]*", df$Value)))
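regmatches returns only the pieces that matched, so if some element contained no number at all, the result would be shorter than the input and could not be assigned back to the column. The question's data always contains a number, so this does not happen there. A minimal sketch of a length-preserving variant follows, where extract_first_number is a hypothetical helper and "n/a" is a made-up input.

```r
# "n/a" is a hypothetical value with no digits; the others are from the question
x <- c(" 25 \n", "n/a", "46.14 ,\n")

m <- regexpr("[0-9][0-9.]*", x)
# The unmatched element is dropped here, so the result is shorter than x
print(length(regmatches(x, m)))

# Hypothetical helper: keep the input length and return NA where nothing matched
extract_first_number <- function(x, pattern = "[0-9][0-9.]*") {
  m <- regexpr(pattern, x)
  hit <- !is.na(m) & m != -1          # TRUE where a match was found
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # fill only the matched positions
  as.numeric(out)
}

result <- extract_first_number(x)
print(result)
```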

Source

2017-08-16 06:13:12 akrun
