2017-08-31 70 views
2

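Convert million/billion abbreviations into actual numbers? i.e. 5.12M -> 5,120,000

As the title says, I'm looking for a way to convert shorthand "character" text into numeric data. For example, I'd like to make these changes in my data frame: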

84.06M -> 84,060,000 
30.12B -> 30,120,000,000 
9.78B -> 9,780,000,000 
251.29M -> 251,290,000

Here is a sample of the data frame I'm working with:

        Index  Market Cap    Income    Sales  Book/sh
ZX          -      84.06M    -1.50M  359.50M     7.42
ZTS   S&P 500      30.13B   878.00M    5.02B     3.49
ZTR         -           -         -        -        -
ZTO         -       9.78B   288.30M    1.47B     4.28
ZPIN        -       1.02B    27.40M  285.20M     4.27
ZOES        -     251.29M    -0.20M  294.10M     6.79
ZNH         -      10.92B   757.40M   17.26B    33.23
ZF          -           -         -        -        -
ZEN         -       2.78B  -106.70M  363.60M     3.09
ZBK         -       6.06B         -    2.46B    34.65
ZBH   S&P 500      22.76B   712.00M    7.78B    50.94

Does anyone have any suggestions? I was thinking of gsub in base R ...


If these come from another format, there may be a more direct way. For example, if they are values in an Excel workbook, it would be better to use 'readxl::read_excel("the-excel-file.xlsx")'. – Hugh

Answers

1

Try this:

income <- c("84.06M", "30.12B", "251.29M")

# Needs the dplyr package installed (for case_when)
toInteger <- function(income){
    amt <- as.numeric(gsub("[A-Z]", "", income))     # numeric part without the suffix
    multiplier <- substring(income, nchar(income))   # last character, e.g. "M" or "B"
    multiplier <- dplyr::case_when(multiplier == "M" ~ 1e6,
                                   multiplier == "B" ~ 1e9,
                                   TRUE ~ 1) # you can add on other conditions for more suffixes
    amt*multiplier
}

> toInteger(income)
[1] 8.4060e+07 3.0120e+10 2.5129e+08 
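
The function above turns the "-" placeholders from the sample data into NA with a coercion warning. A minimal base R sketch of the same idea that handles them quietly and needs no dplyr could look like this. The name abbr_to_number and the lookup vector mult are made up for illustration, and the "K" entry is an assumed extra suffix that does not appear in the question's data.

```r
# Hypothetical helper: convert strings such as "84.06M" or "9.78B" to numbers.
abbr_to_number <- function(x) {
  x <- trimws(x)
  # Multiplier per suffix; "K" is an assumption, not part of the sample data.
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  suffix <- substring(x, nchar(x))            # last character of each string
  has_suffix <- suffix %in% names(mult)
  # Drop the suffix where there is one, then coerce; "-" becomes NA.
  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  amt <- suppressWarnings(as.numeric(digits))
  amt * ifelse(has_suffix, mult[suffix], 1)
}

abbr_to_number(c("84.06M", "-1.50M", "9.78B", "7.42", "-"))
```

Values without a suffix, such as the Book/sh column, pass through unchanged, and anything that is not a number comes back as NA.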
1

You can convert all of your columns like this:

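test = c("30.13B","84.06M","84.06B","84.06M")
# split on either suffix; c("B","M") would be recycled element by element and only works by coincidence here
values = sapply(strsplit(test,"[BM]"),function(x) as.numeric(x))
# last character of each string is the suffix
amount = sapply(strsplit(test,""), function(x) x[length(x)])
# ifelse is already vectorised, so no sapply is needed
values2 = ifelse(amount == "B",values*1e9,values*1e6)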

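Just replace test with the column you want to change (your data frame name plus the column name); the converted numbers end up in values2.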
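
To run a conversion like this over several columns of a data frame at once, one option is lapply over the chosen columns. The sketch below is only an illustration: df is a small made-up data frame mirroring a few rows of the question, and convert_col is a hypothetical helper name. It assumes the columns are character (not factor) and that "-" marks a missing value.

```r
# Made-up data frame mirroring a few rows of the question's sample.
df <- data.frame(
  Market.Cap = c("84.06M", "30.13B", "-"),
  Income     = c("-1.50M", "878.00M", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric part times a multiplier chosen by the suffix.
convert_col <- function(x) {
  value  <- suppressWarnings(as.numeric(sub("[BM]$", "", x)))  # "-" becomes NA
  suffix <- substring(x, nchar(x))                              # last character
  value * ifelse(suffix == "B", 1e9, ifelse(suffix == "M", 1e6, 1))
}

cols <- c("Market.Cap", "Income")          # columns to convert
df[cols] <- lapply(df[cols], convert_col)  # replace them in place
str(df)
```

After this, the selected columns are numeric and can be used in arithmetic directly.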

5

You can try this:

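num <- c("84.06M", "30.12B", "251.29M")
num <- gsub('B', 'e9', num)   # "30.12B" becomes "30.12e9"
num <- gsub('M', 'e6', num)   # "84.06M" becomes "84.06e6"
# trim = TRUE stops format() from padding the strings to a common width
format(as.numeric(num), scientific = FALSE, big.mark = ",", trim = TRUE)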

"84,060,000" "30,120,000,000" "251,290,000"