Better way to calculate a new variable from a complex calculation on multiple variables, some with NAs

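I am trying to find a clean, efficient way to create a new variable from a complex calculation on 5 existing variables. My problem is that one variable is a factor and the other four contain NAs.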

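I have a dataset with several groups of variables that have the following structure:

expenditure_period - factor, where 1 = daily, 2 = weekly, 3 = monthly, 4 = yearly
expenditure1 - integer, amount spent for a daily period
expenditure2 - integer, amount spent for a weekly period
expenditure3 - integer, amount spent for a monthly period
expenditure4 - integer, amount spent for a yearly period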

For each row/observation, only one of the 4 integer fields has a numeric value, depending on the value of expenditure_period; the rest are NA.

For example:

expenditure_period expenditure1 expenditure2 expenditure3 expenditure4 
1    monthly   NA   NA    5   NA 
2    weekly   NA    5   NA   NA 
3    monthly   NA   NA    2   NA 
4    monthly   NA   NA    5   NA 
5    monthly   NA   NA   58   NA

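I want to create a new variable containing the standardized monthly expenditure. So if the expenditure period is daily, then expenditure1 * 30. If weekly, then expenditure2 * 4. If monthly, then expenditure3 * 1. If yearly, then expenditure4 / 12.

The best solution I have been able to come up with is the following mess: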

data$expenditure_factor[data$expenditure_period=="daily"] <- 30 
data$expenditure_factor[data$expenditure_period=="weekly"] <- 4 
data$expenditure_factor[data$expenditure_period=="monthly"] <- 1 
data$expenditure_factor[data$expenditure_period=="yearly"] <- 1/12 
data$expenditure_month <- apply(data[,c("expenditure1", "expenditure2", 
"expenditure3", "expenditure4", "expenditure_factor")], 1, 
function(x) { sum(x[1:4], na.rm=TRUE) * x[5]})

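I tried adding expenditure1, 2, 3 and 4 together using the + operator, but this resulted in all NAs, because 3 of the 4 values in each row are NA. I tried using the sum function with na.rm to create a temporary variable, but this resulted in the same sum for every row. I tried using mutate from the dplyr package, to no avail.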
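The two failed attempts described above follow from how R treats NA and from the difference between sum() and rowSums(). A minimal sketch, using a made-up data frame `d` (hypothetical values, not the asker's data), shows all three behaviours side by side:

```r
# Hypothetical two-row example with the same column layout as in the question
d <- data.frame(expenditure1 = c(NA, 3),
                expenditure2 = c(5, NA),
                expenditure3 = c(NA_real_, NA_real_),
                expenditure4 = c(NA_real_, NA_real_))

# `+` propagates NA, so every row ends up NA
plus_result <- d$expenditure1 + d$expenditure2 + d$expenditure3 + d$expenditure4

# sum() collapses all values into a single number; assigning it to a column
# recycles that one number to every row
sum_result <- sum(d$expenditure1, d$expenditure2,
                  d$expenditure3, d$expenditure4, na.rm = TRUE)

# rowSums() works row by row and can skip the NAs
row_result <- rowSums(d, na.rm = TRUE)
```

Note that rowSums() with na.rm = TRUE returns 0, not NA, for a row in which every value is missing.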

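Is there a simpler, more elegant way to do this? I have to do the same thing for about 12 different expenditure categories. My apologies if this has been asked before; I could not find a similar thread. If one exists, please point me to it.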

I am using RStudio with R 3.2.3 on Windows 7.

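Source

2016-02-16 Derek P

It would be better if your example were easily reproducible and you also showed the desired/expected result. Here is some guidance: http://stackoverflow.com/a/28481250/1191259 – Frank

Use an 'apply' statement with 'switch'. –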
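The comment above gives no code, so the following is only one possible reading of it. This sketch uses mapply() rather than apply(), because apply() would first coerce the mixed factor/numeric rows to a character matrix. The helper name `to_monthly` and the data frame `d` are hypothetical:

```r
# Hypothetical example data in the shape described in the question
d <- data.frame(expenditure_period = c("monthly", "weekly", "yearly"),
                expenditure1 = c(NA_real_, NA_real_, NA_real_),
                expenditure2 = c(NA, 5, NA),
                expenditure3 = c(5, NA, NA),
                expenditure4 = c(NA, NA, 120))

# Hypothetical helper: pick the matching column and multiplier for one row
to_monthly <- function(period, daily, weekly, monthly, yearly) {
  switch(period,
         daily   = daily * 30,
         weekly  = weekly * 4,
         monthly = monthly,
         yearly  = yearly / 12,
         NA_real_)  # fallback for an unrecognised period label
}

# as.character() makes this work whether the period is a factor or a string
d$expenditure_month <- mapply(to_monthly,
                              as.character(d$expenditure_period),
                              d$expenditure1, d$expenditure2,
                              d$expenditure3, d$expenditure4,
                              USE.NAMES = FALSE)
```

This calls the helper once per row, so it is easy to read but likely slower than a vectorised lookup on large data.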

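"Clean, efficient" is a matter of opinion, but the following would be very easy to maintain and understand if you have not looked at the code for a while. It keeps the data in separate tables, does one thing at a time, and can be inspected between steps.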

# conversion table: replaces the bulk of the mess with a lookup that is easy to inspect 
expenditure_factor <- data.frame(expenditure_period = c('daily','weekly','monthly','yearly'), 
           pfactor = c(30,4,1,1/12), 
           stringsAsFactors = FALSE) 

# sum total expenditure (expenditurex) and remove extra columns 
data$sumexpenditure <- apply(data[ ,2:5],1,sum,na.rm = TRUE) 
data$expenditure1 <- data$expenditure2 <- data$expenditure3 <- data$expenditure4 <- NULL 

# add factor from conversion table 
data <- merge(data,expenditure_factor,by = 'expenditure_period',all.x = TRUE) 

# calculate final answer 
data$expenditure_month <- data$sumexpenditure * data$pfactor

Or this could probably be condensed into fewer lines.

Assuming expenditure_period is a character variable:

data$expenditure_period <- as.character(data$expenditure_period)

Then:

# sum total expenditure 
data$sumexpenditure <- apply(data[ ,2:5],1,sum,na.rm = TRUE) 

# use an index 
data$expenditure_factor <- c(30,4,1,1/12)[match(data$expenditure_period,c('daily','weekly','monthly','yearly'))] 

# calculate final answer 
data$expenditure_month <- data$sumexpenditure * data$expenditure_factor
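Since the question mentions repeating this for about 12 expenditure categories, the index approach above could be wrapped in a small function. This is only a sketch: the function name `monthly_expenditure`, its arguments and the example data frame `d` are hypothetical, and it replaces match() with a named lookup vector.

```r
# Multipliers that convert each period to a monthly amount (values from the question)
period_factor <- c(daily = 30, weekly = 4, monthly = 1, yearly = 1 / 12)

# Hypothetical helper: `period_col` and `value_cols` are column names given by the caller
monthly_expenditure <- function(data, period_col, value_cols) {
  # only one of the value columns is non-NA per row, so the row sum is that value
  total <- rowSums(data[, value_cols, drop = FALSE], na.rm = TRUE)
  # look up the multiplier by period name; as.character() also handles factors
  unname(total * period_factor[as.character(data[[period_col]])])
}

# Hypothetical example data
d <- data.frame(expenditure_period = c("monthly", "weekly", "daily", "yearly"),
                expenditure1 = c(NA, NA, 2, NA),
                expenditure2 = c(NA, 5, NA, NA),
                expenditure3 = c(5, NA, NA, NA),
                expenditure4 = c(NA, NA, NA, 120))

d$expenditure_month <- monthly_expenditure(d, "expenditure_period",
                                           paste0("expenditure", 1:4))
```

Each additional category would then need one call with its own period column and value columns. A row whose period label is not in `period_factor` gets NA.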

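Source

2016-02-16 19:45:05 ARobertson

I like both, but something along the lines of the last example is what I was after. Thanks! It looks much less messy and is many times more readable. I like having the factors in a reference table as in the first one, but since I have to apply it to values based on each different expenditure period, it does not really shorten the code by itself. –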

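OK, this may be a somewhat unorthodox approach, but what if you rename your columns so that they contain the multiplier, reshape the data, and extract the multiplier to use in calculating the new variable: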

library(dplyr) 
library(tidyr) 

# New cols 
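data <- rename(data, expenditure.30 = expenditure1, 
      expenditure.4 = expenditure2, 
      expenditure.1 = expenditure3, 
      `expenditure.1/12` = expenditure4)   # yearly amounts are divided by 12 

# Reshape and calculate new col 
# The multiplier is stored as text; as.numeric() cannot parse "1/12", so it is evaluated instead 
data %>% gather(exp_new,exp_val,expenditure.30:`expenditure.1/12`) %>% 
     mutate(mont_exp = exp_val * vapply(sub('.*\\.', '', exp_new), 
                                        function(m) eval(parse(text = m)), 
                                        numeric(1), USE.NAMES = FALSE)) %>% 
     na.omit() 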
# expenditure_period  exp_new exp_val mont_exp 
#7    weekly expenditure.4  5  20 
#11   monthly expenditure.1  5  5 
#13   monthly expenditure.1  2  2 
#14   monthly expenditure.1  5  5 
#15   monthly expenditure.1  58  58
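A different dplyr route, not the answerer's method, relies on the fact that only one of the four columns is non-NA in each row: coalesce() from dplyr returns the first non-missing value among its arguments, so each column can be scaled first and the results coalesced. A minimal sketch with a hypothetical data frame `d`:

```r
library(dplyr)

# Hypothetical example data in the shape described in the question
d <- data.frame(expenditure_period = c("monthly", "weekly", "yearly"),
                expenditure1 = c(NA_real_, NA_real_, NA_real_),
                expenditure2 = c(NA, 5, NA),
                expenditure3 = c(5, NA, NA),
                expenditure4 = c(NA, NA, 120))

# Scale each column to a monthly amount, then take the first non-NA per row
d <- d %>%
  mutate(expenditure_month = coalesce(expenditure1 * 30,
                                      expenditure2 * 4,
                                      expenditure3 * 1,
                                      expenditure4 / 12))
```

This version never reads expenditure_period, so it assumes the populated column always agrees with the recorded period.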

Source

2016-02-16 20:13:53 mtoto

Unorthodox but interesting! I like that it takes advantage of dplyr and tidyr. Thank you very much for your help. –
