2012-10-22
3

Multiply columns of data frames

I have been scratching my head over this. I have two data frames, df:

df <- data.frame(group = 1:3, 
       age = seq(30, 50, length.out = 3), 
       income = seq(100, 500, length.out = 3), 
       assets = seq(500, 800, length.out = 3)) 

and weights:

weights <- data.frame(age = 5, income = 10) 

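I want to multiply these two data frames, but only for the columns whose names match. I was thinking of something like this: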

colwise(function(x) {x * weights[names(x)]})(df) 

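But obviously this does not work, because colwise does not keep the column name available inside the function. I looked at various mapply solutions (example), but I could not come up with an answer.

The resulting data.frame should look like this: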

structure(list(group = 1:3, age = c(150, 200, 250), income = c(1000, 
3000, 5000), assets = c(500, 650, 800)), .Names = c("group", 
"age", "income", "assets"), row.names = c(NA, -3L), class = "data.frame") 

  group age income assets
1     1 150   1000    500
2     2 200   3000    650
3     3 250   5000    800

Answers

6

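sweep() is your friend here, at least for this particular example. It relies on the names in df and weights being in the same order, but that can be arranged.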

> nams <- names(weights) 
> df[, nams] <- sweep(df[, nams], 2, unlist(weights), "*") 
> df 
  group age income assets
1     1 150   1000    500
2     2 200   3000    650
3     3 250   5000    800

If the variable names in weights and df are not in the same order, you can make them so:

> df2 <- data.frame(group = 1:3, 
+     age = seq(30, 50, length.out = 3), 
+     income = seq(100, 500, length.out = 3), 
+     assets = seq(500, 800, length.out = 3)) 
> nams <- c("age", "income") ## order in df2 
> weights2 <- weights[, rev(nams)] 
> weights2 ## wrong order compared to df2 
  income age
1     10   5
> df2[, nams] <- sweep(df2[, nams], 2, unlist(weights2[, nams]), "*") 
> df2 
  group age income assets
1     1 150   1000    500
2     2 200   3000    650
3     3 250   5000    800

In other words, we reorder all of the objects so that age and income are in the correct order in each of them.
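As a minimal sketch of the same sweep() idea without fixing the order by hand, the shared columns can be found with intersect() and both objects indexed by that one name vector. The names df3, w3 and shared are illustrative and do not come from the answer above.

```r
# Illustrative data: the weights are deliberately listed in a different order
df3 <- data.frame(group = 1:3,
                  age = seq(30, 50, length.out = 3),
                  income = seq(100, 500, length.out = 3),
                  assets = seq(500, 800, length.out = 3))
w3 <- data.frame(income = 10, age = 5)

# Columns present in both objects, in the order they appear in df3
shared <- intersect(names(df3), names(w3))

# Index both objects by the same name vector so each column meets its own weight
df3[shared] <- sweep(df3[shared], 2, unlist(w3[shared]), "*")
df3
```

Because both df3 and w3 are subset with shared, the approach should also hold when df3 has many columns and w3 only a few.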

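Thanks @Gavin. (I had looked at sweep before posting, but did not understand the function at all.) Could you elaborate on how to avoid relying on the order of the names? The original df may have 100 columns, while weights may have only a few, and they can be in any order. – karlos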

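See my update. The key is to make sure that every object in that line of code is in the correct order. So set the order you want in 'nams', then index all of the objects with 'nams'. –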

Thanks! I think this is a good solution that can be used in many situations. – karlos

3

Someone may have a slick way to do this with plyr, but this is probably the most straightforward way in base R:

shared.names <- intersect(names(df), names(weights)) 
cols <- sapply(names(df), USE.NAMES=TRUE, simplify=FALSE, FUN=function(name) 
     if (name %in% shared.names) df[[name]] * weights[[name]] else df[[name]]) 
data.frame(do.call(cbind, cols)) 

#   group age income assets
# 1     1 150   1000    500
# 2     2 200   3000    650
# 3     3 250   5000    800
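Since the question mentions mapply, here is a hedged sketch of the same name-matched multiplication written with Map() and assigned back in place, so the untouched columns are never rebuilt. The names df4, w4 and shared are illustrative only.

```r
# Illustrative data, with the weights in a different order than the columns
df4 <- data.frame(group = 1:3,
                  age = seq(30, 50, length.out = 3),
                  income = seq(100, 500, length.out = 3),
                  assets = seq(500, 800, length.out = 3))
w4 <- data.frame(income = 10, age = 5)

shared <- intersect(names(df4), names(w4))

# Map() walks the two lists in parallel; both are subset by `shared`,
# so each column is paired with the weight of the same name
df4[shared] <- Map(`*`, df4[shared], w4[shared])
df4
```

Assigning into df4[shared] leaves group and assets as they were, including the integer type of group.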

It looks like we were thinking along the same lines. I always forget about 'intersect()'. – A5C1D2H2I1M1N2O1R2T1

3

Your data:

df <- data.frame(group = 1:3, 
       age = seq(30, 50, length.out = 3), 
       income = seq(100, 500, length.out = 3), 
       assets = seq(500, 800, length.out = 3)) 
weights <- data.frame(age = 5, income = 10) 

The logic:

# Basic name matching looks like this 
names(df[names(df) %in% names(weights)]) 
# [1] "age" "income" 

# Use that in `sapply()` 
sapply(names(df[names(df) %in% names(weights)]), 
     function(x) df[[x]] * weights[[x]]) 
#      age income
# [1,] 150   1000
# [2,] 200   3000
# [3,] 250   5000

The implementation:

# Put it all together, replacing the original data 
df[names(df) %in% names(weights)] <- sapply(names(df[names(df) %in% names(weights)]), 
              function(x) df[[x]] * weights[[x]]) 

The result:

df 
#   group age income assets
# 1     1 150   1000    500
# 2     2 200   3000    650
# 3     3 250   5000    800
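For repeated use, the name-matching logic could be wrapped in a small helper. This is only a sketch: multiply_shared() is a hypothetical function name, and the warning about weights that match no column is an added assumption rather than part of the answer above.

```r
# Hypothetical helper: multiply the columns of `data` by the same-named weights in `w`
multiply_shared <- function(data, w) {
  shared <- intersect(names(data), names(w))
  # Assumption: a weight without a matching column is worth flagging
  unmatched <- setdiff(names(w), names(data))
  if (length(unmatched) > 0) {
    warning("weights with no matching column: ",
            paste(unmatched, collapse = ", "))
  }
  for (nm in shared) {
    # Look up both the column and its weight by name, so order is irrelevant
    data[[nm]] <- data[[nm]] * w[[nm]]
  }
  data
}

example_df <- data.frame(group = 1:3,
                         age = seq(30, 50, length.out = 3),
                         income = seq(100, 500, length.out = 3),
                         assets = seq(500, 800, length.out = 3))
example_w <- data.frame(income = 10, age = 5)

scaled <- multiply_shared(example_df, example_w)
scaled
```

The helper returns a modified copy, so example_df itself is left unchanged.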

+1, much simpler –

Thanks @mrdwab. This is a very simple but effective solution. – karlos

0

You can also do this with a for loop, using the indices that result from %in%. The approaches above are more efficient, but this is an option.

results <- list() 
# positions in df of the columns that also appear in weights 
idx <- which(names(df) %in% names(weights)) 
for (i in seq_along(idx)) { 
    idx1 <- idx[i] 
    # look up the matching weight by name so that column order does not matter 
    idx2 <- match(names(df)[idx1], names(weights)) 
    results[[i]] <- df[, idx1] * weights[[idx2]] 
} 
unlist(results) 

2

Here is a data.table solution:

library(data.table) 
DT <- data.table(df) 
W <- data.table(weights) 

Use mapply (or Map) to calculate the new columns, and add both of them at once by reference.

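# (names(W)) on the left of := assigns to the columns named in W; 
# .SDcols restricts .SD to those columns, in the same order as W 
DT[, (names(W)) := Map("*", .SD, W), .SDcols = names(W)]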
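As an alternative sketch, the same by-reference update can be written as a loop over the shared names with data.table's set(). The objects DT2, W2 and shared are illustrative, and this is not the method given in the answer above.

```r
library(data.table)

# Illustrative data, with the weights in a different order than the columns
DT2 <- data.table(group = 1:3,
                  age = seq(30, 50, length.out = 3),
                  income = seq(100, 500, length.out = 3),
                  assets = seq(500, 800, length.out = 3))
W2 <- data.table(income = 10, age = 5)

shared <- intersect(names(DT2), names(W2))

# set() modifies DT2 in place, one column per iteration, matched by name
for (nm in shared) {
  set(DT2, j = nm, value = DT2[[nm]] * W2[[nm]])
}
DT2
```

Looking each weight up by name avoids any dependence on the column order of W2.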