2017-01-16

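I have two tables and I need to do a SUMIF across them. Table 1 contains time periods, i.e. the year and the quarter at each year end (4, 8, 12, etc.). Table 2 contains the transactions made during the year, at quarters 3, 6, 7, etc. How can I do a SUMIF across the two tables?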

I need Table 3 to sum all the transactions during the year, so that I get the cumulative position at each year end.

Below is some sample code showing what the data looks like and what the output should be:

library(data.table) 

x1 <- data.table("Name" = "LOB1", "Year" = 2000, 
       "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36)) 
x2 <- data.table("Name" = "LOB1", "Year" = 2000, 
       "Quarter" = c(3, 6, 7, 9, 11, 14, 16, 20, 24), 
       "Amount" = c(10000, 15000, -2500, 3500, -6500, 25000, 
           11000, 9000, 7500)) 
x3 <- data.table("Name" = "LOB1", "Year" = 2000, 
       "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36), 
       "Amount" = c(10000, 22500, 19500, 55500, 64500, 72000, 
           72000, 72000, 72000)) 

I have tried merge, summarise and foverlaps, but I can't quite figure it out.

Answer


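Nice question. Basically, what you are trying to do is join on Name, Year and Quarter <= Quarter, while summing all the matching Amount values. This can be done either with the new non-equi joins (introduced in data.table v1.10.0, the latest stable version at the time of writing) or with foverlaps (although the latter is probably suboptimal).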
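
To make the join condition concrete, here is a plain base R sketch of the same cumulative SUMIF, with no data.table involved. It only illustrates what the condition computes; the helper name cumulative_sumif is hypothetical, and because it scans all of x2 once per row of x1 it will be much slower than a join on large tables.

```r
# Sample data from the question, as plain data frames (base R only)
x1 <- data.frame(Name = "LOB1", Year = 2000,
                 Quarter = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.frame(Name = "LOB1", Year = 2000,
                 Quarter = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
                 Amount = c(10000, 15000, -2500, 3500, -6500, 25000,
                            11000, 9000, 7500))

# Hypothetical helper: for each row of x1, sum x2$Amount over the rows that
# match on Name and Year and whose Quarter is <= the x1 Quarter
# (a SUMIF with three criteria)
cumulative_sumif <- function(x1, x2) {
  vapply(seq_len(nrow(x1)), function(i) {
    hit <- x2$Name == x1$Name[i] &
           x2$Year == x1$Year[i] &
           x2$Quarter <= x1$Quarter[i]
    sum(x2$Amount[hit])
  }, numeric(1))
}

x3 <- transform(x1, Amount = cumulative_sumif(x1, x2))
# x3$Amount: 10000 22500 19500 55500 64500 72000 72000 72000 72000
```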

Non-equi joins:

x2[x1, # for each value in `x1` find all the matching values in `x2` 
    .(Amount = sum(Amount)), # Sum all the matching values in `Amount` 
    on = .(Name, Year, Quarter <= Quarter), # join conditions 
    by = .EACHI] # Do the summing per each match in `i` 
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000
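
When the data can be handled one Name/Year group at a time, a cheaper alternative to summing per match is to take a running total of the transactions once and then, for each period, look up the last transaction at or before it. Below is a minimal base R sketch of that swapped-in technique (it is not what the answer above does). It assumes a single Name/Year group and uses findInterval() for the lookup; in data.table the same idea would be a rolling join (roll = TRUE) against a cumulative-sum column.

```r
# Single Name/Year group assumed, so only Quarter and Amount are kept
x1 <- data.frame(Quarter = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.frame(Quarter = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
                 Amount = c(10000, 15000, -2500, 3500, -6500, 25000,
                            11000, 9000, 7500))

# Running total of the transactions, in quarter order
x2 <- x2[order(x2$Quarter), ]
running <- cumsum(x2$Amount)

# findInterval() returns, for each x1 quarter, the index of the last x2
# quarter that is <= it (0 when no transaction has happened yet)
idx <- findInterval(x1$Quarter, x2$Quarter)
x1$Amount <- ifelse(idx == 0, 0, running[pmax(idx, 1)])
```

With several groups, the same two steps would be applied per group (for example after split()).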

As a side note, you can easily add Amount to x1 in place (as suggested by @Frank):

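# Add the cumulative Amount to x1 by reference; by = .EACHI returns one row
# per row of x1, in x1's order, so the V1 column lines up with x1.
# x.Amount refers explicitly to the Amount column of x2.
x1[, Amount := 
    x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1 
] 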

This can be handy if that table has more columns than just the three join columns.


foverlaps:

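You mentioned foverlaps, so in theory you could achieve the same result with that function too, although I'm afraid you could easily run out of memory. With foverlaps you have to create a huge table in which each value in x2 is joined multiple times, once to every matching value in x1, and everything is stored in memory before it is aggregated.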

x1[, Start := 0]       # each period covers quarters 0 up to its year-end Quarter
x2[, Start := Quarter] # each transaction is a zero-length interval [Quarter, Quarter]
setkey(x2, Name, Year, Start, Quarter) # foverlaps() needs y keyed, interval start/end last
## Make a huge cartesian join by overlaps and then aggregate
foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000
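
The memory concern can be made concrete by counting how many (transaction, period) pairs satisfy Quarter <= Quarter, since each such pair becomes a row of the foverlaps() result before sum() reduces it. This small base R count on the sample data is only an illustration; the variable names are hypothetical.

```r
x1_quarter <- c(4, 8, 12, 16, 20, 24, 28, 32, 36)
x2_quarter <- c(3, 6, 7, 9, 11, 14, 16, 20, 24)

# matches[i, j] is TRUE when transaction i falls inside period j,
# i.e. when the pair would be a row of the overlap join
matches <- outer(x2_quarter, x1_quarter, "<=")

rows_in_join   <- sum(matches)         # rows materialised before aggregation: 60
rows_in_result <- length(x1_quarter)   # rows left after summing per period: 9
```

Even here the intermediate table has 60 rows for a 9-row result; for one group with n transactions and m periods it can grow to n * m rows.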

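Thanks, I just got this working. Much appreciated! It looks like both of my tables need to have the same columns, though. If x2 had an extra column that I don't want included in the resulting table x3, would the code be the same? – kodfather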


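You can specify in the 'on' argument which column names to use from each table, e.g. 'on = .(column1 = column2, column3 = column4)', etc. In 'x2[x1, ...]', the LHS of each condition refers to a column of 'x2' and the RHS to a column of 'x1'. Columns that appear in neither 'on' nor 'j', such as an extra column in 'x2', do not end up in the result. –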