2016-01-27 35 views
3

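I have two data tables that I'm trying to merge. One holds company market values over time; the other holds company dividend history over time. I'm trying to find out how much each company has paid out in each quarter and put that value next to the market value data. How do I do a data.table rolling join?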

library(magrittr) 
library(data.table) 
library(zoo) 
library(lubridate) 

set.seed(1337) 
# data table of company market values 
companies <- 
    data.table(companyID = 1:10, 
       Sedol = rep(c("91772E", "7A662B"), each = 5), 
       Date = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1), 
       MktCap = c(100 + cumsum(rnorm(5,5)), 
          50 + cumsum(rnorm(5,1,5)))) %>% 
    setkey(Sedol, Date) 

# data table of dividends 
dividends <- 
    data.table(DivID = 1:7, 
       Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)), 
       Date = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29', 
           '2005-10-01', '2005-06-29', '2005-06-30', 
           '2006-04-17')), 
       DivAmnt = rnorm(7, .8, .3)) %>% 
    setkey(Sedol, Date) 

I believe this is a case where you can use a data.table rolling join, something like:

dividends[companies, roll = "nearest"] 

in an attempt to get a dataset that looks like

    DivID  Sedol       Date   DivAmnt companyID    MktCap
 1:    NA 7A662B       <NA>        NA         6  61.21061
 2:     5 7A662B 2005-06-29 0.7772631         7  66.92951
 3:     6 7A662B 2005-06-30 1.1815343         7  66.92951
 4:    NA 7A662B       <NA>        NA         8  78.33914
 5:    NA 7A662B       <NA>        NA         9  88.92473
 6:    NA 7A662B       <NA>        NA        10  87.85067
 7:     2 91772E 2005-01-13 0.2964291         1 105.19249
 8:     3 91772E 2005-01-29 0.8472649         1 105.19249
 9:    NA 91772E       <NA>        NA         2 108.74579
10:     4 91772E 2005-10-01 1.2467408         3 113.42261
11:    NA 91772E       <NA>        NA         4 120.04491
12:    NA 91772E       <NA>        NA         5 124.35588

(Note that I have matched each dividend to the exact quarter of the company market value.)

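But I'm not sure how to actually do it. I don't know what to pass when roll is a value (can you pass dates? does a number quantify the number of days to roll forward? the number of periods?), and changing rollends doesn't seem to get me what I want.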

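In the end, I mapped the dividend dates to quarter ends and then joined. That is a fine solution, but it doesn't help if I ever need to know how to perform a rolling join. In your answer, could you describe a situation where a rolling join is the only solution, and help me understand how to perform one?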
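For illustration, the quarter-end workaround mentioned above could look roughly like the sketch below. It is a guess at the approach, not the original code: the toy tables, the made-up values, and the QtrEnd and DivPaid column names are all assumptions, and it uses zoo's as.yearqtr to find the last day of the quarter each dividend falls in.

```r
library(data.table)
library(zoo)

# Toy data (made-up values, one company only)
companies <- data.table(
  Sedol  = "91772E",
  Date   = as.Date(c("2005-03-31", "2005-06-30", "2005-09-30")),
  MktCap = c(105, 109, 113)
)
dividends <- data.table(
  Sedol   = "91772E",
  Date    = as.Date(c("2005-01-13", "2005-01-29", "2005-08-01")),
  DivAmnt = c(0.3, 0.8, 1.2)
)

# Map each dividend date to the last day of its calendar quarter
dividends[, QtrEnd := as.Date(as.yearqtr(Date), frac = 1)]

# Total paid per company and quarter, relabelled with the quarter-end date
paid <- dividends[, .(DivPaid = sum(DivAmnt)), by = .(Sedol, Date = QtrEnd)]

# Ordinary equi-join: one row per company/quarter, NA where nothing was paid
res <- paid[companies, on = .(Sedol, Date)]
```

Summing per quarter is one reading of "how much each company has paid each quarter"; dropping the aggregation step and joining on QtrEnd directly would keep one row per dividend instead.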

+2

Can you describe what you are trying to do? – mtoto

+0

Somehow your code doesn't produce the correct data.tables; could you provide the dput() of 'companies' instead? – Jaap

+0

I forgot to include the 'library(lubridate)' statement. Thanks for spotting that. – jks612

Answers

4

Instead of a rolling join, you may want to use an overlap join with data.table's foverlaps function:

# create an interval in the 'companies' datatable 
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))] 
# create a second date in the 'dividends' datatable 
dividends[, Date2 := divDate] 

# set the keys for the two datatable 
setkey(companies, Sedol, start, end) 
setkey(dividends, Sedol, divDate, Date2) 

# create a vector of columnnames which can be removed afterwards 
deletecols <- c("Date2","start","end") 

# perform the overlap join and remove the helper columns 
res <- foverlaps(companies, dividends)[, (deletecols) := NULL] 

The result:

> res 
     Sedol DivID    divDate   DivAmnt companyID   compDate    MktCap
 1: 7A662B    NA       <NA>        NA         6 2005-03-31  61.21061
 2: 7A662B     5 2005-06-29 0.7772631         7 2005-06-30  66.92951
 3: 7A662B     6 2005-06-30 1.1815343         7 2005-06-30  66.92951
 4: 7A662B    NA       <NA>        NA         8 2005-09-30  78.33914
 5: 7A662B    NA       <NA>        NA         9 2005-12-31  88.92473
 6: 7A662B    NA       <NA>        NA        10 2006-03-31  87.85067
 7: 91772E     2 2005-01-13 0.2964291         1 2005-03-31 105.19249
 8: 91772E     3 2005-01-29 0.8472649         1 2005-03-31 105.19249
 9: 91772E    NA       <NA>        NA         2 2005-06-30 108.74579
10: 91772E     4 2005-10-01 1.2467408         3 2005-09-30 113.42261
11: 91772E    NA       <NA>        NA         4 2005-12-31 120.04491
12: 91772E    NA       <NA>        NA         5 2006-03-31 124.35588

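Data used (the same as in the question, but with the date columns named compDate and divDate and without setting the keys):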

set.seed(1337) 
companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5), 
         compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1), 
         MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5)))) 
dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)), 
         divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')), 
         DivAmnt = rnorm(7, .8, .3)) 
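
For comparison, here is a minimal sketch of what a rolling join looks like on toy data. It is not part of the approach above: the tables, the values, and the QtrEnd helper column are assumptions made for illustration. With roll = -Inf, each dividend that has no exact date match takes the values of the next company row at or after its date (next observation carried backward), which assigns it to the following quarter end.

```r
library(data.table)

# Toy data (made-up values, one company only)
companies <- data.table(
  Sedol  = "91772E",
  Date   = as.Date(c("2004-12-31", "2005-03-31", "2005-06-30", "2005-09-30")),
  MktCap = c(101, 105, 109, 113)
)
dividends <- data.table(
  Sedol   = "91772E",
  Date    = as.Date(c("2005-01-13", "2005-01-29", "2005-08-01")),
  DivAmnt = c(0.3, 0.8, 1.2)
)

# Keep a copy of the quarter-end date: after the join, the Date column
# holds the dividend dates (the values from the table in i)
companies[, QtrEnd := Date]

# For each dividend row, look up the company row with the same Sedol and
# the nearest Date at or after the dividend date. The roll applies to the
# last column listed in on=.
res <- companies[dividends, on = .(Sedol, Date), roll = -Inf]
```

Here res has one row per dividend, with QtrEnd and MktCap taken from the matched quarter. Per the data.table documentation, a finite negative number in place of -Inf limits how far back a value may roll; for a Date column that distance is measured in days.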
+0

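When are rolling joins more appropriate? The documentation seems to say that this kind of problem is exactly why rolling joins were created. – jks612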

+0

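@jks612 I will look into this again. I remember that the rolling join didn't give the expected result, but I'll take another look. Hopefully I can get to it this weekend. – Jaap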

+0

Good idea, thanks! – msp