Create repeated rows in a data.frame by filling groups

Here is my example data.frame:

df = read.table(text = 'ID Day Count Count_group 
       18 1933 6 15 
       33 1933 6 15 
       37 1933 6 15 
       18 1933 6 15 
       16 1933 6 15 
       11 1933 6 15 
       111 1932 5 9 
       34 1932 5 9 
       60 1932 5 9 
       88 1932 5 9 
       18 1932 5 9 
       33 1931 3 4 
       13 1931 3 4 
       56 1931 3 4 
       23 1930 1 1 
       6 1800 6 12 
       37 1800 6 12 
       98 1800 6 12 
       52 1800 6 12 
       18 1800 6 12 
       76 1800 6 12 
       55 1799 4 6 
       6 1799 4 6 
       52 1799 4 6 
       133 1799 4 6 
       112 1798 2 2 
       677 1798 2 2 
       778 888  4 8 
       111 888  4 8 
       88 888  4 8 
       10 888  4 8 
       37 887  2 4 
       26 887  2 4 
       8 886  1 2 
       56 885  1 1 
       22 120  2 6 
       34 120  2 6 
       88 119  1 6 
       99 118  2 5 
       12 118  2 5 
       90 117  1 3 
       22 115  2 2 
       99 115  2 2', header = TRUE)

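The Count column shows the number of ID observations within a single Day; Count_group shows the number of ID observations within that Day and the 4 days before it.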
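To make the two definitions concrete, here is a minimal sketch of how Count and Count_group could be derived from ID and Day alone. The helper name `add_counts`, its argument `n`, and the `toy` data are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical helper (not from the original post): for each row, count the
# observations on its Day (Count) and in the window [Day - (n - 1), Day]
# (Count_group).
add_counts <- function(d, n = 5) {
  per_day <- table(d$Day)                  # observations per distinct Day
  days <- as.numeric(names(per_day))       # the distinct Days as numbers
  d$Count <- as.vector(per_day[as.character(d$Day)])
  d$Count_group <- vapply(d$Day, function(day) {
    # sum the per-day counts of every Day inside the n-day window
    sum(per_day[days <= day & days > day - n])
  }, numeric(1))
  d
}

# Small made-up sample, only to show the call
toy <- data.frame(ID  = c(18, 33, 111, 34, 23),
                  Day = c(1933, 1933, 1932, 1931, 1920))
add_counts(toy, n = 5)
```

Applied to the ID and Day columns of `df` above with `n = 5`, this should reproduce the Count and Count_group columns shown in the question.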

I need to expand df so that every Count_group set contains all of the days that belong to it.

Expected output:

ID Day Count Count_group 
18 1933 6 15 
33 1933 6 15 
37 1933 6 15 
18 1933 6 15 
16 1933 6 15 
11 1933 6 15 
111 1932 5 15 
34 1932 5 15 
60 1932 5 15 
88 1932 5 15 
18 1932 5 15 
33 1931 3 15 
13 1931 3 15 
56 1931 3 15 
23 1930 1 15 
6 1800 6 12 
37 1800 6 12 
98 1800 6 12 
52 1800 6 12 
18 1800 6 12 
76 1800 6 12 
55 1799 4 12 
6 1799 4 12 
52 1799 4 12 
133 1799 4 12 
112 1798 2 12 
677 1798 2 12 
111 1932 5 9 
34 1932 5 9 
60 1932 5 9 
88 1932 5 9 
18 1932 5 9 
33 1931 3 9 
13 1931 3 9 
56 1931 3 9 
23 1930 1 9 
778 888 4 8 
111 888 4 8 
88 888 4 8 
10 888 4 8 
37 887 2 8 
26 887 2 8 
8 886 1 8 
56 885 1 8 
55 1799 4 6 
6 1799 4 6 
52 1799 4 6 
133 1799 4 6 
112 1798 2 6 
677 1798 2 6 
22 120 2 6 
34 120 2 6 
88 119 1 6 
88 119 1 6 
99 118 2 6 
12 118 2 6 
99 118 2 6 
12 118 2 6 
90 117 1 6 
90 117 1 6 
22 115 2 6 
99 115 2 6 
99 118 2 5 
12 118 2 5 
90 117 1 5 
22 115 2 5 
99 115 2 5 
33 1931 3 4 
13 1931 3 4 
56 1931 3 4 
23 1930 1 4 
37 887 2 4 
26 887 2 4 
8 886 1 4 
56 885 1 4 
90 117 1 3 
22 115 2 3 
99 115 2 3 
112 1798 2 2 
677 1798 2 2 
8 886 1 2 
56 885 1 2 
22 115 2 2 
99 115 2 2 
23 1930 1 1 
56 885 1 1

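Explanation of the output:

1) Day 1933 has 6 IDs on that exact day (Count column) and a total of 15 IDs from day 1933 back to day 1929 (Count_group column). The value 15 comes from 6 (1933) + 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929). So in the output I added all the remaining days that belong to the Count_group = 15 set.

2) The next day in descending order is 1932. On this exact day there are 5 IDs, and there are 9 IDs in total from day 1932 back to day 1928. The value 9 comes from 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929) + 0 (1928). In the output (starting at row 28) you will see the complete 5-day episode for day 1932, 9 rows in total.

3) The next day is 1931, and so on.

The output data.frame is ordered by Count_group and Day, both with decreasing = TRUE.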
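The rule described in steps 1 to 3 can be sketched as follows: for every distinct Day, take all rows that fall inside its n-day window, label them with the window's total, and then sort. This is only one possible reading of the expected output. The helper name `expand_windows`, the argument `n`, and the `toy` data are hypothetical, and the sketch assumes that Count_group always equals the number of rows in the window.

```r
# Hypothetical helper (name and arguments are not from the original post).
# For every distinct Day, collect all rows whose Day lies in the window
# [Day - (n - 1), Day] and label them with the window's total row count.
expand_windows <- function(d, n = 5) {
  anchors <- sort(unique(d$Day), decreasing = TRUE)
  pieces <- lapply(anchors, function(a) {
    w <- d[d$Day <= a & d$Day > a - n, , drop = FALSE]
    w$Count_group <- nrow(w)  # assumption: Count_group == rows in the window
    w
  })
  out <- do.call(rbind, pieces)
  # Sort by Count_group, then Day, both decreasing, as the question asks
  out <- out[order(out$Count_group, out$Day, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Small made-up sample, only to show the call
toy <- data.frame(ID    = c(18, 33, 111, 23),
                  Day   = c(1933, 1933, 1932, 1930),
                  Count = c(2, 2, 1, 1))
expand_windows(toy, n = 5)
```

Changing `n` changes the window length, which is the generalisation the question asks for. Rows that tie on both Count_group and Day may appear in a different order from the hand-written expected output.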

I would like to write code that works not only for a 5-day window (as above) but for a time window of any n days.

Do you have any suggestions?

Thanks

Source

2017-06-02 aaaaa

OK... could you give it a try? – aaaaa

I don't fully understand how you get from the data to the expected output, but you could probably use [`tidyr::complete()`](http://tidyr.tidyverse.org/reference/complete.html). Maybe see this [question](https://stackoverflow.com/questions/44271398/for-loops-including-rows-in-a-dataframe-by-the-missing-values-of-factor-levels/44271839#44271839) or [this one](https://stackoverflow.com/questions/10438969/fastest-way-to-add-rows-for-missing-values-in-a-data-frame/44272077#44272077). – austensen
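For reference, a minimal sketch of what `tidyr::complete()` does, assuming the tidyr package is installed. The `toy` data is made up for illustration. Note that this only adds rows for days that are absent from the data; on its own it does not build the overlapping windows shown in the expected output.

```r
library(tidyr)

# Made-up per-day summary with day 1932 missing
toy <- data.frame(Day = c(1933, 1931, 1930), Count = c(6, 3, 1))

# Add a row for every day between the smallest and largest Day,
# filling Count with 0 where no observation exists
filled <- complete(toy, Day = full_seq(Day, period = 1), fill = list(Count = 0))
filled
```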

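I'm a bit confused. How are we supposed to create the new rows for you? What is the rule, stated plainly? Write down the procedure that you can't figure out how to code. How do we compute the new values in each column? Please post your reply as an edit to your question. –

Try this and tell me if it is what you had in mind: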

# First, split the data frame by Day using split()
duplicates <- lapply(split(df, df$Day), function(x) {
  if (nrow(x) != x[1, "Count_group"]) {  # does this Day have fewer rows than Count_group?
    # recycle the Day's rows until there are Count_group of them
    x[rep(seq_len(nrow(x)), length.out = x[1, "Count_group"]), ]
  } else {
    x
  }
})

df2 <- do.call("rbind.data.frame", duplicates)  # turn the list back into a data frame
df3 <- df2[order(df2[, "Count_group"], df2[, "Day"], decreasing = TRUE), ]  # order by Count_group, then Day
rownames(df3) <- NULL  # reset row names to 1:X instead of the generated ones
df3  # the result

Source

2017-06-02 17:53:58
