Grouping columns into counts in an R data frame


I have these 4 columns in a data frame:

    port  ip            service  numberOfTimes
1   22    11.11.79.100  ssh      16
2   80    11.11.79.100  www      19
3   111   11.13.79.110  ipw      21
4   123   11.13.79.110  ssh      50
5   22    64.50.80.140  cde      45
6   80    64.50.80.140  www      16
7   22    71.11.64.100  ssh      234
8   80    71.11.64.100  you      33
9   22    100.15.31.1   ssh      99
10  41    120.15.31.12  has      19

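So my question is:

Using R, is it possible to group the data so that it ends up looking something like this?

After:

port  ip(count of same ip)  service  numberOfTimes
22    4                     ssh      399 (#1+#5+#7+#9)
80    3                     www      68 (#2+#6+#8)

and so on for the remaining ports.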


2016-10-13 user127886

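You tagged this 'dplyr', so I assume you have already come across the 'group_by()' and 'summarize()' functions. Have you tried to solve this yourself? What code did you write, and what exactly happened? – MrFlick

Hi @MrFlick, to be honest, I'm stuck, really stuck. I tried dt <- dt %>% group_by(port, service) %>% summarize(numberOfTimes = sum(numberOfTimes)) but it didn't work. It failed with the error "cannot modify grouping variable". And even if I do sum them up, what happens to the IP address attached to each row? I'm really unsure how to start, because every row feels like it depends on another one. – user127886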

Using dplyr, this is fairly simple:

library(dplyr)

# Count distinct IPs and sum the hits for each port/service pair
testData %>%
    group_by(port, service) %>%
    summarise(`Number of IPs` = n_distinct(ip),
              `Total number of times` = sum(numberOfTimes))

For the sample data you included, this gives:

   port service `Number of IPs` `Total number of times`
  <int>   <chr>           <int>                   <int>
1    22     cde               1                      45
2    22     ssh               3                     349
3    41     has               1                      19
4    80     www               2                      35
5    80     you               1                      33
6   111     ipw               1                      21
7   123     ssh               1                      50
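
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data frame is the question's table typed in by hand under the name `testData`; the column types (integer ports and counts, character IPs and services) are an assumption, and the trailing `ungroup()` is only there so the result is a plain ungrouped tibble.

```r
library(dplyr)

# Sample data from the question, entered by hand (column types are assumed).
testData <- data.frame(
  port = c(22L, 80L, 111L, 123L, 22L, 80L, 22L, 80L, 22L, 41L),
  ip = c("11.11.79.100", "11.11.79.100", "11.13.79.110", "11.13.79.110",
         "64.50.80.140", "64.50.80.140", "71.11.64.100", "71.11.64.100",
         "100.15.31.1", "120.15.31.12"),
  service = c("ssh", "www", "ipw", "ssh", "cde", "www", "ssh", "you", "ssh", "has"),
  numberOfTimes = c(16L, 19L, 21L, 50L, 45L, 16L, 234L, 33L, 99L, 19L),
  stringsAsFactors = FALSE
)

# One row per port/service pair: distinct IP count and total hits.
result <- testData %>%
  group_by(port, service) %>%
  summarise(
    `Number of IPs` = n_distinct(ip),
    `Total number of times` = sum(numberOfTimes)
  ) %>%
  ungroup()

print(result)
```

Because `ip` is not a grouping column, it is collapsed into a count rather than kept per row, which is what happens to "the IP address attached to each row".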

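If you are running into some kind of error (as hinted at in the comments), you will need to provide data that actually triggers that error before people can help you.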
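
The desired output in the question appears to group by port alone (rows #1, #5, #7 and #9 are combined even though #5 is 'cde'). If that is what is wanted, a sketch along these lines might do. The names `portData`, `ipCount` and `topService` are made up for illustration, and picking the most frequent service per port is my guess at how the 'service' column should be filled in.

```r
library(dplyr)

# Hypothetical subset of the question's data: only the rows for ports 22 and 80.
portData <- data.frame(
  port = c(22L, 80L, 22L, 80L, 22L, 80L, 22L),
  ip = c("11.11.79.100", "11.11.79.100", "64.50.80.140", "64.50.80.140",
         "71.11.64.100", "71.11.64.100", "100.15.31.1"),
  service = c("ssh", "www", "cde", "www", "ssh", "you", "ssh"),
  numberOfTimes = c(16L, 19L, 45L, 16L, 234L, 33L, 99L),
  stringsAsFactors = FALSE
)

byPort <- portData %>%
  group_by(port) %>%
  summarise(
    ipCount = n_distinct(ip),
    # Most frequent service within the port; on a tie, the alphabetically first one wins.
    topService = names(which.max(table(service))),
    numberOfTimes = sum(numberOfTimes)
  )

print(byPort)
```

With these rows, port 22 sums to 16 + 45 + 234 + 99 = 394 rather than the 399 written in the question, and port 80 sums to 68.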


2016-10-13 20:15:17

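Oh! I completely forgot about dplyr's unique and length functions. Thank you very much for your help, and sorry about that. – user127886

Glad it works for you. However, neither 'unique' nor 'length' comes from 'dplyr'; both are base R functions. –

You can also use 'n_distinct(ip)', which should be faster than 'length(unique(ip))'. – Scarabee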
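
A quick check that the two spellings agree, using the 'ip' column from the question typed in by hand. This only shows that they return the same count; it does not measure speed. The NA example at the end is my own addition to show where the two can differ.

```r
library(dplyr)

# IP column of the question's sample data, entered by hand.
ip <- c("11.11.79.100", "11.11.79.100", "11.13.79.110", "11.13.79.110",
        "64.50.80.140", "64.50.80.140", "71.11.64.100", "71.11.64.100",
        "100.15.31.1", "120.15.31.12")

baseCount  <- length(unique(ip))  # base R
dplyrCount <- n_distinct(ip)      # dplyr

# With a missing value appended, unique() keeps NA as its own value,
# while n_distinct() can drop it via na.rm = TRUE.
withNaBase  <- length(unique(c(ip, NA)))
withNaDplyr <- n_distinct(c(ip, NA), na.rm = TRUE)

print(c(baseCount, dplyrCount, withNaBase, withNaDplyr))
```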
