Automating the calculation in R

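I have a large file and need to compute the time differences between matching records. To illustrate, here is an MWE of the data as a data frame DF: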

 st time  from to type size flg   fid  src  dst no ID 
     + 0.163944 2 1  a  40 -------  1  2.4  5.4 0 10 
     + 0.215400 2 1  a  40 -------  1  2.4  5.4 1 28 
     + 0.239528 2 1  t  40 -------  1  2.4  5.4 0 37 
     + 0.287784 2 1  t 1040 -------  1  2.4  5.4 1 62 
     + 0.287784 2 1  t 1040 -------  1  2.4  5.4 2 63 
     .......... . .  ... .. .......  .  .  .. . .. 
     # here should be some more lines with different value such as 
     - 0.487784 3 0  t 1040 -------  4  2.8  7.4 2 23 
     # the above line will be filtered out by the conditions-just ignore it 
     .......... . .  ... .. .......  .  .  .. . .. 
     r 0.188072 0 5  a 40 -------  1   2.4  5.4 0 10 
     r 0.239528 0 5  a 40 -------  1   2.4  5.4 1 28 
     r 0.263656 0 5  t 40 -------  1   2.4  5.4 0 37 
     r 0.317128 0 5  t 1040 -------  1   2.4  5.4 1 62 
     r 0.318792 0 5  t 1040 -------  1   2.4  5.4 2 63

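Condition 1: For each record that starts with "+", the 'ID' is unique. "src", "dst" and "from" are also part of the condition. From these records, the "time" field is stored as the start time in an array (i.e., array[ID] = time).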

Condition 2: For each record that starts with "r", its 'ID' is looked up. The required time difference is then: current "time" - array[ID].
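A minimal sketch of the two conditions, assuming IDs can serve as lookup keys: instead of a positional array (timeHolder[ID], which only works for positive integer IDs), the start times are kept in a vector named by ID, and each "r" record looks up its start time by name. The small data frame is a hypothetical subset of the MWE, not the real file.

```r
# Hypothetical subset: three start ("+") and two receive ("r") records
DF <- data.frame(st   = c("+", "+", "+", "r", "r"),
                 time = c(0.163944, 0.215400, 0.239528, 0.188072, 0.239528),
                 ID   = c(10, 28, 37, 10, 28))

start <- DF[DF$st == "+", ]
end   <- DF[DF$st == "r", ]

# Condition 1: "array[ID] = time", here a vector named by ID
timeHolder <- setNames(start$time, start$ID)

# Condition 2: current time minus the stored start time for the same ID
# (an ID with no "+" record would give NA)
ReqTime <- end$time - timeHolder[as.character(end$ID)]
print(ReqTime)
```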

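I have written R code and it works. However, it uses fixed src and dst values. src has the format x.y, where x is always 2 and y varies (i.e., y = 0, 1, 2, 3, 4, ...). Likewise, dst has the format z.f, where both z and f vary (e.g., 4.3, 5.2, 6.100, ...).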
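One way to stop hard-coding src and dst is to take every distinct (src, dst) pair from the start records and run the same matching for each pair. This sketch replaces the positional timeHolder array with match() on ID and uses a hypothetical six-row subset of the data; on the real file DF would come from read.table(). If values such as 6.100 and 6.1 must stay distinct, the src and dst columns would need to be read as character (for example via the colClasses argument of read.table()), since numeric parsing treats them as the same number.

```r
# Hypothetical subset of the data with two (src, dst) pairs
DF <- data.frame(
  st   = c("+", "+", "+", "r", "r", "r"),
  time = c(0.163944, 0.215400, 0.297784, 0.188072, 0.239528, 0.328792),
  from = c(2, 2, 2, 0, 0, 0),
  src  = c(2.4, 2.4, 2.5, 2.4, 2.4, 2.5),
  dst  = c(5.4, 5.4, 5.7, 5.4, 5.4, 5.7),
  ID   = c(10, 28, 65, 10, 28, 65)
)

# Same row conditions as the question, but without fixing src/dst
start <- DF[DF$st == "+" & DF$from == 2, ]
end   <- DF[DF$st == "r" & DF$from == 0, ]

# Every distinct (src, dst) pair that appears in the start records
pairs <- unique(start[, c("src", "dst")])

result <- list()
for (i in seq_len(nrow(pairs))) {
  s <- pairs$src[i]
  d <- pairs$dst[i]
  s_rows <- start[start$src == s & start$dst == d, ]
  e_rows <- end[end$src == s & end$dst == d, ]
  # Start time for each end record's ID (NA when there is no "+" record)
  t0 <- s_rows$time[match(e_rows$ID, s_rows$ID)]
  ReqTime <- e_rows$time - t0
  ReqTime <- ReqTime[!is.na(ReqTime)]
  key <- paste(s, d, sep = " -- ")
  result[[key]] <- ReqTime
  cat("Time from", key, ":", ReqTime, "\n")
}
```

The per-pair vectors stay available in the list result, keyed as "src -- dst", so they can be summarised later.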

The R code:

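src <- "2.4" # this value should be automated like 2.y. Any suggestions?
dst <- "5.4" # this value should be automated like z.f
ReqTime <- 0
timeHolder <- c()

# start: "+" records sent from node 2
start <- DF[DF[, "st"] == "+" &
     DF[, "from"] == 2 &
     # the src and dst should be automated
     DF[, "src"] == src &
     DF[, "dst"] == dst, ]

# condition 1: array[ID] = time (assumes IDs are positive integers)
timeHolder[start$ID] <- start$time

# end: "r" records received at node 0
end <- DF[DF[, "st"] == "r" &
      DF[, "from"] == 0 &
      DF[, "src"] == src &
      DF[, "dst"] == dst, ]

# condition 2: current time minus the stored start time for the same ID,
# skipping IDs that have no start record (NA in timeHolder)
ok <- !is.na(timeHolder[end$ID])
if (any(ok)) {
    ReqTime <- end$time[ok] - timeHolder[end$ID[ok]]
}

cat("Time from ", src, "--", dst, ": ", ReqTime, "\n")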

Expected output:

Time from 2.4 -- 5.4 : 0.024128 0.024128 0.024128 0.029344 0.031008

Or, even better, if I could get the output like this:

Time from 2.4 -- 5.4 : mean(0.024128 0.024128 0.024128 0.029344 0.031008) which is =0.0265472
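The second format only needs mean() on the vector of differences plus sprintf() for the layout. A minimal sketch, assuming the start and end times for one pair have already been extracted into two small data frames (the values are copied from the MWE above):

```r
# Start ("+") and receive ("r") times for the pair 2.4 -> 5.4, from the MWE
start <- data.frame(ID   = c(10, 28, 37, 62, 63),
                    time = c(0.163944, 0.215400, 0.239528, 0.287784, 0.287784))
end   <- data.frame(ID   = c(10, 28, 37, 62, 63),
                    time = c(0.188072, 0.239528, 0.263656, 0.317128, 0.318792))

# Time difference per ID: end time minus the start time with the same ID
ReqTime <- end$time - start$time[match(end$ID, start$ID)]

# Round for display so floating-point noise does not appear in the output
shown <- format(round(ReqTime, 6), nsmall = 6)
cat(sprintf("Time from %s -- %s : mean(%s) which is = %g\n",
            "2.4", "5.4", paste(shown, collapse = " "), mean(ReqTime)))
```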


2013-10-27 SimpleNEasy

If I understand correctly what you want, you can aggregate your data:

#your data plus some extra 
DF <- read.table(text = 'st time  from to type size flg   fid  src  dst no ID 
    + 0.163944 2 1  a  40 -------  1  2.4  5.4 0 10 
    + 0.215400 2 1  a  40 -------  1  2.4  5.4 1 28 
    + 0.239528 2 1  t  40 -------  1  2.4  5.4 0 37 
    + 0.287784 2 1  t 1040 -------  1  2.4  5.4 1 62 
    + 0.287784 2 1  t 1040 -------  1  2.4  5.4 2 63 
    + 0.297784 2 1  t 1040 -------  1  2.5  5.7 2 65 
    + 0.307984 2 1  t 1040 -------  1  2.5  5.7 2 67 
    + 0.325784 2 1  t 1040 -------  1  2.5  5.7 2 68 
    #.......... . .  ... .. .......  .  .  .. . .. 
    # here should be some more lines with different value such as 
    #- 0.487784 3 0  t 1040 -------  4  2.8  7.4 2 23 
    # the above line will be filtered out by the conditions-just ignore it 
    #.......... . .  ... .. .......  .  .  .. . .. 
    r 0.188072 0 5  a 40 -------  1   2.4  5.4 0 10 
    r 0.239528 0 5  a 40 -------  1   2.4  5.4 1 28 
    r 0.263656 0 5  t 40 -------  1   2.4  5.4 0 37 
    r 0.317128 0 5  t 1040 -------  1   2.4  5.4 1 62 
    r 0.318792 0 5  t 1040 -------  1   2.4  5.4 2 63 
    r 0.328792 0 5  t 1040 -------  1   2.5  5.7 2 65 
    r 0.338792 0 5  t 1040 -------  1   2.5  5.7 2 67 
    r 0.348792 0 5  t 1040 -------  1   2.5  5.7 2 68', 
    header = TRUE, stringsAsFactors = FALSE) 

aggregate(DF$time, list(src = DF$src, dst = DF$dst, ID = DF$ID), diff) 
# src dst ID  x 
#1 2.4 5.4 10 0.024128 
#2 2.4 5.4 28 0.024128 
#3 2.4 5.4 37 0.024128 
#4 2.4 5.4 62 0.029344 
#5 2.4 5.4 63 0.031008 
#6 2.5 5.7 65 0.031008 
#7 2.5 5.7 67 0.030808 
#8 2.5 5.7 68 0.023008

Also, if you store the result of aggregate as aggDF, a second aggregate call shows the result more clearly:

aggDF <- aggregate(DF$time, list(src = DF$src, dst = DF$dst, ID = DF$ID), diff) 

aggregate(aggDF$x, list(src = aggDF$src, dst = aggDF$dst), list) 
# src dst            x 
#1 2.4 5.4 0.024128, 0.024128, 0.024128, 0.029344, 0.031008 
#2 2.5 5.7      0.031008, 0.030808, 0.023008
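If the goal is the mean per pair rather than the list of values, a further aggregate() call with FUN = mean on aggDF gives one row per (src, dst). A sketch using a hypothetical three-ID subset of the data above, assuming, as the answer does, that every ID has exactly one "+" row followed by one "r" row:

```r
# Hypothetical subset: IDs 10 and 28 for 2.4 -> 5.4, ID 65 for 2.5 -> 5.7
DF <- data.frame(
  st   = c("+", "+", "+", "r", "r", "r"),
  time = c(0.163944, 0.215400, 0.297784, 0.188072, 0.239528, 0.328792),
  src  = c(2.4, 2.4, 2.5, 2.4, 2.4, 2.5),
  dst  = c(5.4, 5.4, 5.7, 5.4, 5.4, 5.7),
  ID   = c(10, 28, 65, 10, 28, 65)
)

# One time difference per (src, dst, ID), as in the answer
aggDF <- aggregate(DF$time, list(src = DF$src, dst = DF$dst, ID = DF$ID), diff)

# Mean delay per (src, dst) pair
meanDF <- aggregate(x ~ src + dst, data = aggDF, FUN = mean)
print(meanDF)
```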


2013-10-27 09:01:59

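Your approach is nearly complete. However, in DF, DF$st can be one of ("+", "-", "r"); how can I include that as a condition? Also, if I want to add more conditions such as DF$size == 1040, I'm not sure that works with aggregate without re-aggregating each time. That's why I put the conditions into separate variables (i.e., start and end). Maybe aggregate won't be flexible enough for future requirements and changes! I haven't used aggregate before! – SimpleNEasy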
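On the question in this comment: extra conditions (such as excluding "-" records or requiring size == 1040) can be applied as an ordinary row filter before a single aggregate() call, so no re-aggregation is needed. A sketch with hypothetical rows; sorting by time is an added assumption-guard so that, within each ID, diff() subtracts the "+" time from the "r" time.

```r
# Hypothetical rows, including a "-" record and a mix of sizes
DF <- data.frame(
  st   = c("+", "+", "-", "r", "r"),
  time = c(0.163944, 0.287784, 0.487784, 0.188072, 0.317128),
  size = c(40, 1040, 1040, 40, 1040),
  src  = c(2.4, 2.4, 2.8, 2.4, 2.4),
  dst  = c(5.4, 5.4, 7.4, 5.4, 5.4),
  ID   = c(10, 62, 23, 10, 62)
)

# Put every extra condition into one logical filter
keep <- DF$st %in% c("+", "r") & DF$size == 1040
sub  <- DF[keep, ]
# Sort by time so that, within each ID, the "+" row comes before the "r" row
sub  <- sub[order(sub$time), ]

res <- aggregate(sub$time, list(src = sub$src, dst = sub$dst, ID = sub$ID), diff)
print(res)
```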

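To check the results, I tried doing this with your approach. I ran the first aggregate with more conditions, then added more filters to the aggregated data, then tried to put the re-aggregated values into a list or compute the mean. I get an "arguments imply differing number of rows" error. It works on the sample data above (I guess because the data is ideal: every "+" has an "r"), but not on the large file. Please see my update to your answer. – SimpleNEasy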
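A likely cause of the "arguments imply differing number of rows" error is that, on the real file, some IDs have a "+" row without a matching "r" row (or have extra rows), so diff() returns zero or several values for those groups and the aggregated x column no longer holds exactly one number per row. One way around it, sketched below with hypothetical rows, is to pair start and end records explicitly with merge(): the default inner join drops IDs that lack either side, and every remaining row yields exactly one delay.

```r
# Hypothetical rows: ID 37 has a "+" record but no matching "r" record
DF <- data.frame(
  st   = c("+", "+", "+", "r", "r"),
  time = c(0.163944, 0.215400, 0.239528, 0.188072, 0.239528),
  src  = c(2.4, 2.4, 2.4, 2.4, 2.4),
  dst  = c(5.4, 5.4, 5.4, 5.4, 5.4),
  ID   = c(10, 28, 37, 10, 28)
)

cols  <- c("src", "dst", "ID", "time")
start <- DF[DF$st == "+", cols]
end   <- DF[DF$st == "r", cols]

# Inner join on (src, dst, ID): IDs missing on either side are dropped
m <- merge(start, end, by = c("src", "dst", "ID"),
           suffixes = c(".start", ".end"))
m$delay <- m$time.end - m$time.start

# One mean delay per (src, dst) pair
res <- aggregate(delay ~ src + dst, data = m, FUN = mean)
print(res)
```

Further conditions (st, size, from) can be applied to start and end separately before the merge, which keeps the start/end structure of the original code.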

That's why I was trying to avoid aggregate and use my logic above. – SimpleNEasy
