Converting my comment to an answer, here is an approach using the "data.table" package.
library(data.table)
x <- "path/to/yourLogFile.txt"
mydt <- fread(x, header = FALSE, col.names = c("Key", "Time"))
dcast(mydt[, Time := as.numeric(sub("Time=", "", Time))][
, Ind := sequence(.N), Key], Key ~ Ind, value.var = "Time")[
, Diff := `2` - `1`][]
# Key 1 2 Diff
# 1: Key=1 146656456446 146656456448 2
# 2: Key=2 146656456447 146656456450 3
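For readers who find the chained call dense, the same steps can be written one at a time. This is only a sketch: the table is built inline from made-up rows in the shape described in the question instead of being read from a file, and the name `wide` is introduced here purely for illustration.

```r
library(data.table)

# Made-up rows standing in for the parsed log file (assumption, not real data)
mydt <- data.table(
  Key  = c("Key=1", "Key=2", "Key=1", "Key=2"),
  Time = c("Time=146656456446", "Time=146656456447",
           "Time=146656456448", "Time=146656456450")
)

# 1. Strip the "Time=" prefix and convert the remainder to numeric
mydt[, Time := as.numeric(sub("Time=", "", Time, fixed = TRUE))]

# 2. Number the occurrences of each key (1 = first, 2 = second)
mydt[, Ind := seq_len(.N), by = Key]

# 3. Reshape to wide: one row per key, one column per occurrence
wide <- dcast(mydt, Key ~ Ind, value.var = "Time")

# 4. Add the difference between the second and first timestamp
wide[, Diff := `2` - `1`]
wide
```

Splitting the chain this way makes it easier to inspect the intermediate table after each step, at the cost of a few more lines.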
Using my "splitstackshape" package, and the same steps for reading the data in, an alternative approach might look like this:
library(splitstackshape)
dcast(getanID(cSplit(mydt, "Time", "="), "Key"),
Key ~ Time_1 + .id, value.var = "Time_2")[
, Diff := Time_2 - Time_1, by = Key][]
# Key Time_1 Time_2 Diff
# 1: Key=1 146656456446 146656456448 2
# 2: Key=2 146656456447 146656456450 3
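For reading the log file, I've made the following assumptions:
- You know that two columns are expected.
- Your log file currently has no column names (hence header = FALSE).
- Your data are separated by the | character, which fread can detect automatically.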
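If relying on auto-detection feels fragile, the assumptions listed above can be stated explicitly in the call. A minimal sketch, where the temporary file and its four lines are made up to stand in for the real log:

```r
library(data.table)

# Hypothetical stand-in for the real log file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Key=1|Time=146656456446", "Key=2|Time=146656456447",
             "Key=1|Time=146656456448", "Key=2|Time=146656456450"), tmp)

# Name the separator and the missing header instead of leaving them to detection
mydt <- fread(tmp, sep = "|", header = FALSE, col.names = c("Key", "Time"))
mydt
```

Both columns come in as character at this point, since each value still carries its "Key=" or "Time=" prefix.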
Update
This isn't pretty, but it works....
dcast(getanID(cSplit(mydt, names(mydt), "="), "Key_2"),
Key_2 ~ .id, fun=list(I, I), value.var = list("Field_2", "Time_2"), fill = 0)[
, c("Field_2_I_1", "Diff") := list(NULL, Time_2_I_2 - Time_2_I_1)][]
## Key_2 Field_2_I_2 Time_2_I_1 Time_2_I_2 Diff
## 1: 1 10 146656456446 146656456448 2
## 2: 2 11 146656456447 146656456450 3
Sample data:
## Just to simulate a log file like the one you describe....
## "temp" would be your actual file....
x <- c("Key=1|Time=146656456446", "Key=2|Time=146656456447",
"Key=1|Time=146656456448|field=10", "Key=2|Time=146656456450|field=11")
temp <- tempfile()
writeLines(x, temp)
mydt <- fread(temp, header = FALSE, fill = TRUE,
col.names = c("Key", "Time", "Field"))
mydt
## Key Time Field
## 1: Key=1 Time=146656456446
## 2: Key=2 Time=146656456447
## 3: Key=1 Time=146656456448 field=10
## 4: Key=2 Time=146656456450 field=11
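Can you be more precise? Do you know the number of possible key values? If the task is easy in R or Python, what prevents you from using them? – ivankeller
The key values are used to map to the corresponding timestamps (there is always a pair, and the key values will be integers). If the file used column headers (as in a csv), I could merge on the Key column. I hope that's clear. – pythonRcpp
@pythonRcpp, read the data in, 'gsub' out the "Key=" and "Time=", reshape the data to a "wide" format, and add a column. – A5C1D2H2I1M1N2O1R2T1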