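Join data frames by measured value with an error range
I am looking for a way to join (or merge) two or more data frames in R that contain measured values with a specified error range. This means the values in the "by" column will be nnn.nnnn +/- 0.000n. The tolerance is limited to 3e-6 times the value.
This is my best attempt so far.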
newDF <- left_join(P0511_480k, P0511_SF00V, by = c(P0511_480k$m.z == (P0511_SF00V$m.z - 0.000003(P0511_480k$m.z)):(P0511_SF00V$m.z + 0.000003(P0511_480k$m.z))))
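In this expression, I have two data frames (P0511_480k and P0511_SF00V), and I want to merge them by the column named "m.z". The acceptable range is the value plus or minus "m.z" times 0.000003. For example, P0511_480k_subset$m.z = 187.06162 should match P0511_SF00V_subset$m.z = 187.06155.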
> dput(head(P0511_480k_subset, 10))
structure(list(m.z = c(187.06162, 203.05652, 215.05668, 217.07224,
279.05499), Intensity = c(319420.8, 288068.9, 229953, 210107.8,
180054), Relative = c(100, 90.18, 71.99, 65.78, 56.37), Resolution = c(394956.59,
415308.31, 387924.91, 437318.31, 410670.91), Baseline = c(2.1,
1.43, 1.69, 1.73, 3.04), Noise = c(28.03, 27.17, 27.52, 27.58,
29.37)), .Names = c("m.z", "Intensity", "Relative", "Resolution",
"Baseline", "Noise"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-5L))
and
> dput(head(P0511_SF00V_subset, 10))
structure(list(m.z = c(187.06155, 203.05641, 215.05654, 217.0721
), Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2), Relative = c(100,
78.46, 64.91, 51.23), Resolution = c(314271.88, 298427.41, 289803.97,
288163.63), Baseline = c(6.89, 10.47, 9.13, 8.89), Noise = c(40.94,
45.98, 44.3, 44.01)), .Names = c("m.z", "Intensity", "Relative",
"Resolution", "Baseline", "Noise"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -4L))
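I would appreciate any suggestions! I have searched the help documentation as widely as I could, but I have not been able to find an example close to what I need.
Thanks very much!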
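Please provide your data (or a subset of it) using `dput()` or `dput(head(df, 20))`. Also, when you multiply (even when the number comes right before a parenthesis), you need to write the `*` explicitly. – etienne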
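To illustrate the point about `*`, here is a minimal base R sketch of the tolerance check, using only the m.z values from the question's `dput()` output. The names `a`, `b`, `tol` and `within` are made up for this example, and the tolerance is assumed to be relative to the P0511_480k value, as in the original attempt.

```r
# m.z values copied from the question's dput() output
a <- c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499)  # P0511_480k_subset$m.z
b <- c(187.06155, 203.05641, 215.05654, 217.0721)              # P0511_SF00V_subset$m.z

# R has no implicit multiplication: 0.000003(a) is parsed as a function
# call and fails, so the tolerance needs an explicit `*`.
tol <- 3e-6 * a

# within[i, j] is TRUE when b[j] lies inside a[i] +/- tol[i].
# tol is recycled down the columns, so tol[i] applies to row i.
within <- abs(outer(a, b, "-")) <= tol

# Row/column index pairs of the matching values
which(within, arr.ind = TRUE)
```

With this sample data the first four values in each vector pair up, and 279.05499 has no partner. Comparing every pair this way is fine for small tables but grows with the product of the two lengths.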
Take a look at the [*fuzzyjoin* package](https://github.com/dgrtwo/fuzzyjoin), which is a variation on dplyr's join operations. – aosmith
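A rough sketch of how that could look with fuzzyjoin, assuming the package is installed. Because the tolerance here is relative (3e-6 times the value) rather than a fixed distance, this uses `fuzzy_left_join()` with a custom `match_fun`. The data frames `df_480k` and `df_sf00v` and the helper `within_tol` are hypothetical names, and only the m.z and Intensity columns from the question are kept for brevity.

```r
library(fuzzyjoin)

# Cut-down copies of the question's data (m.z and Intensity only)
df_480k <- data.frame(
  m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499),
  Intensity = c(319420.8, 288068.9, 229953, 210107.8, 180054)
)
df_sf00v <- data.frame(
  m.z = c(187.06155, 203.05641, 215.05654, 217.0721),
  Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2)
)

# Vectorised predicate: TRUE where y is within 3e-6 * x of x.
# The tolerance is taken relative to the left-hand table's value (assumption).
within_tol <- function(x, y) abs(x - y) <= 3e-6 * x

# Left join: every row of df_480k is kept; unmatched rows get NA
newDF <- fuzzy_left_join(df_480k, df_sf00v, by = "m.z", match_fun = within_tol)
newDF
```

Since both tables share column names, the result should carry `.x` and `.y` suffixes (for example `m.z.x` and `m.z.y`), so both measured values stay visible next to each other.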
I think you need something like `data.table::foverlaps()`. Please provide your data and the expected output. – zx8754
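For completeness, a sketch of the `foverlaps()` idea, assuming data.table is installed. Each P0511_480k value is turned into an interval [lo, hi], and each P0511_SF00V value into a zero-width interval, so an overlap means "within tolerance". All object and column names (`x`, `y`, `mz_480k`, `mz_sf`, `lo`, `hi`, `mz_start`, `mz_end`) are invented for this example and chosen to avoid name clashes in the result.

```r
library(data.table)

# Cut-down copies of the question's data with distinct column names
x <- data.table(
  mz_480k  = c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499),
  int_480k = c(319420.8, 288068.9, 229953, 210107.8, 180054)
)
y <- data.table(
  mz_sf  = c(187.06155, 203.05641, 215.05654, 217.0721),
  int_sf = c(1021342.8, 801347.1, 662928.1, 523234.2)
)

# Tolerance window around each value in x (relative to x, an assumption)
x[, `:=`(lo = mz_480k - 3e-6 * mz_480k, hi = mz_480k + 3e-6 * mz_480k)]

# Zero-width interval for each value in y; foverlaps() needs y keyed
# with the interval start/end columns last in the key
y[, `:=`(mz_start = mz_sf, mz_end = mz_sf)]
setkey(y, mz_start, mz_end)

# nomatch = NA keeps rows of x without a partner, like a left join
res <- foverlaps(x, y, by.x = c("lo", "hi"), by.y = c("mz_start", "mz_end"),
                 type = "any", nomatch = NA)
res
```

The helper columns `lo`, `hi`, `mz_start` and `mz_end` can be dropped afterwards if they are not needed.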