Faster %in%

The fastmatch package implements a much faster version of match for repeated matches (e.g. in a loop):

set.seed(1) 
library(fastmatch) 
table <- 1L:100000L 
x <- sample(table, 10000, replace=TRUE) 
system.time(for(i in 1:100) a <- match(x, table)) 
system.time(for(i in 1:100) b <- fmatch(x, table)) 
identical(a, b)

Is there a similar implementation of %in% that I could use to speed up repeated lookups?

2015-10-04 Zach

Look at the definition of %in%:

R> `%in%` 
function (x, table) 
match(x, table, nomatch = 0L) > 0L 
<bytecode: 0x1fab7a8> 
<environment: namespace:base>

You can easily write your own %fin% function:

`%fin%` <- function(x, table) { 
    stopifnot(require(fastmatch)) 
    fmatch(x, table, nomatch = 0L) > 0L 
} 
system.time(for(i in 1:100) a <- x %in% table) 
# user system elapsed 
# 1.780 0.000 1.782 
system.time(for(i in 1:100) b <- x %fin% table) 
# user system elapsed 
# 0.052 0.000 0.054 
identical(a, b) 
# [1] TRUE

2015-10-04 15:14:02

But fastmatch doesn't work when you match against NA, as base match does. – skan

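Where is it? Is "https://github.com/s-u/fastmatch" the correct link? It doesn't seem to have been updated for a long time. – skan

I have been trying %fin% and fmatch with lapply to match every column of a large data.frame or data.table, and I couldn't notice much difference in speed. – skan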

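Matching is almost always better done by putting the two vectors into data frames and merging them (see the various joins from dplyr).

For example, something like this would give you all the information you need: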

library(dplyr)

# lookup table; rep() makes the recycling of 1:2 explicit
data = tibble(data.ID = 1L:100000L,
              data.extra = rep(1:2, length.out = 100000))

# draw 10000 rows with replacement and tag each draw
sample =
    data %>%
    sample_n(10000, replace=TRUE) %>%
    mutate(sample.ID = 1:n(),
           sample.extra = rep(3:4, length.out = n()))

# join table not strictly necessary in this case
# but necessary in many-to-many matches
data__sample = inner_join(data, sample, by = c("data.ID", "data.extra"))

# check whether a data.ID made it into sample
data__sample %>% filter(data.ID == 1)

Or use left_join, right_join, full_join, semi_join, or anti_join, depending on which information is most useful to you.
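As a rough sketch of how the filtering joins relate to %in%, the toy example below uses semi_join and anti_join on two small tibbles. The names lookup, queries, found and absent are made up for illustration and do not come from the question.

```r
library(dplyr)

# Hypothetical toy data, not taken from the question
lookup  <- tibble(ID = 1:5)
queries <- tibble(ID = c(2L, 7L, 4L, 2L))

# Rows of `queries` whose ID occurs in `lookup`
# (comparable to queries[queries$ID %in% lookup$ID, ])
found <- semi_join(queries, lookup, by = "ID")

# Rows of `queries` whose ID does not occur in `lookup`
absent <- anti_join(queries, lookup, by = "ID")
```

Both joins return rows of the first table only, so duplicates in queries are kept and no columns from lookup are added.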

2015-10-04 16:20:18 bramtayl

Would you mind explaining (preferably with an included example)? At the moment, your "answer" is more of a comment than a real answer. – Jaap

See the revised version. – bramtayl
