下面是另一个版本,这在我的基准采用2 + 2的输入尺寸,90 + 90,900 + 900 + 90000 99000 300000 + 300000是最快为止。
(defn outer-join [k xs ys]
(let [gxs (group-by #(get % k) xs)
gys (group-by #(get % k) ys)
kvs (concat (keys gxs) (keys gys))]
(persistent!
(reduce (fn [out k]
(let [l (first (get gxs k))
r (first (get gys k))]
(assoc! out k [l r])))
(transient {})
kvs))))
(我尝试在distinct
包装的关键序列,但它竟然导致基准放缓累及小到中等,大投入这是有道理的:我们需要走两个关键seqs反正和工作,我们每个键做量小,所以它可能是更多的工作,以避免它)
这里是一个全面的检查和绕圈基准极少数(与amalloy的版本改名为outer-join*
):
(let [xs [{:id 1 :val 10} {:id 2 :val 20}]
ys [{:id 1 :val 12} {:id 3 :val 30}]]
(assert (= (outer-join :id xs ys)
(outer-join* :id xs ys)
(outer-join-maps xs ys :id)))
(c/bench (outer-join :id xs ys))
(c/bench (outer-join* :id xs ys))
(c/bench (outer-join-maps xs ys :id)))
WARNING: Final GC required 3.296446000194027 % of runtime
Evaluation count : 17099160 in 60 samples of 284986 calls.
Execution time mean : 3.589256 µs
Execution time std-deviation : 34.976485 ns
Execution time lower quantile : 3.544196 µs (2.5%)
Execution time upper quantile : 3.666515 µs (97.5%)
Overhead used : 2.295807 ns
Evaluation count : 6596160 in 60 samples of 109936 calls.
Execution time mean : 9.107578 µs
Execution time std-deviation : 82.176826 ns
Execution time lower quantile : 8.993900 µs (2.5%)
Execution time upper quantile : 9.295188 µs (97.5%)
Overhead used : 2.295807 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 9298740 in 60 samples of 154979 calls.
Execution time mean : 6.592289 µs
Execution time std-deviation : 63.929382 ns
Execution time lower quantile : 6.506403 µs (2.5%)
Execution time upper quantile : 6.749262 µs (97.5%)
Overhead used : 2.295807 ns
Found 4 outliers in 60 samples (6.6667 %)
low-severe 4 (6.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
(let [xs (map (fn [id] {:id id :val (* 10 id)}) (range 90))
ys (map (fn [id] {:id id :val (* 20 id)}) (range 10 100))]
(assert (= (outer-join :id xs ys)
(outer-join* :id xs ys)
(outer-join-maps xs ys :id)))
(c/bench (outer-join :id xs ys))
(c/bench (outer-join* :id xs ys))
(c/bench (outer-join-maps xs ys :id)))
Evaluation count : 413760 in 60 samples of 6896 calls.
Execution time mean : 147.182107 µs
Execution time std-deviation : 1.282179 µs
Execution time lower quantile : 145.103445 µs (2.5%)
Execution time upper quantile : 149.658348 µs (97.5%)
Overhead used : 2.295807 ns
Evaluation count : 256920 in 60 samples of 4282 calls.
Execution time mean : 238.166905 µs
Execution time std-deviation : 1.987980 µs
Execution time lower quantile : 235.211277 µs (2.5%)
Execution time upper quantile : 242.255072 µs (97.5%)
Overhead used : 2.295807 ns
Evaluation count : 362760 in 60 samples of 6046 calls.
Execution time mean : 167.301109 µs
Execution time std-deviation : 1.616075 µs
Execution time lower quantile : 164.534670 µs (2.5%)
Execution time upper quantile : 170.757257 µs (97.5%)
Overhead used : 2.295807 ns
(let [xs (map (fn [id] {:id id :val (* 10 id)}) (range 900))
ys (map (fn [id] {:id id :val (* 20 id)}) (range 100 1000))]
(assert (= (outer-join :id xs ys)
(outer-join* :id xs ys)
(outer-join-maps xs ys :id)))
(c/bench (outer-join :id xs ys))
(c/bench (outer-join* :id xs ys))
(c/bench (outer-join-maps xs ys :id)))
Evaluation count : 33840 in 60 samples of 564 calls.
Execution time mean : 1.754723 ms
Execution time std-deviation : 29.229644 µs
Execution time lower quantile : 1.709219 ms (2.5%)
Execution time upper quantile : 1.805009 ms (97.5%)
Overhead used : 2.295807 ns
Evaluation count : 22740 in 60 samples of 379 calls.
Execution time mean : 2.559172 ms
Execution time std-deviation : 44.520222 µs
Execution time lower quantile : 2.490201 ms (2.5%)
Execution time upper quantile : 2.657706 ms (97.5%)
Overhead used : 2.295807 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 6.2842 % Variance is slightly inflated by outliers
Evaluation count : 30000 in 60 samples of 500 calls.
Execution time mean : 1.999194 ms
Execution time std-deviation : 25.723647 µs
Execution time lower quantile : 1.962350 ms (2.5%)
Execution time upper quantile : 2.045836 ms (97.5%)
Overhead used : 2.295807 ns
巨大的输入(不包括outer-join-maps
):
(let [xs (map (fn [id] {:id id :val (* 10 id)}) (range 300000))
ys (map (fn [id] {:id id :val (* 20 id)}) (range 100000 400000))]
(assert (= (outer-join :id xs ys)
(outer-join* :id xs ys)
(outer-join-maps xs ys :id)))
(c/bench (outer-join :id xs ys))
(c/bench (outer-join* :id xs ys)))
WARNING: Final GC required 13.371566110062922 % of runtime
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 772.532296 ms
Execution time std-deviation : 12.710681 ms
Execution time lower quantile : 744.832577 ms (2.5%)
Execution time upper quantile : 801.098417 ms (97.5%)
Overhead used : 2.295807 ns
Found 6 outliers in 60 samples (10.0000 %)
low-severe 2 (3.3333 %)
low-mild 3 (5.0000 %)
high-mild 1 (1.6667 %)
Variance from outliers : 5.3156 % Variance is slightly inflated by outliers
WARNING: Final GC required 15.51698960336361 % of runtime
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 949.508151 ms
Execution time std-deviation : 32.952708 ms
Execution time lower quantile : 911.054447 ms (2.5%)
Execution time upper quantile : 1.031623 sec (97.5%)
Overhead used : 2.295807 ns
Found 4 outliers in 60 samples (6.6667 %)
low-severe 4 (6.6667 %)
Variance from outliers : 20.6517 % Variance is moderately inflated by outliers
你试过了什么?这在算法上不是一个超级挑战性的问题,并且能够帮助你使用你尝试的方法将会帮助你更好地解决未来的问题,而不仅仅是提供完整的答案。 – amalloy
还有clojure.data/diff,它可以为您带来转型的良好开端。 – ClojureMostly
尽管'clojure.data/diff'不会以OP请求的形式返回结果,但它会返回相同的信息 - 或者至少是相当平凡的信息,以转换为OP所需的信息。我觉得你的评论应该作为一个答案,@Vanessa,因为它回应了OP是一个实例的一般问题。我会为此投票。 (当我来到这个页面时,它回答了我想到的问题。)如果'clojure.data/diff'确实不是这个问题的适当答案,那么我会问一个新问题,并且可以回答。 – Mars