检查两个哈希是否具有相同的一组密钥

检查两个哈希h1和h2是否具有相同的一组密钥而不考虑顺序的最有效方法是什么？能否以比我发布的答案更快，更简洁，更有效率？检查两个哈希是否具有相同的一组密钥

2012-12-09 sawa

你用'h1.keys.sort == h2.keys.sort'比较了吗？ –

我做了一个有限的例子。 'h1.keys.sort == h2.keys.sort'有点慢。但我不确定一般情况是否如此。 – sawa

我想你应该在问题中提到这一点。我也会将解决方案作为问题的一部分发布，而不是作为答案。 –

尝试：

# Check that both hash have the same number of entries first before anything 
if h1.size == h2.size 
    # breaks from iteration and returns 'false' as soon as there is a mismatched key 
    # otherwise returns true 
    h1.keys.all?{ |key| !!h2[key] } 
end

Enumerable#all?

最坏情况下，你只能通过迭代键一次。

来源

2012-12-09 11:29:15 bitstrider

更好，'h2.include？（key）'。 – akuhn

我做了一些基准测试，看来这个答案到目前为止是明显的赢家。使用'Hash＃include？'不会带来任何性能上的改进，但它在可读性方面肯定是向前迈出的一大步。 – Jan

'if a then b end' - >'a && b' – tokland

这是我的尝试：

(h1.keys - h2.keys).empty? and (h2.keys - h1.keys).empty?

来源

2012-12-09 10:15:36 sawa

结合freemasonjson's和sawa's想法：

h1.size == h2.size and (h1.keys - h2.keys).empty?

来源

2012-12-09 11:39:03 Jan

这也很有趣。 – sawa

这取决于你的数据。

没有一般情况下真的。例如，通常一次检索整个键集比单独检查每个键的包含更快。但是，如果在您的数据集中，这些键集的差别往往比较小，那么速度较慢的较慢解决方案可能会更快。例如：

h1.size == h2.size and h1.keys.all?{|k|h2.include?(k)}

要考虑的另一个因素是散列的大小。如果他们是大具有更高的设置成本的解决方案，比如调用Set.new，可能还清，但如果他们是小，也不会：

h1.size == h2.size and Set.new(h1.keys) == Set.new(h2.keys)

如果你碰巧在同一不变的散列一遍遍比较再次，它肯定会收回缓存结果。

最终只有一个基准测试会告诉我们，但是要编写一个基准测试，我们需要更多地了解您的使用案例。当然，使用合成数据（例如，随机生成的密钥）测试解决方案将不具有代表性。

来源

2012-12-09 12:03:58 akuhn

不仅仅是因为至少在这个问题上的基准着想......

require 'securerandom' 
require 'benchmark' 

a = {} 
b = {} 

# Use uuid to get a unique random key 
(0..1_000).each do |i| 
    key = SecureRandom.uuid 
    a[key] = i 
    b[key] = i 
end 

Benchmark.bmbm do |x| 
    x.report("#-") do 
    1_000.times do 
     (a.keys - b.keys).empty? and (a.keys - b.keys).empty? 
    end 
    end 

    x.report("#&") do 
    1_000.times do 
     computed = a.keys & b.keys 
     computed.size == a.size 
    end 
    end 

    x.report("#all?") do 
    1_000.times do 
     a.keys.all?{ |key| !!b[key] } 
    end 
    end 

    x.report("#sort") do 
    1_000.times do 
     a_sorted = a.keys.sort 
     b_sorted = b.keys.sort 
     a == b 
    end 
    end 
end

结果是：

Rehearsal ----------------------------------------- 
#-  1.000000 0.000000 1.000000 ( 1.001348) 
#&  0.560000 0.000000 0.560000 ( 0.563523) 
#all? 0.240000 0.000000 0.240000 ( 0.239058) 
#sort 0.850000 0.010000 0.860000 ( 0.854839) 
-------------------------------- total: 2.660000sec 

      user  system  total  real 
#-  0.980000 0.000000 0.980000 ( 0.976698) 
#&  0.560000 0.000000 0.560000 ( 0.559592) 
#all? 0.250000 0.000000 0.250000 ( 0.251128) 
#sort 0.860000 0.000000 0.860000 ( 0.862857)

我有@akuhn同意，这将是一个更好的基准，如果我们有更多关于你正在使用的数据集的信息。但这就是说，我认为这个问题确实需要一些事实。

来源

2012-12-09 12:26:59

我建议将'benchmark'方法的基准名称作为参数添加。这样可以将名称添加到结果报告中，使其更易于阅读。 –

好的，让我们打破所有规则精灵和便携性。 MRI的C API开始发挥作用。

/* Name this file superhash.c. An appropriate Makefile is attached below. */ 
#include <ruby/ruby.h> 

static int key_is_in_other(VALUE key, VALUE val, VALUE data) { 
    struct st_table *other = ((struct st_table**) data)[0]; 
    if (st_lookup(other, key, 0)) { 
    return ST_CONTINUE; 
    } else { 
    int *failed = ((int**) data)[1]; 
    *failed = 1; 
    return ST_STOP; 
    } 
} 

static VALUE hash_size(VALUE hash) { 
    if (!RHASH(hash)->ntbl) 
    return INT2FIX(0); 
    return INT2FIX(RHASH(hash)->ntbl->num_entries); 
} 

static VALUE same_keys(VALUE self, VALUE other) { 
    if (CLASS_OF(other) != rb_cHash) 
    rb_raise(rb_eArgError, "argument needs to be a hash"); 
    if (hash_size(self) != hash_size(other)) 
    return Qfalse; 
    if (!RHASH(other)->ntbl && !RHASH(other)->ntbl) 
    return Qtrue; 
    int failed = 0; 
    void *data[2] = { RHASH(other)->ntbl, &failed }; 
    rb_hash_foreach(self, key_is_in_other, (VALUE) data); 
    return failed ? Qfalse : Qtrue; 
} 

void Init_superhash(void) { 
    rb_define_method(rb_cHash, "same_keys?", same_keys, 1); 
}

这里是一个Makefile。

CFLAGS=-std=c99 -O2 -Wall -fPIC $(shell pkg-config ruby-1.9 --cflags) 
LDFLAGS=-Wl,-O1,--as-needed $(shell pkg-config ruby-1.9 --libs) 
superhash.so: superhash.o 
    $(LINK.c) -shared $^ -o [email protected]

一个人为的，合成的和简单的基准测试显示了以下内容。

require 'superhash' 
require 'benchmark' 
n = 100_000 
h1 = h2 = {a:5, b:8, c:1, d:9} 
Benchmark.bm do |b| 
    # freemasonjson's state of the art. 
    b.report { n.times { h1.size == h2.size and h1.keys.all? { |key| !!h2[key] }}} 
    # This solution 
    b.report { n.times { h1.same_keys? h2} } 
end 
#  user  system  total  real 
# 0.310000 0.000000 0.310000 ( 0.312249) 
# 0.050000 0.000000 0.050000 ( 0.051807)

来源

2012-12-09 12:48:40 Jan

干得好，先生！ – akuhn

哇这是真棒！我必须回去知道C – bitstrider

感谢您的细节。 – sawa

检查两个哈希是否具有相同的一组密钥

回答

相关问题