Porting the MeiYan hash function to Go

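I want to port a state-of-the-art hash function, MeiYan, from C to Go. (As far as I know it is one of the best hash functions for hash tables in terms of speed and collision rate, if not the best; it beats MurMur at least.)

I am new to Go, have spent just one weekend with it, and came up with this version: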

func meiyan(key *byte, count int) uint32 {
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := (*(*uint32)(unsafe.Pointer(key))) << 5
        b := (*(*uint32)(unsafe.Pointer(key))) >> 27
        c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 1) != 0 {
        h = (h ^ uint32(*key)) * 0xad3e7
    }
    return h ^ (h >> 16)
}

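It looks messy, but I do not think I can make it look any better. When I measure the speed, it is frustratingly slow: 3 times slower than C/C++ when compiled with gccgo -O3. Can this be made faster? Is this as good as the compiler can make it, or is the unsafe.Pointer conversion simply that slow? This actually surprised me, because I have seen other number-crunching style code run as fast as C or even faster. Am I doing something inefficient here?

Here is the original C code I am porting from: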

u32 meiyan(const char *key, int count) { 
    typedef u32* P; 
    u32 h = 0x811c9dc5; 
    while (count >= 8) { 
     h = (h^((((*(P)key) << 5) | ((*(P)key) >> 27))^*(P)(key + 4))) * 0xad3e7; 
     count -= 8; 
     key += 8; 
    } 
    #define tmp h = (h^*(u16*)key) * 0xad3e7; key += 2; 
    if (count & 4) { tmp tmp } 
    if (count & 2) { tmp } 
    if (count & 1) { h = (h^*key) * 0xad3e7; } 
    #undef tmp 
    return h^(h >> 16); 
}

This is how I measure the speed:

func main() {
    T := time.Now().UnixNano() / 1e6
    buf := []byte("Hello World!")
    var controlSum uint64 = 0
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(&buf[0], 12))
    }
    fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
    fmt.Println("controlSum:", controlSum)
}

2017-02-28 exebook

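Why not use Go benchmarks? https://golang.org/pkg/testing/#hdr-Benchmarks

@GrzegorzŻur Simply because I have only been learning Go for 1.5 days so far. – exebook

Why are you using unsafe everywhere? – Flimzy

The implementation from NATS looks quite good! On my machine, for data of length 30 (bytes), it reaches 157175656.56 op/sec and 6.36 nano-sec/op! Take a look at it. You might find some ideas.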

2017-02-28 13:00:22

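I offered it as source code to draw ideas from. The package's license is MIT (see the bottom of the home page), which as far as I know is one of the most liberal OSS licenses (https://github.com/nats-io/sublist); you can even use it in commercial products.

@KavehShahbazian Yesterday I benchmarked the NATS version, and it was slower than my direct port from C. I have forgotten the numbers, but I think it was about 3 times slower. It makes heavy use of slices where pointers should be used, and every index dereference is a performance killer. – exebook

After some careful research I found out why my code was slow and improved it, so it is now faster than the C version in my test: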
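For readers weighing the slices-versus-pointers point, here is a minimal sketch of what a port without unsafe could look like. It is not the NATS code: the name meiyanSafe is hypothetical, and it assumes little-endian byte order (which is what the pointer-based version reads on x86). Whether it is slower than the pointer version depends on the compiler and how well bounds checks are eliminated, so it should be measured rather than assumed.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// meiyanSafe (hypothetical name) follows the same steps as the C original,
// but reads through a slice instead of raw pointers.
// Assumption: input words are interpreted as little-endian.
func meiyanSafe(key []byte) uint32 {
	const prime = 0xad3e7
	h := uint32(0x811c9dc5)

	// Main loop: consume 8 bytes per iteration.
	for len(key) >= 8 {
		a := binary.LittleEndian.Uint32(key)
		c := binary.LittleEndian.Uint32(key[4:])
		// RotateLeft32(a, 5) is the same as (a << 5) | (a >> 27).
		h = (h ^ (bits.RotateLeft32(a, 5) ^ c)) * prime
		key = key[8:]
	}

	// Tail: 0 to 7 bytes remain; the bits of n select which pieces to read.
	n := len(key)
	if n&4 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * prime
		h = (h ^ uint32(binary.LittleEndian.Uint16(key[2:]))) * prime
		key = key[4:]
	}
	if n&2 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * prime
		key = key[2:]
	}
	if n&1 != 0 {
		h = (h ^ uint32(key[0])) * prime
	}
	return h ^ (h >> 16)
}

func main() {
	fmt.Printf("%08x\n", meiyanSafe([]byte("Hello World!")))
}
```

The slice version cannot read past the end of the buffer without panicking, which is the trade-off against the raw-pointer port.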

package main 

import (
    "fmt" 
    "time" 
    "unsafe" 
) 

func meiyan(key *byte, count int) uint32 {
    type un unsafe.Pointer
    type p32 *uint32
    type p16 *uint16
    type p8 *byte
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := *p32(un(key)) << 5
        b := *p32(un(key)) >> 27
        c := *p32(un(uintptr(un(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = p8(un(uintptr(un(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
    }
    if (count & 1) != 0 {
        h = (h ^ uint32(*key)) * 0xad3e7
    }
    return h ^ (h >> 16)
}

func main() {
    T := time.Now().UnixNano() / 1e6
    buf := []byte("ABCDEFGHABCDEFGH")
    var controlSum uint64 = 0
    start := &buf[0]
    size := len(buf)
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(start, size))
    }
    fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
    fmt.Println("controlSum:", controlSum)
}

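The hash function itself was already fast; what made it slow was indexing into the array on every iteration. I replaced the per-iteration &buf[0] with start := &buf[0] before the loop, and then used start in each iteration.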
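One way to check this kind of claim with less noise than wall-clock timing is the standard testing package, as suggested in the comments. The sketch below is an illustration, not the measurement used above: the names benchIndexed, benchHoisted, and sink are hypothetical, and the hash is a condensed variant of the port that keeps a base pointer plus an offset. It assumes a little-endian machine that tolerates unaligned reads. Results will vary by compiler and version, and a newer compiler may make the two cases identical.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

// meiyan is a condensed variant of the pointer-based port. It keeps a base
// pointer and an offset so that no pointer past the buffer is ever formed.
func meiyan(key *byte, count int) uint32 {
	base := unsafe.Pointer(key)
	off := uintptr(0)
	h := uint32(0x811c9dc5)
	for ; count >= 8; count -= 8 {
		a := *(*uint32)(unsafe.Pointer(uintptr(base) + off))
		c := *(*uint32)(unsafe.Pointer(uintptr(base) + off + 4))
		h = (h ^ ((a<<5 | a>>27) ^ c)) * 0xad3e7
		off += 8
	}
	if count&4 != 0 {
		h = (h ^ uint32(*(*uint16)(unsafe.Pointer(uintptr(base) + off)))) * 0xad3e7
		h = (h ^ uint32(*(*uint16)(unsafe.Pointer(uintptr(base) + off + 2)))) * 0xad3e7
		off += 4
	}
	if count&2 != 0 {
		h = (h ^ uint32(*(*uint16)(unsafe.Pointer(uintptr(base) + off)))) * 0xad3e7
		off += 2
	}
	if count&1 != 0 {
		h = (h ^ uint32(*(*byte)(unsafe.Pointer(uintptr(base) + off)))) * 0xad3e7
	}
	return h ^ (h >> 16)
}

// sink keeps the compiler from discarding the hash results.
var sink uint64

// benchIndexed takes &buf[0] inside the loop, as in the question.
func benchIndexed(b *testing.B) {
	buf := []byte("ABCDEFGHABCDEFGH")
	for i := 0; i < b.N; i++ {
		sink += uint64(meiyan(&buf[0], len(buf)))
	}
}

// benchHoisted takes the pointer once before the loop, as in the answer.
func benchHoisted(b *testing.B) {
	buf := []byte("ABCDEFGHABCDEFGH")
	start, size := &buf[0], len(buf)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink += uint64(meiyan(start, size))
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("indexed:", testing.Benchmark(benchIndexed))
	fmt.Println("hoisted:", testing.Benchmark(benchHoisted))
}
```

In a real project the two functions would normally live in a _test.go file as BenchmarkIndexed and BenchmarkHoisted and be run with go test -bench.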

2017-03-01 14:21:19 exebook
