2017-09-25 83 views
0

我写levenshtein_distance如下来计算两个串之间的距离:levenshtein_distance在红宝石

def min3(a, b, c) 
     if a < b && a < c then 
     a 
     elsif b < c then 
     b 
     else 
     c 
     end 
    end 

    def levenshtein_distance(s, t) 
     m = s.length 
     n = t.length 
     return m if n.zero? 
     return n if m.zero? 

     d = (0..m+1).to_a 
     x = nil 

     s.each_char.each_with_index do |ch1, i| 
     e = i + 1 
     t.each_char.each_with_index do |ch2, j| 
      cost = ch1 == ch2 ? 0 : 1 
      x = min3(d[j + 1] + 1, e + 1, d[j] + cost) 
      d[j] = e 
      e = x 
     end 
     d[m] = x 
     end 
     x 
    end 

当两个字符串是不同的,它提供了一个错误消息:

NoMethodError - undefined method `+' for nil:NilClass 

误差检测线:

x = min3(d[j + 1] + 1, e + 1, d[j] + cost) 

我认为这是由于指数超过了定义的d的限制。但是,放大d的长度并不能解决此问题。

有什么我错过了实现算法?

这是我在irb

irb(main):052:0> levenshtein_distance("a", "abc") 
NoMethodError: undefined method `+' for nil:NilClass 
    from (irb):24:in `block (2 levels) in levenshtein_distance' 
    from (irb):22:in `each_char' 
    from (irb):22:in `each_with_index' 
    from (irb):22:in `block in levenshtein_distance' 
    from (irb):20:in `each_char' 
    from (irb):20:in `each_with_index' 
    from (irb):20:in `levenshtein_distance' 
    from (irb):52 
    from /usr/bin/irb:12:in `<main>' 
+1

不确定,但也许检查这些参考? https://github.com/gstragand/Levenshtein http://rosettacode.org/wiki/Levenshtein_distance#Ruby – lacostenycoder

+0

我也运行你的代码,没有错误。你可以发布你的测试用例来重现吗? – lacostenycoder

+0

我发布了测试用例@lacostenycoder –

回答

0

我已经重写按照算法维基百科测试的情况:

def ld(s, t) 
    v0 = (0..t.length).to_a 
    v1 = [] 
    #p v0 

    s.chars.each_with_index do |s_ch, i|                        
    v1[0] = i + 1                             

    t.chars.each_with_index do |t_ch, j|                       
     cost = s_ch == t_ch ? 0 : 1                         
     v1[j + 1] = [v1[j] + 1, v0[j + 1] + 1, v0[j] + cost].min                  
    end                               
    v0 = v1.dup                             
    #p v1                               
    end                                

    v0[t.length] 
end 

它似乎工作。另外,您也可以取消p v1p v0的注释以查看向量在每次迭代中如何变化。