2017-04-20 18 views
1

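Why don't we include 0 matches when computing the Jaccard distance between binary numbers?

I am working on a program based on Jaccard distance, and I need to compute the Jaccard distance between two binary bit vectors. I came across the following online: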

If p1 = 10111 and p2 = 10011, 

The total number of attributes for each combination of values in p1 and p2:

M11 = total number of attributes where p1 & p2 have a value 1, 
M01 = total number of attributes where p1 has a value 0 & p2 has a value 1, 
M10 = total number of attributes where p1 has a value 1 & p2 has a value 0, 
M00 = total number of attributes where p1 & p2 have a value 0. 
Jaccard similarity coefficient = J = 
intersection/union = M11/(M01 + M10 + M11) 
= 3/(0 + 1 + 3) = 3/4, 

Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4, 
Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11) 
= (0 + 1)/(0 + 1 + 3) = 1/4 
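
The counts in the snippet above can be checked with a short Python sketch. The function names `jaccard_counts` and `jaccard_similarity`, and the choice to represent the bit vectors as strings, are assumptions made for illustration; they are not part of the quoted snippet.

```python
from fractions import Fraction


def jaccard_counts(p1: str, p2: str) -> dict:
    """Count the four attribute combinations for two equal-length bit strings."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    counts = {"M11": 0, "M01": 0, "M10": 0, "M00": 0}
    for a, b in zip(p1, p2):
        # a is the bit from p1, b is the bit from p2, so a='1', b='0' hits M10.
        counts["M" + a + b] += 1
    return counts


def jaccard_similarity(p1: str, p2: str) -> Fraction:
    """Return M11 / (M01 + M10 + M11); M00 is deliberately not used."""
    c = jaccard_counts(p1, p2)
    denominator = c["M01"] + c["M10"] + c["M11"]
    if denominator == 0:
        raise ValueError("Jaccard similarity is undefined when both vectors are all zeros")
    return Fraction(c["M11"], denominator)


counts = jaccard_counts("10111", "10011")
similarity = jaccard_similarity("10111", "10011")
distance = 1 - similarity
print(counts, similarity, distance)
```

For p1 = 10111 and p2 = 10011 this gives M11 = 3, M01 = 0, M10 = 1, M00 = 1, a similarity of 3/4 and a distance of 1/4, matching the hand calculation.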

Now, when computing the coefficient, why is "M00" not included in the denominator? Can anyone explain?

+1

You can come across snippets like this not only elsewhere online, but also right here: http://stackoverflow.com/a/19969874/14955 – Thilo

Answer

0

The Jaccard index of A and B is |A∩B| / |A∪B| = |A∩B| / (|A| + |B| - |A∩B|).

We have: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.

So |A∩B| / (|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11).
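
The set view used in this answer can be sketched in Python. This is only an illustration: the helper `bits_to_set` is a hypothetical name, and treating each bit vector as the set of positions holding a 1 is the assumption that connects the bit strings from the question to the sets A and B.

```python
def bits_to_set(bits: str) -> set:
    """Return the set of 0-based positions at which the bit string holds a 1."""
    return {i for i, bit in enumerate(bits) if bit == "1"}


A = bits_to_set("10111")
B = bits_to_set("10011")

intersection = A & B  # positions set in both vectors, size M11
union = A | B         # positions set in at least one vector, size M01 + M10 + M11

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(union) == len(A) + len(B) - len(intersection)

jaccard = len(intersection) / len(union)

# Appending positions where both vectors are 0 increases M00 only.
# Those positions belong to neither set, so A, B and the index stay the same.
padded_A = bits_to_set("10111" + "000")
padded_B = bits_to_set("10011" + "000")
padded_jaccard = len(padded_A & padded_B) / len(padded_A | padded_B)

print(jaccard, padded_jaccard)
```

A position where both vectors are 0 is in neither A nor B, so it cannot appear in the intersection or the union. Under this reading, that is why M00 never enters the formula.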

This Venn diagram may help: [image: Venn diagram]

+0

I'm sorry, but I can't follow this step: |A| = M11 + M10, |B| = M11 + M01. Could you explain how you arrived at it?

+0

M11 is the number of bits set in both A and B. M10 is the number of bits set in A but not in B. What is their sum?

+0

Whether you think about it algebraically or combinatorially, drawing a Venn diagram may help.
