2015-03-02

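Create an array union for advanced text comparison, such as the Jaccard index

I want to write a function that builds the union of two arrays in order to compare large texts. For example: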

$myText1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam quis eleifend sem. Maecenas at elit varius erat malesuada mollis. Nullam vulputate, velit vel posuere finibus, ex quam imperdiet lorem, eu gravida quam purus et augue. Nunc dictum nunc vehicula mattis mollis. Quisque pharetra lorem id ultrices feugiat. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Suspendisse sed tortor odio. Proin consequat sed purus quis congue. Donec faucibus nec magna ac sodales.";
$myText2 = "Mauris eget metus non mi tristique varius. Ut suscipit ante id condimentum interdum. Pellentesque egestas, leo quis tincidunt ornare, nulla metus sagittis lorem, in interdum justo tellus ac lectus. Vivamus auctor bibendum eros vel cursus. Praesent semper mauris dolor, sit amet placerat orci vestibulum non. Aenean consequat ultrices massa, in congue urna condimentum nec. Pellentesque eu faucibus dolor. Ut eu accumsan nunc, vel egestas dolor. Sed convallis mi et orci interdum tincidunt.";
$str1array = explode(' ', $myText1); 
$str2array = explode(' ', $myText2); 
$dallas = array(); 
foreach($str1array as $str){ 
    $dallas[] = trim($str, " .,;\t\n\r\0\x0B"); 
} 
$phoenix = array(); 
foreach($str2array as $str){ 
    $phoenix[] = trim($str, " .,;\t\n\r\0\x0B"); 
} 
$inter = array_intersect($dallas, $phoenix); 
// Build the union from the trimmed word lists; array_unique drops repeated words.
$union = array_unique(array_merge($dallas, $phoenix));
$indiceDeJaccard = count($inter) / count($union);
$coefficientDeDice = (2 * count($inter)) / (count($dallas) + count($phoenix));

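However, the problem is that array_unique collapses all the repeated words from the original texts, which produces wrong data for computing the Jaccard index.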
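
To make the mismatch concrete, here is a small sketch with made-up word lists (they are not taken from the texts above). It shows that array_intersect keeps repeated words from its first argument while array_unique removes them from the union, so the two counts are not measured the same way.

```php
<?php
// Hypothetical word lists, chosen only to illustrate the counting mismatch.
$a = array('lorem', 'lorem', 'ipsum');
$b = array('lorem', 'dolor');

// array_intersect keeps every entry of $a that also occurs in $b,
// so the repeated word is counted twice.
$inter = array_intersect($a, $b);

// array_unique removes repeated words from the merged list,
// leaving the three distinct words.
$union = array_unique(array_merge($a, $b));

echo count($inter), "/", count($union), "\n"; // prints 2/3
```

If both lists are treated as sets of distinct words, only one word is shared, so the ratio would be 1/3 instead.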

Answer

Sorry, everyone, but I have found the answer to my own question:

$inter = array_intersect($dallas, $phoenix); 
$diff = array_diff($dallas, $phoenix); 
$union = array_merge($inter, $diff);
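
Note that array_intersect($dallas, $phoenix) and array_diff($dallas, $phoenix) together contain exactly the entries of $dallas, so the merge above leaves out words that occur only in $phoenix. One possible alternative, sketched here under the assumption that the Jaccard index should compare the sets of distinct words, is to deduplicate both lists before intersecting and merging. The helper name jaccardIndex is hypothetical and not part of the original code.

```php
<?php
// Sketch only: jaccardIndex is a hypothetical helper.
// It treats each word list as a set of distinct words.
function jaccardIndex(array $wordsA, array $wordsB)
{
    // Remove repeated words so both sides are counted the same way.
    $setA = array_unique($wordsA);
    $setB = array_unique($wordsB);

    $inter = array_intersect($setA, $setB);
    $union = array_unique(array_merge($setA, $setB));

    // Two empty lists have an empty union; returning 0.0 here is an
    // assumption made to avoid dividing by zero.
    if (count($union) === 0) {
        return 0.0;
    }

    return count($inter) / count($union);
}

// One shared word out of three distinct words.
echo jaccardIndex(array('lorem', 'lorem', 'ipsum'), array('lorem', 'dolor')), "\n";
```

With the arrays from the question, the call would be jaccardIndex($dallas, $phoenix). Whether repeated words should instead count several times (a multiset comparison) depends on the intended measure, and this sketch does not cover that case.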