Checking two strings for an approximate match in PHP

These are the criteria I use:

1) The order of the words matters.
2) A word counts as the same word if it is at least 80% similar.
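To get a feel for what an 80% threshold means, here is a small sketch using PHP's built-in similar_text(), whose optional third argument receives the similarity as a percentage. The helper name wordsMatch() is hypothetical, and the sketch assumes PHP 7.4 or later:

```php
<?php
// Hypothetical helper: two words "match" if similar_text() rates them
// at least $threshold percent similar (80 per criterion 2).
function wordsMatch(string $a, string $b, float $threshold = 80.0): bool
{
    similar_text(strtolower($a), strtolower($b), $percent);
    return $percent >= $threshold;
}

// similar_text() percent = common characters * 2 * 100 / (len(a) + len(b))
similar_text("cost", "costs", $p1);
similar_text("does", "cost", $p2);
printf("cost/costs: %.1f%%\n", $p1); // cost/costs: 88.9%
printf("does/cost: %.1f%%\n", $p2);  // does/cost: 50.0%
```

So the deliberate typo "costs" clears an 80% bar, while an unrelated word like "does" does not.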

Example:

$string1 = "How much will it cost to me"; // string in the vocabulary (all the "right" words are here)
$string2 = "How much does costs it ";     // user input: "costs" instead of "cost" is a deliberate mistake

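Algorithm:
1) Check the similarity of the words and build a clean string of the "right" words, in the order they appear in the vocabulary. Output: "how much it cost"
2) Build a clean string of the "right" words in the order they appear in the user input. Output: "how much cost it"
3) Compare the two outputs: if they are not the same, return no; if they are the same, return yes.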
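The three steps above could be sketched roughly as follows. This is not the asker's code; it assumes words are split on whitespace, compared case-insensitively, and counted as "similar" when similar_text() reports at least 80%. The function names tokenize(), isSimilar() and approximateMatch() are hypothetical (PHP 7.4+):

```php
<?php
// Sketch of the three-step approximate match.
// Assumptions: whitespace tokenization, lower-casing, similar_text() >= 80%.

function tokenize(string $s): array
{
    return preg_split('/\s+/', strtolower(trim($s)), -1, PREG_SPLIT_NO_EMPTY);
}

function isSimilar(string $a, string $b, float $threshold = 80.0): bool
{
    similar_text($a, $b, $percent);
    return $percent >= $threshold;
}

function approximateMatch(string $vocabulary, string $input): bool
{
    $vocabWords = tokenize($vocabulary);
    $inputWords = tokenize($input);

    // Step 1: "right" words found in the input, in vocabulary order.
    $byVocabulary = [];
    foreach ($vocabWords as $v) {
        foreach ($inputWords as $w) {
            if (isSimilar($v, $w)) {
                $byVocabulary[] = $v;
                break; // first similar input word is enough
            }
        }
    }

    // Step 2: the same "right" words, in the order they occur in the input.
    $byInput = [];
    foreach ($inputWords as $w) {
        foreach ($vocabWords as $v) {
            if (isSimilar($v, $w)) {
                $byInput[] = $v; // keep the vocabulary spelling
                break;
            }
        }
    }

    // Step 3: match only if both outputs are identical.
    return implode(' ', $byVocabulary) === implode(' ', $byInput);
}

// "how much it cost" vs "how much cost it"
var_dump(approximateMatch("How much will it cost to me", "How much does costs it "));
```

Taking the first similar word in each pass is a simplification; with several near-identical words in one sentence, a smarter pairing might be needed.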

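Any suggestions? I have started writing the code, but I am not familiar with the tools PHP offers, so I don't know how to do this sensibly and efficiently.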

So far it looks something like this (a mix of JavaScript and PHP):

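<?php
$string1 = "how much will it cost for me";
$string2 = "how much does costs it";

function compareStrings($string1, $string2) {
    // Normalise case and surrounding whitespace
    $s1 = strtolower(trim($string1));
    $s2 = strtolower(trim($string2));

    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return false;
    }

    // Split on runs of whitespace (instead of collapsing double spaces in a loop)
    $ar1 = preg_split('/\s+/', $s1);
    $ar2 = preg_split('/\s+/', $s2);
    $l1 = count($ar1);
    $l2 = count($ar2);

    $meaning = "";                         // first output: matched words in vocabulary order
    $rightorder = array_fill(0, $l2, "");  // second output: vocabulary word at each input position

    for ($i = 0; $i < $l1; $i++) {
        for ($j = 0; $j < $l2; $j++) {
            similar_text($ar1[$i], $ar2[$j], $perc);
            if ($perc >= 80) { // 80% as in the criteria above
                $meaning = $meaning . " " . $ar1[$i]; // generating the first output
                $rightorder[$j] = $ar1[$i];           // generating the second output
            }
        }
    }

    // Turn $rightorder back into a string, skipping unmatched ("") positions
    $second = implode(" ", array_filter($rightorder, 'strlen'));

    return trim($meaning) === $second;
}

var_dump(compareStrings($string1, $string2)); // bool(false): "how much it cost" vs "how much cost it"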

The idea is that $meaning will get "how much it cost", and $rightorder will get

$rightorder[0] = 'how';
$rightorder[1] = 'much';
$rightorder[2] = '';
$rightorder[3] = 'cost';
$rightorder[4] = 'it';

Then I would somehow turn that back into the string "how much cost it" and compare the two:

if ("how much cost it" == "how much it cost") return true; else return false;


2013-05-14 Ilya Libin

Take a look at the [levenshtein()](http://php.net/manual/en/function.levenshtein.php) and [similar_text()](http://www.php.net/manual/en/function.similar-text.php) functions; they might fit the bill. – hexblot 2013-05-14 13:20:43
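As an alternative to similar_text(), levenshtein() counts the single-character insertions, deletions and substitutions needed to turn one word into another. PHP does not turn that distance into a percentage itself; normalising by the longer word's length, as the hypothetical levenshteinPercent() below does, is one common choice (PHP 7.4+):

```php
<?php
// Hypothetical helper: similarity in percent derived from levenshtein(),
// normalised by the length of the longer word.
function levenshteinPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $maxLen) * 100;
}

echo levenshtein("cost", "costs"), "\n"; // 1 (one inserted "s")
echo levenshteinPercent("cost", "costs"), "\n";
echo levenshteinPercent("does", "cost"), "\n";
```

With this normalisation "cost"/"costs" lands exactly on 80%, so whether the threshold test uses `>=` or `>` decides the outcome.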

Not sure... – 2013-05-14 13:23:20

Also [soundex](http://php.net/manual/en/function.soundex.php) – 2013-05-14 13:23:51
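soundex() maps a word to a four-character phonetic code (the first letter plus three digits), so comparing codes is a cheap "sounds alike" test. A quick sketch with a hypothetical soundsAlike() helper; note that it does not treat "cost" and "costs" as equal, because the trailing "s" adds a digit to the code:

```php
<?php
// Hypothetical helper: two words "sound alike" if their Soundex codes are equal.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

echo soundex("cost"), "\n";  // C230
echo soundex("costs"), "\n"; // C232
var_dump(soundsAlike("Robert", "Rupert")); // bool(true), both are R163
```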

Your problem falls within the field of NLP (natural language processing).

Each of the problems mentioned in your question has its own body of research:

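- Splitting a string into words is called tokenization. It seems trivial in English, but it is not in other languages, such as German. There is also the question of how to handle punctuation.
- Reducing words to their "right" form is called stemming. There are many tools for it. If your text is in English, you can try the Porter Stemming Algorithm. Other languages may have their own stemming techniques, and dictionary-based algorithms usually exist as well.
- Computing the similarity of strings from word occurrence counts is called "Cosine Similarity". There are other techniques too. There are also the problems of synonymy and polysemy.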
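A minimal bag-of-words cosine similarity, dot(A, B) / (|A| * |B|) over word counts, could look like the sketch below. The tokenization (lower-case, split on anything that is not a letter or digit) and the function names are assumptions for illustration (PHP 7.4+). Cosine similarity ignores word order entirely, which is exactly what criterion 1 in the question cares about:

```php
<?php
// Map each word to its number of occurrences (assumed tokenization).
function wordCounts(string $s): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Cosine similarity of the two word-count vectors.
function cosineSimilarity(string $a, string $b): float
{
    $va = wordCounts($a);
    $vb = wordCounts($b);

    $dot = 0;
    foreach ($va as $word => $count) {
        if (isset($vb[$word])) {
            $dot += $count * $vb[$word];
        }
    }

    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $va)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $vb)));

    if ($normA == 0 || $normB == 0) {
        return 0.0; // an empty string shares nothing
    }
    return $dot / ($normA * $normB);
}

printf("%.3f\n", cosineSimilarity("How much will it cost to me", "How much does costs it")); // 0.507
printf("%.3f\n", cosineSimilarity("how much it cost", "how much cost it"));                 // 1.000
```

Only "how", "much" and "it" are shared in the first pair, giving 3 / sqrt(7 * 5); exact word matching also misses "cost"/"costs", which is where stemming would come in.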

I hope this helps, since your problem is a mixture of the problems mentioned above.


2013-05-14 14:13:03 hegemon

Yes, I know what NLP is, but I don't want to go deep into it. This is my simplified solution (it works for Latin-script languages). – 2013-05-14 15:07:02
