PHP script to find/identify the words in a domain name

-5

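I'm looking for PHP code or a script that can identify the words in a domain name.

For example, when a user looks up the domain snapnames.com, the script would display SnapNames.com (having recognized the two words in the domain: "snap" and "names").

I hope someone can help.

Thanks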

2012-01-23 Mai Reynolds

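This is not a programming question. – zerkms

I wonder what your script would do with the domain expertsexchange.com. – sarnold

English is a very ambiguous, context-dependent language. No computer can fully replace the skill of a person who is fluent in English. –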

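I'm afraid there is no perfect answer... As sarnold said, a domain like "expertsexchange.com" could be read as "Expert Sex Change.com" or as "Experts Exchange.com".

On top of that, a feature like this would be fairly heavy on memory and processing power. You would need huge files to be able to recognize all the words, and so on. It would be good to know why you need this, so we could try to find a different solution.

If you have some kind of service that displays information about a website, showing "Snapnames.com" is perfectly acceptable. There is no need to capitalize the words inside it, or anything like that.

However, if you are dead set on this behavior, even though it will not be 100% accurate and will be intensive on your server...

You first need a way to check whether a string is a word. That is a completely separate question with a perfectly reasonable answer. Ask it separately and see whether you can find a dictionary library for PHP.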
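As a rough sketch of that "is this string a word?" check, assuming you have a plain word list with one word per line: load it into an array keyed by word so each lookup is a single `isset()` call. The function names `buildWordSet` and `isWord` and the file path in the comment are made up for illustration, not part of any library.

```php
<?php
// Build a lookup set from a list of words. Keys are lowercase words,
// so isset() gives a fast membership test.
function buildWordSet(array $words): array
{
    $set = [];
    foreach ($words as $word) {
        $word = strtolower(trim($word));
        if ($word !== '') {
            $set[$word] = true;
        }
    }
    return $set;
}

// Case-insensitive check against the set built above.
function isWord(string $candidate, array $wordSet): bool
{
    return isset($wordSet[strtolower($candidate)]);
}

// Hypothetical usage with a real word list (path is an assumption):
// $wordSet = buildWordSet(file('/usr/share/dict/words', FILE_IGNORE_NEW_LINES));
$wordSet = buildWordSet(['snap', 'names', 'experts', 'exchange']);
var_dump(isWord('Snap', $wordSet));   // bool(true)
var_dump(isWord('snapn', $wordSet));  // bool(false)
```

A full English word list held this way costs a noticeable amount of memory per request, which is part of why the approach is expensive.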

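Basically, take your whole string and drop letters from the end, one at a time, until what is left is a word; remove that word from the front of the string, then repeat on the rest. For example:

For expertsexchange.com, you would check it like this:

The {} is the list of words found so far. The first "" is the letters you still have left to check; the last "" is the current subset of letters being tested.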

{} "expertsexchange" "expertsexchange" <-- not a word 
{} "expertsexchange" "expertsexchang" <-- not a word 
{} "expertsexchange" "expertsexchan" <-- not a word 
{} "expertsexchange" "expertsexcha" <-- not a word 
{} "expertsexchange" "expertsexch" <-- not a word 
{} "expertsexchange" "expertsexc" <-- not a word 
{} "expertsexchange" "expertsex" <-- not a word 
{} "expertsexchange" "expertse" <-- not a word 
{} "expertsexchange" "experts" <-- WORD! Add it to our list of words 
{"experts"} "exchange" "exchange" <-- WORD! Add it to our list of words 
{"experts", "exchange"} "" "" <-- No more letters to check, we have found all of our words.
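The trace above could be sketched in PHP roughly as follows. This is only an illustration under assumptions: `greedySplit` is a made-up name, and `$wordSet` is a tiny hand-written stand-in for a real dictionary, keyed by lowercase word.

```php
<?php
// Greedy longest-prefix split: test the whole remaining string, drop one
// trailing letter at a time until a dictionary word is found, then repeat
// on what is left.
function greedySplit(string $name, array $wordSet): array
{
    $found = [];
    $remaining = strtolower($name);
    while ($remaining !== '') {
        // Fallback: if no prefix is a word, keep the leftover letters as-is.
        $piece = $remaining;
        for ($len = strlen($remaining); $len >= 1; $len--) {
            $candidate = substr($remaining, 0, $len);
            if (isset($wordSet[$candidate])) {
                $piece = $candidate;
                break;
            }
        }
        $found[] = $piece;
        $remaining = (string) substr($remaining, strlen($piece));
    }
    return $found;
}

// Toy dictionary (an assumption for the demo, not a real word list).
$wordSet = array_fill_keys(['expert', 'experts', 'sex', 'exchange', 'change'], true);
echo implode(' ', greedySplit('expertsexchange', $wordSet)), "\n"; // experts exchange
```

Because the longest prefix always wins, "experts" is chosen before "expert" is ever tried, so the other reading of the name is never considered.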

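Let's try a different example...

hellotherewittlekitty. This contains a "word" ("wittle") that a dictionary will not recognize. Unfortunately, this is how the algorithm would handle it: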

{} "hellotherewittlekitty" "hellotherewittlekitty" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittlekitt" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittlekit" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittleki" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittlek" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittle" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittl" <-- not a word 
{} "hellotherewittlekitty" "hellotherewitt" <-- not a word 
{} "hellotherewittlekitty" "hellotherewit" <-- not a word 
{} "hellotherewittlekitty" "hellotherewi" <-- not a word 
{} "hellotherewittlekitty" "hellotherew" <-- not a word 
{} "hellotherewittlekitty" "hellothere" <-- not a word 
{} "hellotherewittlekitty" "hellother" <-- not a word 
{} "hellotherewittlekitty" "hellothe" <-- not a word 
{} "hellotherewittlekitty" "helloth" <-- not a word 
{} "hellotherewittlekitty" "hellot" <-- not a word 
{} "hellotherewittlekitty" "hello" <-- WORD! add it to list, and remove from main string! 
{"hello"} "therewittlekitty" "therewittlekitty" <-- not a word 
{"hello"} "therewittlekitty" "therewittlekitt" <-- not a word 
{"hello"} "therewittlekitty" "therewittlekit" <-- not a word 
{"hello"} "therewittlekitty" "therewittleki" <-- not a word 
{"hello"} "therewittlekitty" "therewittlek" <-- not a word 
{"hello"} "therewittlekitty" "therewittle" <-- not a word 
{"hello"} "therewittlekitty" "therewittl" <-- not a word 
{"hello"} "therewittlekitty" "therewitt" <-- not a word 
{"hello"} "therewittlekitty" "therewit" <-- not a word 
{"hello"} "therewittlekitty" "therewi" <-- not a word 
{"hello"} "therewittlekitty" "therew" <-- not a word 
{"hello"} "therewittlekitty" "there" <-- WORD! add it to list, and remove from main string 
{"hello", "there"} "wittlekitty" "wittlekitty" <-- not a word 
{"hello", "there"} "wittlekitty" "wittlekitt" <-- not a word 
{"hello", "there"} "wittlekitty" "wittlekit" <-- not a word 
{"hello", "there"} "wittlekitty" "wittleki" <-- not a word 
{"hello", "there"} "wittlekitty" "wittlek" <-- not a word 
{"hello", "there"} "wittlekitty" "wittle" <-- not a word (even though humans read it as one) 
{"hello", "there"} "wittlekitty" "wittl" <-- not a word 
{"hello", "there"} "wittlekitty" "witt" <-- WORD! add it to list, and remove from main string 
{"hello", "there", "witt"} "lekitty" "lekitty" <-- not a word 
{"hello", "there", "witt"} "lekitty" "lekitt" <-- not a word 
{"hello", "there", "witt"} "lekitty" "lekit" <-- not a word 
{"hello", "there", "witt"} "lekitty" "leki" <-- WORD! (biology, wikipedia) 
{"hello", "there", "witt", "leki"} "tty" "tty" <-- not a word 
{"hello", "there", "witt", "leki"} "tty" "tt" <-- not a word 
{"hello", "there", "witt", "leki"} "tty" "t" <-- not a word 
{"hello", "there", "witt", "leki"} "tty" "" <-- No more letters, add it to the list! 
{"hello", "there", "witt", "leki", "tty"} "" ""

So hellotherewittlekitty would come out as HelloThereWittLekiTty, which is worse than just leaving it all in lowercase.
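One way to soften that failure, sketched here under assumptions, is to capitalize only when every piece is a dictionary word and otherwise show the name untouched. `displayName` is a made-up helper, `$pieces` is whatever your splitting step produced, and `$wordSet` is a toy dictionary keyed by lowercase word.

```php
<?php
// Capitalize the pieces only if all of them are dictionary words;
// otherwise return the original string unchanged.
function displayName(string $original, array $pieces, array $wordSet): string
{
    foreach ($pieces as $piece) {
        if (!isset($wordSet[$piece])) {
            return $original; // leftover fragment: do not risk mangling the name
        }
    }
    return implode('', array_map('ucfirst', $pieces));
}

// Toy dictionary (an assumption for the demo).
$wordSet = array_fill_keys(['snap', 'names', 'hello', 'there', 'witt', 'leki'], true);

echo displayName('snapnames', ['snap', 'names'], $wordSet), "\n";
// SnapNames
echo displayName('hellotherewittlekitty', ['hello', 'there', 'witt', 'leki', 'tty'], $wordSet), "\n";
// hellotherewittlekitty
```

This only catches splits that leave an unrecognized fragment such as "tty". A split made entirely of real but wrong words would still be capitalized.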

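There are algorithms that are more CPU-intensive than this and need more data, and they might give you better accuracy. But all in all, it isn't worth all that work for only 30% accuracy, especially since the algorithm butchers your words whenever it fails. That means adding this would ruin 60% of the sites.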
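As one example of a heavier approach (my own illustration, not necessarily the kind of algorithm meant above), you can enumerate every way to cut the name into dictionary words instead of taking the first greedy match. `allSplits` is a made-up name, and `$wordSet` is again a toy dictionary keyed by lowercase word.

```php
<?php
// Return every segmentation of $name into dictionary words.
// $memo caches results per suffix so each suffix is expanded only once.
function allSplits(string $name, array $wordSet, array &$memo = []): array
{
    if ($name === '') {
        return [[]]; // exactly one way to split nothing: no words
    }
    if (isset($memo[$name])) {
        return $memo[$name];
    }
    $results = [];
    for ($len = 1; $len <= strlen($name); $len++) {
        $head = substr($name, 0, $len);
        if (!isset($wordSet[$head])) {
            continue;
        }
        $rest = (string) substr($name, $len);
        foreach (allSplits($rest, $wordSet, $memo) as $tail) {
            $results[] = array_merge([$head], $tail);
        }
    }
    return $memo[$name] = $results;
}

// Toy dictionary (an assumption for the demo).
$wordSet = array_fill_keys(['expert', 'experts', 'sex', 'exchange', 'change'], true);
foreach (allSplits('expertsexchange', $wordSet) as $split) {
    echo implode(' ', $split), "\n";
}
// expert sex change
// experts exchange
```

This finds both readings, but it still cannot tell you which one the site owner intended. You would need word frequencies or some other ranking on top, and the number of splits can grow quickly with a large dictionary.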

2012-01-23 01:49:02
