将一个没有空格的字符串解析成一个单独的单词数组

如果我有一个字符串“blueberrymuffinsareinsanelydelicious”，解析它的最有效方法是让我剩下[“blueberry”，“muffins”，“are” ，“疯狂”，“美味”]？将一个没有空格的字符串解析成一个单独的单词数组

我已经有我的词汇表（mac的/ usr/share/dict/words），但是如何确保整个单词存储在我的数组中，又名：蓝莓，而不是两个单独的词，蓝色和浆果。

来源

2017-01-17 Chris Lai

您可以先阅读您的单词列表，然后按照**反向排列的**字大小**对单词列表进行排序，然后再搜索字符串。在这种情况下，如果你的单词列表是'['blue'，'berry'，'blueberry']，那么它会变成'['blueberry'，'berry'，'blue']'首先为复合词。 –

这听起来像一个[XY问题]（http://xyproblem.info）。你如何得到这个列表？ –

为什么最好分析'blueberry'而不是'blue'和'berry'？这也不是一个有效的解决方案吗？ – phss

尽管有情况有多种解释可能和选择最好的一个可麻烦，你可以随时与一个相当天真的算法是这样来解决：

WORDS = %w[ 
    blueberry 
    blue 
    berry 
    fin 
    fins 
    muffin 
    muffins 
    are 
    insane 
    insanely 
    in 
    delicious 
    deli 
    us 
].sort_by do |word| 
    [ -word.length, word ] 
end 

WORD_REGEXP = Regexp.union(*WORDS) 

def best_fit(string) 
    string.scan(WORD_REGEXP) 
end

这将解析您的例子：

best_fit("blueberrymuffinsareinsanelydelicious") 
# => ["blueberry", "muffins", "are", "insanely", "delicious"]

请注意，这会跳过所有不匹配的组件。

来源

2017-01-17 20:39:53 tadman

我认为从一个给定的起点开始保持所有可能的成功匹配是明智的。你可能发现自己处于两种可能匹配的较长匹配是错误的情况，甚至下一个词也不能完全消除结果的歧义。这可能是因为直到走了几步才能解决歧义（如果有的话）。 –

@DerrellDurrett从概念上讲，这不是一个坏主意，但它确实使解决方案显得复杂化。 – tadman

大声笑。对。但是这是实施回溯的唯一（明显的）方法，所以如果你做出了不好的选择，你就可以成功。否则，你只会失败很多时间。 –

下面是一个递归方法，它在慢速笔记本电脑上查找0.4秒内的正确句子。

它首先进口近10万英语单词和通过减少尺寸
每word对它们进行分类，它会检查是否text开始用它
如果是这样，它会从text的word，保持word在数组中并递归调用它自己。
如果text为空，则表示已找到句子。
它使用一个惰性数组停止在第一个找到的句子。

text = "blueberrymuffinsareinsanelydeliciousbecausethey'rereallymoistandcolorful" 

dictionary = File.readlines('/usr/share/dict/american-english') 
       .map(&:chomp) 
       .sort_by{ |w| -w.size } 

def find_words(text, possible_words, sentence = []) 
    return sentence if text.empty? 
    possible_words.lazy.select{ |word| 
    text.start_with?(word) 
    }.map{ |word| 
    find_words(text[word.size..-1], possible_words, sentence + [word]) 
    }.find(&:itself) 
end 

p find_words(text, dictionary) 
#=> ["blueberry", "muffins", "are", "insanely", "delicious", "because", "they're", "really", "moist", "and", "colorful"] 
p find_words('someword', %w(no way to find a combination)) 
#=> nil 
p find_words('culdesac', %w(culd no way to find a combination cul de sac)) 
#=> ["cul", "de", "sac"] 
p find_words("carrotate", dictionary) 
#=> ["carrot", "ate"]

要获得更快的查找，也可能是使用Trie一个好主意。

来源

2017-01-17 21:59:41

将一个没有空格的字符串解析成一个单独的单词数组

回答

相关问题