如何进一步随机化此文本生成器？

我正在研究一种随机文本生成器 - 无需使用马尔可夫链 - 目前它的工作原理没有太多问题 - 实际上按照我的标准生成了大量的随机语句，但是我想使它更加准确，以防止尽可能多的句子尽可能重复 - 。首先，这里是我的代码流：如何进一步随机化此文本生成器？

输入一个句子作为输入 - 这被称为触发字符串，被分配到一个可变
获取触发字符串
搜索最长的单词所有包含该词语的句子的古腾堡项目数据库 - 不含大写小写字母 -
返回含有我在第3步中谈到的词的最长句子
追加语句在步骤1和步骤4一起
分配句子在步骤4中作为新的“触发”句子和重复该过程。请注意，我必须得到最长的单词在第二句，并继续像等等 -

这里是我的代码：

import nltk 

from nltk.corpus import gutenberg 

from random import choice 

import smtplib #will be for send e-mail option later 

triggerSentence = raw_input("Please enter the trigger sentence: ")#get input str 

longestLength = 0 

longestString = "" 

longestLen2 = 0 

longestStr2 = "" 

listOfSents = gutenberg.sents() #all sentences of gutenberg are assigned -list of list format- 

listOfWords = gutenberg.words()# all words in gutenberg books -list format- 

while triggerSentence:#run the loop so long as there is a trigger sentence 
    sets = [] 
    sets2 = [] 
    split_str = triggerSentence.split()#split the sentence into words 

    #code to find the longest word in the trigger sentence input 
    for piece in split_str: 
     if len(piece) > longestLength: 
      longestString = piece 
      longestLength = len(piece) 





    #code to get the sentences containing the longest word, then selecting 
    #random one of these sentences that are longer than 40 characters 

    for sentence in listOfSents: 
     if sentence.count(longestString): 
      sents= " ".join(sentence) 
      if len(sents) > 40: 
       sets.append(" ".join(sentence)) 


    triggerSentence = choice(sets) 
    print triggerSentence #the first sentence that comes up after I enter input- 
    split_str = triggerSentence.split() 

    for apiece in triggerSentence: #find the longest word in this new sentence 
     if len(apiece) > longestLen2: 
      longestStr2 = piece 
      longestLen2 = len(apiece) 
    if longestStr2 == longestString: 
     second_longest = sorted(split_str, key=len)[-2]#this should return the second longest word in the sentence in case it's longest word is as same as the longest word of last sentence 
    #print second_longest #now get second longest word if first is same 
      #as longest word in previous sentence 

     for sentence in listOfSents: 
      if sentence.count(second_longest): 
       sents = " ".join(sentence) 
       if len(sents) > 40: 
        sets2.append(" ".join(sentence)) 
     triggerSentence = choice(sets2) 
    else: 
     for sentence in listOfSents: 
      if sentence.count(longestStr2): 
       sents = " ".join(sentence) 
       if len(sents) > 40: 
       sets.append(" ".join(sentence)) 
     triggerSentence = choice(sets) 


    print triggerSentence

根据我的代码，一旦我进入一个触发器句子，我应该得到另一个包含我输入的触发句子中最长的单词。然后，这个新句子成为触发句，它是最长的词被选中。这是有时出现问题的地方。我观察到，尽管我放置了代码行 - 从第47行开始到结尾，算法仍然可以在出现的句子中选择最长的单词，而不是查找第二长的单词。

例如：

触发字符串= “苏格兰是一个不错的地方。”

语句1 = - 这是内─

现在随机句字苏格兰，这是我在时间的代码可能发生的问题-doesn't也罢，它在句子2登场或942或zillion或任何，但我给它发送.2举例来说 -

句子2 =另一个句子，其中有苏格兰词，但不是第1句中第二长的词。根据我的代码，这个句子应该是一些句子，包含句子1中第二长的单词，而不是苏格兰！

我该如何解决这个问题？我试图尽可能优化代码，并欢迎任何帮助。

来源

2010-08-29 mojave_ranger

一开始所有换行符都是什么？ – 2010-08-29 22:19:14

@ Zonda333，换行符？ – 2010-08-29 22:29:43

@ Zonda333，哦，如果你的意思是为什么开始时在代码之间总是有空行--in import nltk等等，我故意这样做。在复制/粘贴代码的过程中，线条往往会混在一起，我可能会在线条之间按下输入按钮的位置太多。 – 2010-08-29 22:32:43

根本就没有什么随机的算法。它应该始终是确定性的。

我不太清楚你想在这里做什么。如果是生成随机单词，只需使用字典和随机模块。如果您想从古腾堡项目中抓取随机句子，可以使用随机模块选择一项作品，然后从该作品中选择一个句子。

来源

2010-08-30 21:34:27 aterrel

如何进一步随机化此文本生成器？

回答

相关问题