2014-02-05

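Using a tree in Python to get values

I am trying to build a tree in Python that reads a text file containing repeated sentences, builds a tree from those values, and returns the three most frequent sentences (explained in more detail below).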

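First, I looked up on Wikipedia how a tree is built, and I also went through earlier examples such as this one and this one. However, this code is as far as I have been able to get: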

import fileinput

setPhrasesTree = 0


class Branch():
    def __init__(self, value):
        self.left = None
        self.right = None
        self.value = value

class Tree():
    def __init__(self):
        self.root = None
        self.found = False

    # lessThan function needed to compare strings.
    # Python compares strings lexicographically, so the built-in < operator is enough.
    def lessThan(self, a, b):
        return a < b

    def insert(self, value):
        self.root = self.insertAtBranch(self.root, value)

    def exists(self, value):
        # set the instance variable found to False to assume it is not there
        self.found = False
        self.findAtBranch(self.root, value)
        return self.found

    # Used to find a value in a tree
    def findAtBranch(self, branch, value):
        if branch is None:
            pass
        else:
            if branch.value == value:
                self.found = True
            else:
                self.findAtBranch(branch.left, value)
                self.findAtBranch(branch.right, value)

    def insertAtBranch(self, branch, value):
        if branch is None:
            return Branch(value)
        else:
            if self.lessThan(branch.value, value):
                branch.right = self.insertAtBranch(branch.right, value)
            else:
                branch.left = self.insertAtBranch(branch.left, value)
            return branch

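# Tree holding the set phrases loaded from the file
setPhrases = Tree()

def loadTree(filename, treeType):
    if treeType == setPhrasesTree:
        for sentence in fileinput.input(filename):
            print(sentence)
            # drop the trailing newline before inserting
            setPhrases.insert(sentence.rstrip("\n"))


def findSentenceType(sentence):
    # returns setPhrasesTree if the sentence is in the tree, otherwise None
    if setPhrases.exists(sentence):
        return setPhrasesTree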

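Here is what the text file looks like. Bear in mind that it is intentionally laid out like this, rather than with a count next to each sentence (filename = setPhrases.txt):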

Hi my name is Dave. 
Thank-You. 
What is your name? 
I have done all my homework. 
What time is dinner? 
What is your name? 
Thank-You. 
Hi my name is Dave. 
What is your name? 
I have done all my homework. 
What is your name? 
Can you bring me a drink Please? 
Can you bring me a drink Please? 
What is your name? 
Hi my name is Dave. 
What is your name? 
Can you bring me a drink Please? 

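Here is what I am trying to get my code to do. I need it to treat the first sentence in the file as the starting node. It then needs to count every other occurrence of the same sentence and add to a value stored for that sentence, using the tree to do so. (I originally did this another way, but I need to use a tree so that it fits in with everything else I am doing.) This is what I mean: [image]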

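I then want to be able to return the three phrases with the highest frequency. So in this case, the system would return these sentences (in this order):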

What is your name? 
Hi my name is Dave. 
Can you bring me a drink Please? 

Any help is much appreciated. Thank you for your time as well.

Do I understand correctly that you just want to count how often each line occurs in the file? You hardly need a tree for that. – pentadecagon

@pentadecagon As I mentioned above, I have already been able to do that. However, I need to use a tree to do it, and I don't know what to do next. – PythonNovice

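A tree is required? So this is an exercise? I ask because, as you know, using a dictionary instead of a tree solves this problem more efficiently in about 20 lines of code. If you really want a tree, then for the tree to be useful it should probably be some kind of [self-balancing tree](http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree), the most popular of which is the [red-black tree](http://en.wikipedia.org/wiki/Red-black_tree). That is a lot of work to implement yourself. – pentadecagon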

Answers

Here is an implementation that uses a dictionary. Is this what you want?

import collections

def count_lines(filename="phrases.txt"):
    # map each stripped line to the number of times it occurs
    d = collections.defaultdict(int)
    with open(filename) as f:
        for line in f:
            d[line.strip()] += 1

    # we use the negative count as sort key, so the biggest ends up first
    a = sorted(d.items(), key=lambda x: -x[1])
    for phrase, count in a[:3]:
        print(phrase, "# count=", count)

count_lines()
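
If a tree really is required, as the question says, one possible approach is a plain binary search tree in which every node stores a phrase together with a counter, so that a repeated phrase increments the counter instead of adding a node. The following is only a minimal sketch under that assumption: the names `CountingNode`, `CountingTree`, `count_phrases` and `first_seen` are made up for illustration and do not come from the question's code, and the tree is not self-balancing.

```python
class CountingNode:
    """One tree node: a phrase, its occurrence count, and its arrival order."""
    def __init__(self, phrase, first_seen):
        self.phrase = phrase
        self.count = 1
        self.first_seen = first_seen  # hypothetical field, used to break ties
        self.left = None
        self.right = None


class CountingTree:
    """Unbalanced binary search tree keyed on the phrase text (illustrative)."""
    def __init__(self):
        self.root = None
        self.size = 0  # number of distinct phrases stored so far

    def _new_node(self, phrase):
        node = CountingNode(phrase, self.size)
        self.size += 1
        return node

    def insert(self, phrase):
        if self.root is None:
            self.root = self._new_node(phrase)  # first phrase becomes the root
            return
        node = self.root
        while True:
            if phrase == node.phrase:
                node.count += 1  # duplicate: bump the counter, add no node
                return
            side = "left" if phrase < node.phrase else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, self._new_node(phrase))
                return
            node = child

    def nodes(self):
        """Yield every node in alphabetical order (iterative in-order walk)."""
        stack, node = [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node
            node = node.right

    def top(self, n=3):
        """Return the n most frequent (phrase, count) pairs; ties go to the phrase seen first."""
        ranked = sorted(self.nodes(), key=lambda nd: (-nd.count, nd.first_seen))
        return [(nd.phrase, nd.count) for nd in ranked[:n]]


def count_phrases(lines):
    """Build a CountingTree from an iterable of lines, skipping blank ones."""
    tree = CountingTree()
    for line in lines:
        phrase = line.strip()
        if phrase:
            tree.insert(phrase)
    return tree
```

With the file from the question, this could be used as `with open("setPhrases.txt") as f: print(count_phrases(f).top(3))`. The tree only replaces the dictionary as the place where counts are stored; ranking still needs a pass over all nodes, which is why the comments above suggest a dictionary when a tree is not mandatory.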