2017-04-14 25 views
4


Input: given a text, count the occurrences of every pair of consecutive words.

Once upon a time a time this upon a 


Output:

dictionary {
    'Once upon': 1,
    'upon a': 2,
    'a time': 2,
    'time a': 1,
    'time this': 1,
    'this upon': 1
}


CODE:

def countTuples(path):
    dic = dict()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

I get this error:

File "C:/Users/user/Anaconda3/hw2.py", line 100, in countTuples 
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1 
TypeError: list indices must be integers or slices, not str 

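If I remove the += and just put = 1 instead, everything works fine. I guess the problem is that I am trying to read an entry whose value does not exist yet?

What can I do to fix this? I would like the smallest possible change to my code.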

+0

Use a `Counter`...

+0

A Counter would iterate over the file once for every tuple. I can't afford that: the time complexity would be n^2, and I want to avoid it. @WillemVanOnsem

+1

Not if you use `zip`...

Answers

3

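You can use a defaultdict to make your solution work. With a defaultdict you specify a factory for the default value of missing keys. That lets you apply an update such as += 1 to a key that has not been explicitly created yet: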

import codecs
from collections import defaultdict

def countTuples(path):
    # Missing keys start at int() == 0, so += 1 works on first sight of a pair
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

Result for the sample input:

defaultdict(int,
            {'Once upon': 1,
             'a time': 2,
             'this upon': 1,
             'time a': 1,
             'time this': 1,
             'upon a': 2})
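
The difference between the two dictionary types can be seen in isolation. The following is a minimal sketch, not part of the original answer; the names `plain`, `counts` and `missing_key` are illustrative only. On a plain `dict`, `+=` has to read the old value first and raises `KeyError` when the key is absent, whereas `defaultdict(int)` creates the entry as `int()`, i.e. `0`, before incrementing it.

```python
from collections import defaultdict

plain = {}
try:
    # += reads plain['Once upon'] before writing; the key does not exist yet
    plain['Once upon'] += 1
except KeyError as exc:
    missing_key = exc.args[0]

counts = defaultdict(int)
# The missing key is created with the default int() == 0, then incremented
counts['Once upon'] += 1
counts['Once upon'] += 1
```
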
2

One solution is simply to use a defaultdict:

from collections import defaultdict 

line = 'Once upon a time a time this upon a' 

dic = defaultdict(int) 

s = line.split() 

for i in range(0, len(s)-1): 
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1 

This then produces:

dic 

defaultdict(int,
            {'Once upon': 1,
             'a time': 2,
             'this upon': 1,
             'time a': 1,
             'time this': 1,
             'upon a': 2})

Your function then simply becomes:

import codecs

def countTuples(path):
    # Only this line changes: dict() becomes defaultdict(int)
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic
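
To try the function end to end, one option is to write the sample sentence to a temporary file and pass its path in. This is only a sketch under assumptions: the temporary file and the names `sample_path` and `result` are made up for illustration and do not come from the question.

```python
import codecs
import os
import tempfile
from collections import defaultdict

def countTuples(path):
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            # Pair each word with the one that follows it on the same line
            for i in range(len(s) - 1):
                dic[s[i] + ' ' + s[i + 1]] += 1
    return dic

# Write the sample sentence to a throwaway file (hypothetical test data)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('Once upon a time a time this upon a\n')
    sample_path = tmp.name

result = countTuples(sample_path)
os.remove(sample_path)  # clean up the temporary file
```

Note that pairs are formed per line, so the last word of one line and the first word of the next are not counted as a pair.
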
2

There is no need to make it this hard: simply use a Counter and use zip to feed it the bigrams, like this:

import codecs
from collections import Counter

def countTuples(path):
    dic = Counter()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            # zip(s, s[1:]) yields each word together with its successor
            dic.update('%s %s' % t for t in zip(s, s[1:]))
    return dic
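
Regarding the complexity concern raised in the comments: `zip(s, s[1:])` walks the word list once, and the Counter does one dictionary update per pair, so the work is linear in the number of words rather than quadratic. A minimal sketch on the sample sentence, without any file handling (the names `line`, `s` and `bigrams` are illustrative):

```python
from collections import Counter

line = 'Once upon a time a time this upon a'
s = line.split()

# One pass over the words: each word is paired with the word after it
bigrams = Counter('%s %s' % t for t in zip(s, s[1:]))

# A list of n words yields exactly n - 1 consecutive pairs
total_pairs = sum(bigrams.values())
```

A Counter also returns 0 for pairs it has never seen, for example `bigrams['a this']`, without adding them to the dictionary.
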