2013-08-07 75 views
0

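I have a file containing a list of bands and their albums, along with the year each album was made. I need to write a function that goes through this file, finds the distinct band names, and counts how many times each band appears in the file. How do I print the frequency of each phrase/word in some vocabulary?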

The file looks like this:

Beatles - Revolver (1966) 
Nirvana - Nevermind (1991) 
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967) 
U2 - The Joshua Tree (1987) 
Beatles - The Beatles (1968) 
Beatles - Abbey Road (1969) 
Guns N' Roses - Appetite For Destruction (1987) 
Radiohead - Ok Computer (1997) 
Led Zeppelin - Led Zeppelin 4 (1971) 
U2 - Achtung Baby (1991) 
Pink Floyd - Dark Side Of The Moon (1973) 
Michael Jackson -Thriller (1982) 
Rolling Stones - Exile On Main Street (1972) 
Clash - London Calling (1979) 
U2 - All That You Can't Leave Behind (2000) 
Weezer - Pinkerton (1996) 
Radiohead - The Bends (1995) 
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995) 
. 
. 
. 

The output must be in descending order of frequency and look like this:

band1: number1 
band2: number2 
band3: number3 

Here is my code so far:

def read_albums(filename) : 

    file = open("albums.txt", "r") 
    bands = {} 
    for line in file : 
     words = line.split() 
     for word in words: 
      if word in '-' : 
       del(words[words.index(word):]) 
     string1 = "" 
     for i in words : 
      list1 = [] 

      string1 = string1 + i + " " 
      list1.append(string1) 
     for k in list1 : 
      if (k in bands) : 
       bands[k] = bands[k] +1 
      else : 
       bands[k] = 1 


    for word in bands : 
     frequency = bands[word] 
     print(word + ":", len(bands)) 

I think there is a simpler way to do this, but I'm not sure. Also, I'm not sure how to sort the dictionary by frequency. Do I need to convert it to a list?

+1

Take a look at [`collections.Counter`](http://docs.python.org/2/library/collections.html#collections.Counter)

Answers

2

You're right, there is a simpler way, using Counter:

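from collections import Counter

with open('bandfile.txt') as f:
    # Take the text before the first '-' as the band name; skip blank lines.
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip())

for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))

What exactly does the `if line.strip()` part do?

The one-line version is shorthand for the following loop:

temp_list = []
for line in f:
    if line.strip():  # skip blank lines (a blank line is still "\n", so test the stripped text)
        bits = line.split('-')
        temp_list.append(bits[0].strip())

counts = Counter(temp_list)

However, unlike the loop above, the one-line version does not create an intermediate list. Instead, it creates a generator expression, a more memory-efficient way to step through the lines, which is passed as the argument to Counter.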
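To make the generator-expression point concrete, here is a minimal sketch that runs an expression like the one above over an in-memory list of lines instead of a file. The helper name `count_bands` and the sample `lines` are assumptions made up for illustration; they are not part of the original answer.

```python
from collections import Counter


def count_bands(lines):
    """Count band names, taking the text before the first '-' on each line."""
    # The generator expression hands names to Counter one at a time,
    # so no intermediate list of names is built.
    return Counter(
        line.split('-')[0].strip()
        for line in lines
        if line.strip()  # skip blank lines
    )


# Hypothetical sample data in the same format as the question's file.
lines = [
    "Beatles - Revolver (1966)\n",
    "Nirvana - Nevermind (1991)\n",
    "\n",
    "Beatles - Abbey Road (1969)\n",
    "U2 - The Joshua Tree (1987)\n",
    "U2 - Achtung Baby (1991)\n",
    "Beatles - The Beatles (1968)\n",
]

# most_common() yields (band, count) pairs, highest count first.
for band, count in count_bands(lines).most_common():
    print("{0}: {1}".format(band, count))
```

Because any iterable of strings works here, an open file object could be passed to `count_bands` in place of the list.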

+0

Note that `Counter` is only available in Python 2.7 and later. If you're using something older than that, see the accepted answer here: http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value
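For readers without `Counter`, a rough sketch of the same idea with a plain dictionary and `sorted()` might look like the following. The function name `count_and_sort` and the sample `names` list are hypothetical. It also touches on the question of converting to a list: `sorted()` returns a new list of `(name, count)` pairs, so no separate conversion step is needed.

```python
def count_and_sort(names):
    """Return (name, count) pairs sorted by count, highest first."""
    counts = {}
    for name in names:
        # dict.get returns 0 the first time a name is seen.
        counts[name] = counts.get(name, 0) + 1
    # sorted() builds a list of tuples; the key function picks out the count.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample of band names already extracted from the file.
names = ["Beatles", "Nirvana", "Beatles", "U2", "Beatles", "U2"]

for name, count in count_and_sort(names):
    print("%s: %d" % (name, count))
```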

+0

I'm still very new to Python, so what does the with statement do? Not in this code specifically, but in general.

+0

http://docs.python.org/2/reference/compound_stmts.html
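As a rough illustration of the `with` question: for files, `with` closes the file when the block ends, even if an error is raised inside it. The sketch below compares it with a hand-written `try`/`finally`; the temporary file name `albums_demo.txt` and its contents are assumptions used only to keep the example self-contained.

```python
import os
import tempfile

# Create a small throwaway file to read back (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "albums_demo.txt")
with open(path, "w") as f:
    f.write("Beatles - Revolver (1966)\n")

# Using `with`: the file is closed automatically when the block ends.
with open(path) as f:
    first_with = f.readline()
closed_after_with = f.closed

# Roughly equivalent manual version using try/finally.
g = open(path)
try:
    first_manual = g.readline()
finally:
    g.close()
closed_after_manual = g.closed

print(first_with == first_manual, closed_after_with, closed_after_manual)
```

Both versions read the same line and leave the file closed; the `with` form is shorter and harder to get wrong.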

1

If you're looking for conciseness, use `defaultdict` and `sorted`:

from collections import defaultdict

bands = defaultdict(int)  # missing keys start at 0
with open('tmp.txt') as f:
    for line in f:
        band = line.split('-')[0].strip()
        if band:  # skip blank lines
            bands[band] += 1
# Sort (band, count) pairs by count, highest first.
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))
+0

Why sort? The question doesn't ask for sorted output. Note that `collections.Counter().most_common()` would be more concise, since it sorts by frequency in descending order for you.

+0

Right; I hadn't seen the Counter solution when I wrote mine, and it's better! – thierrybm

0

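My approach is to use the split() method to break each line of the file into a list of component tokens. Then you can grab the band name (the first token in the list) and start adding the names to a dictionary to keep track of the counts: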

import operator

def main():
    band_counts = {}

    # build a dictionary that adds each band as it is listed, then increments the count for re-lists
    with open("albums.txt", "r") as f:
        for line in f:
            line_items = line.split("-")  # break up the line into individual tokens
            band = line_items[0].strip()  # the band name is the first token

            # don't want to add blank lines to the band list
            if not band:
                continue

            if band in band_counts:
                band_counts[band] += 1  # band already in the counts, increment the count
            else:
                band_counts[band] = 1  # if the band was not already in counts, add it with a count of 1

    # create a list of (band, count) pairs sorted by count, highest first
    sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

    for item in sorted_list:
        print("{0}: {1}".format(item[0], item[1]))

if __name__ == "__main__":
    main()

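Notes:

  1. I followed the advice in this answer to create the sorted results: Sort a Python dictionary by value
  2. If you're new to Python, take a look at Google's Python Class. I found it very useful when I was just starting out: https://developers.google.com/edu/python/?csw=1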