2016-07-15 43 views
-1

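I am working through Google's Python class, which uses Python 2.7. I am running 3.5.2. The exercise is wordcount.py from Google's Python class.

The script works. It is one of my exercises.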

#!/usr/bin/python -tt 
# Copyright 2010 Google Inc. 
# Licensed under the Apache License, Version 2.0 
# http://www.apache.org/licenses/LICENSE-2.0 

# Google's Python Class 
# http://code.google.com/edu/languages/google-python-class/ 

"""Wordcount exercise 
Google's Python class 

The main() below is already defined and complete. It calls print_words() 
and print_top() functions which you write. 

1. For the --count flag, implement a print_words(filename) function that counts 
how often each word appears in the text and prints: 
word1 count1 
word2 count2 
... 

Print the above list in order sorted by word (python will sort punctuation to 
come before letters -- that's fine). Store all the words as lowercase, 
so 'The' and 'the' count as the same word. 

2. For the --topcount flag, implement a print_top(filename) which is similar 
to print_words() but which prints just the top 20 most common words sorted 
so the most common word is first, then the next most common, and so on. 

Use str.split() (no arguments) to split on all whitespace. 

Workflow: don't build the whole program at once. Get it to an intermediate 
milestone and print your data structure and sys.exit(0). 
When that's working, try for the next milestone. 

Optional: define a helper function to avoid code duplication inside 
print_words() and print_top(). 

""" 

import sys 

# +++your code here+++ 
# Define print_words(filename) and print_top(filename) functions. 
# You could write a helper utility function that reads a file
# and builds and returns a word/count dict for it. 
# Then print_words() and print_top() can just call the utility function. 

### 

def word_count_dict(filename):
    """Returns a word/count dict for this filename."""
    # Utility used by print_words() and print_top().
    word_count={} #Map each word to its count
    input_file=open(filename, 'r')
    for line in input_file:
        words=line.split()
        for word in words:
            word=word.lower()
            # Special case if we're seeing this word for the first time.
            if not word in word_count:
                word_count[word]=1
            else:
                word_count[word]=word_count[word] + 1
    input_file.close() # Not strictly required, but good form.
    return word_count

def print_words(filename): 
    """Prints one per line '<word> <count>' sorted by word for the given file.""" 
    word_count=word_count_dict(filename) 
    words=sorted(word_count.keys()) 
    for word in words:
        print(word,word_count[word])

def get_count(word_count_tuple): 
    """Returns the count from a dict word/count tuple -- used for custom sort.""" 
    return word_count_tuple[1] 

def print_top(filename): 
    """Prints the top count listing for the given file.""" 
    word_count=word_count_dict(filename) 

    # Each item is a (word, count) tuple.
    # Sort them so the big counts are first, using key=get_count to extract the count.
    items=sorted(word_count.items(), key=get_count, reverse=True)

    # Print the first 20
    for item in items[:20]:
        print(item[0], item[1])

# This basic command line argument parsing code is provided and 
# calls the print_words() and print_top() functions which you must define. 
def main(): 
    if len(sys.argv) != 3:
        print('usage: ./wordcount.py {--count | --topcount} file')
        sys.exit(1)

    option = sys.argv[1]
    filename = sys.argv[2]
    if option == '--count':
        print_words(filename)
    elif option == '--topcount':
        print_top(filename)
    else:
        print('unknown option: ' + option)
        sys.exit(1)

if __name__ == '__main__': 
    main() 

Here are my questions, which the course does not answer:

  1. In the code below, I am not sure what the =1 and + 1 mean. Does it mean "if the word is not in the list, add it to the list" (word_count[word]=1)? I also don't understand what word_count[word]=word_count[word] + 1 means.

    if not word in word_count: 
        word_count[word]=1 
    else:
        word_count[word]=word_count[word] + 1 
    
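  2. Where it says word_count.keys(), I am not sure what it does beyond getting the keys of the dictionary that we defined and loaded keys and values into. I just want to understand why word_count.keys() is there.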

    words=sorted(word_count.keys()) 
    
  3. word_count is redefined in several places. Why is that done instead of creating a new variable name, such as word_count1?

    word_count={} 
    word_count=word_count_dict(filename)
    ...and also in places outlined in my 1st question.
    
  4. Does if len(sys.argv) != 3: mean "if I don't have 3 arguments", or "if I don't have 3 characters" (e.g. sys.argv[1], sys.argv[2], sys.argv[3])?

Thank you for your help!

+0

Maybe take a look at how Python dictionaries work. [Here's](http://www.pythonforbeginners.com/dictionary/how-to-use-dictionaries-in-python/) a tutorial. – jDo

+1

Thank you very much. Overall, I am finding that site very informative so far. Much appreciated. –

Answers

0
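  1. If word is not in the dictionary, we create a new entry for it and set its value to 1, because we have seen that word only once so far. Otherwise, we retrieve the old value from the dictionary, add 1 to it with + 1, and put the result back into the dictionary entry by assigning it to word_count[word]. That second step can also be written as: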

    word_count[word] += 1 
    
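  2. word_count.keys() returns all the keys in the word_count dictionary (a list in Python 2.7, a view object in Python 3). Passing them to sorted() produces an alphabetically ordered list, so the dictionary's contents can be printed in alphabetical order. If you just printed the dictionary as it is, the words would come out in an unpredictable order.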

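  3. The variable is not being redefined. Variables are local to each function, so each word_count is a different variable. They just happen to have the same name in each function, because that name describes what the variable contains.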

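  4. List indexes start at 0, so if len(sys.argv) != 3 checks that you have argv[0], argv[1], and argv[2]. argv[0] always contains the script name, so this checks that you gave the script 2 arguments. The first argument must be --count or --topcount, and the second must be the name of the file whose words are to be counted.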
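The four points above can be tried out in a small, self-contained sketch. It is not part of the exercise: count_words, sorted_lines, has_two_arguments, the sample sentence, and the file name small.txt are hypothetical and chosen only for illustration, and the text is passed in as a string rather than read from a file.

```python
def count_words(text):
    """Return a dict mapping each lowercased word to its count."""
    word_count = {}  # local to count_words
    for word in text.split():
        word = word.lower()
        if word not in word_count:
            word_count[word] = 1   # first sighting: create the entry
        else:
            word_count[word] += 1  # seen before: add 1 to the stored count
    return word_count


def sorted_lines(text):
    """Return '<word> <count>' strings in alphabetical order of the words."""
    # This word_count is a separate local variable that happens to share
    # its name with the one inside count_words().
    word_count = count_words(text)
    return ['%s %d' % (word, word_count[word])
            for word in sorted(word_count.keys())]


def has_two_arguments(argv):
    """Mirror the len(sys.argv) != 3 check: script name plus 2 arguments."""
    return len(argv) == 3


print(sorted_lines('the cat The dog'))  # ['cat 1', 'dog 1', 'the 2']
print(has_two_arguments(['wordcount.py', '--count', 'small.txt']))  # True
```

'The' and 'the' end up in one entry with count 2 because each word is lowercased before it is looked up, and the argument list has length 3 because the script name occupies index 0.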

+0

Thank you very much for the detailed insight. You taught me something, and I really appreciate it. –

+0

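Where it says 'return word_count_tuple[1]', does the '[]' make it a tuple? Or what makes it a tuple, so that when 'print_top' is called with the '--topcount' option, 'items=sorted(word_count.items(), key=get_count, reverse=True)' followed by 'for item in items[:20]: print(item[0], item[1])' lists, in order, the top 20 words from the dictionary returned by 'word_count_dict(filename)'? –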

+0

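'word_count.items()' returns a list of tuples, each containing the key and value of one dictionary entry. 'key=get_count' tells 'sorted' to pass each of those tuples to 'get_count', and '[1]' means "return the second element of the tuple". 'sorted' then uses the values returned by 'get_count' as the values to sort the list by. – Barmar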
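A minimal sketch of that last point, using a made-up dictionary (the words and counts are hypothetical, and the counts are all different so the order is unambiguous). The tuples come from items(); the [1] in get_count only indexes into a tuple that already exists.

```python
def get_count(word_count_tuple):
    """Return the count (second element) of a (word, count) tuple."""
    return word_count_tuple[1]


word_count = {'the': 3, 'cat': 1, 'dog': 2}  # made-up counts

# items() yields one (word, count) tuple per dictionary entry.
pairs = list(word_count.items())

# sorted() calls get_count on each tuple and orders the tuples by the
# returned count; reverse=True puts the largest count first.
items = sorted(word_count.items(), key=get_count, reverse=True)
print(items)  # [('the', 3), ('dog', 2), ('cat', 1)]

# Slicing keeps only the leading entries, as items[:20] does in print_top().
for item in items[:2]:
    print(item[0], item[1])
```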