2016-06-09 61 views
-2

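How to write a function in Python

I have this script that reads a file (the file contains collected tweets), cleans it, computes the frequency distribution and creates a plot. Right now it only works with a single file. What I need is to turn it into a function so that I can pass in more files, and then build the FreqDist results from several files in order to plot them.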

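# Imports the script relies on (punctuation is assumed to come from the string module)
import nltk
import pandas as pd
import matplotlib.pyplot as plt
from string import punctuation
from nltk.corpus import stopwords

f = open(.......)
text = f.read()
text = text.lower()
# Strip every punctuation character before tokenizing
for p in list(punctuation):
    text = (text.replace(p, ''))

allWords = nltk.tokenize.word_tokenize(text)
allWordDist = nltk.FreqDist(w.lower() for w in allWords)
# Use a separate name so the imported stopwords module is not overwritten
stopWords = set(stopwords.words('english'))

allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w not in stopWords)
mostCommon = allWordExceptStopDist.most_common(25)

frame = pd.DataFrame(mostCommon, columns=['word', 'frequency'])
frame.set_index('word', inplace=True)
print(frame)
histog = frame.plot(kind='barh')
plt.show()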

Thank you very much for any help!

+3

So you're asking "how do I make a function"? [Here you go](https://docs.python.org/3/tutorial/controlflow.html#defining-functions). – Kevin

+0

Basically yes, I don't know how to write it as a function. –

+0

So your question is about writing a function in Python; it has nothing to do with file reading, data frames or plotting. – Eular

Answers

-1

Is this what you mean?

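# Imports the function relies on (punctuation is assumed to come from the string module)
import nltk
import pandas as pd
import matplotlib.pyplot as plt
from string import punctuation
from nltk.corpus import stopwords

def readStuff(filename):
    # The with block closes the file as soon as it has been read
    with open(filename) as f:
        text = f.read()
    text = text.lower()
    for p in list(punctuation):
        text = (text.replace(p, ''))

    allWords = nltk.tokenize.word_tokenize(text)
    allWordDist = nltk.FreqDist(w.lower() for w in allWords)
    # A different local name avoids shadowing the stopwords module,
    # which would raise UnboundLocalError inside a function
    stopWords = set(stopwords.words('english'))

    allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w not in stopWords)
    mostCommon = allWordExceptStopDist.most_common(25)

    frame = pd.DataFrame(mostCommon, columns=['word', 'frequency'])
    frame.set_index('word', inplace=True)
    print(frame)
    histog = frame.plot(kind='barh')
    plt.show()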
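
Since the question also asks about collecting results from several files, here is a rough sketch of one way to split the work so the function returns its result instead of plotting it. To keep it runnable without any downloads, it swaps in `collections.Counter` for `nltk.FreqDist`, `str.split` for `word_tokenize`, and a tiny hard-coded `STOP_WORDS` set for the NLTK stop-word list. Those substitutions, and the names `clean_text`, `word_frequencies` and `frequencies_for_files`, are assumptions made for illustration, not part of the original script.

```python
import string
from collections import Counter

# Hypothetical stand-in for nltk's English stop-word list, kept tiny on purpose.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in"}


def clean_text(text):
    """Lower-case the text and strip punctuation, as the original script does."""
    text = text.lower()
    for p in string.punctuation:
        text = text.replace(p, "")
    return text


def word_frequencies(filename, top_n=25):
    """Return the top_n (word, count) pairs for one file, without plotting."""
    with open(filename, encoding="utf-8") as f:
        text = clean_text(f.read())
    # Whitespace splitting is a simplification of nltk's word_tokenize.
    words = [w for w in text.split() if w not in STOP_WORDS]
    return Counter(words).most_common(top_n)


def frequencies_for_files(filenames, top_n=25):
    """Map each file name to its own list of (word, count) pairs."""
    return {name: word_frequencies(name, top_n) for name in filenames}
```

Each list returned this way should be usable as input to `pd.DataFrame(..., columns=['word', 'frequency'])` exactly like `mostCommon` above, so the plotting can happen once per file or on a combined frame, whichever suits the comparison.
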
+0

I think so, thank you! –

+0

Don't forget to mark it as correct, so people know they don't need to answer your question again :) – Brian

+0

This leaks the file handle; you should use `with open(filename) as f: ...` – Daenyth
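
To illustrate the point about file handles, here is a small sketch showing that a file opened in a `with` block is closed once the block ends. The helper name `read_text` and the temporary file `tweets.txt` are made up for the demonstration.

```python
import os
import tempfile


def read_text(filename):
    """Read a whole file; the with block closes it even if read() raises."""
    with open(filename, encoding="utf-8") as f:
        return f.read()


# Create a throwaway file so the example runs anywhere (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "tweets.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("hello world")

with open(path, encoding="utf-8") as f:
    text = f.read()

# Once the with block has ended, the handle is closed.
print(f.closed)
```

With a bare `f = open(...)` and no `f.close()`, the handle stays open until the file object is garbage collected, which is what the comment warns about.
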