How can I make this Python program read a large text file faster? My code takes nearly five minutes to read the text file, but I need it to run faster. I don't think my algorithm is O(n).
Some sample data (the actual data is 470K+ lines):
Aarika
Aaron
aaron
Aaronic
aaronic
Aaronical
Aaronite
Aaronitic
Aaron's-beard
Aaronsburg
Aaronson
My code:
import string
import re

WORDLIST_FILENAME = "words.txt"

def load_words():
    wordlist = []
    print("Loading word list from file...")
    with open(WORDLIST_FILENAME, 'r') as f:
        for line in f:
            wordlist = wordlist + str.split(line)
    print(" ", len(wordlist), "words loaded.")
    return wordlist

def find_words(uletters):
    wordlist = load_words()
    foundList = []
    for word in wordlist:
        wordl = list(word)
        letters = list(uletters)
        count = 0
        if len(word) == 7:
            for letter in wordl[:]:
                if letter in letters:
                    wordl.remove(letter)
                    # print("word left" + str(wordl))
                    letters.remove(letter)
                    # print(letters)
                    count = count + 1
                    # print(count)
            if count == 7:
                print("Matched:" + word)
                foundList = foundList + str.split(word)
    foundList.sort()
    result = ''
    for items in foundList:
        result = result + items + ','
    print(result[:-1])

# Test cases
find_words("eabauea" "iveabdi")
#pattern = "asa" " qlocved"
#print("letters to look for: "+ pattern)
#find_words(pattern)
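The inner loop above checks whether every letter of a candidate word is available (with multiplicity) among the given letters. A minimal sketch of that same multiset check using `collections.Counter` (this is an illustration, not the question's code; `can_form` is a hypothetical helper name):

```python
from collections import Counter

def can_form(word, uletters):
    """True if every letter of word is available, counting repeats, in uletters."""
    # Counter subtraction drops non-positive counts, so an empty result
    # means no letter of word exceeds its supply in uletters.
    return not (Counter(word) - Counter(uletters))
```

Note that the test call `find_words("eabauea" "iveabdi")` relies on Python's implicit concatenation of adjacent string literals, so the letters searched are `"eabaueaiveabdi"`.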
This sounds like a better fit for http://codereview.stackexchange.com/. – alecxe
It would also help if you could explain what your program is supposed to do. – MYGz
One thing... 'wordlist = wordlist + str.split(line)' copies the word list for every line. Do 'wordlist.extend(line.strip().split())'. Or, if you want to get rid of duplicates and have faster word lookups, make 'wordlist' a 'set' and do '.update'. – tdelaney
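A sketch of tdelaney's suggestion (the function names are illustrative, not from the question). Repeated list concatenation copies the entire accumulated list on every line, making the load quadratic in the number of lines; `extend` mutates in place, and a `set` additionally deduplicates and gives O(1) membership tests:

```python
def load_words_list(filename):
    """Linear-time load: extend appends in place instead of copying the list."""
    wordlist = []
    with open(filename, 'r') as f:
        for line in f:
            wordlist.extend(line.split())  # no per-line copy of the whole list
    return wordlist

def load_words_set(filename):
    """Set-based load: removes duplicates and makes 'word in words' O(1)."""
    words = set()
    with open(filename, 'r') as f:
        for line in f:
            words.update(line.split())
    return words
```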