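Printing the line number of identified duplicate words

I have created a program that correctly identifies repeated words, but the way I went about it does not let me identify which line a repetition comes from. I did create a list of lines (linelist), then pulled all of the words out of those lines and put them into a list of their own. I have been looking for a way to show which line each repetition comes from.

The text run through the program can be found below, followed by the program itself. Ignore the blank line after each line of the quote, as it does not appear in the input text file. Also, for reference, the "XXX" marker is where I want the line number to be displayed.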

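He that would make his own liberty liberty secure,

must guard even his enemy from oppression;

for for if he violates this duty, he

he establishes a precedent that will reach to himself.

-- Thomas Paine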

import math
file = open(str(input("Enter file name: ")), "r")

linelist = []

file_cont = file.readlines()
for lines in file_cont:
    linelist.append(lines)

wordlist = []
# function that splits file into lines, then into words

def split_words(string):
    lines = string
    for line in lines:
        for word in line.split():
            yield word

# loop to add each word from prior function into a single list

for word in split_words(file_cont):
    wordlist.append(word)

# variables declared
x = 0
y = 1
z = len(wordlist)

# loop that prints the first and following word next to each other
while z > x:
    #print(wordlist[x], wordlist[y])

    if wordlist[x] == wordlist[y]:
        print("Found word: ",'"',wordlist[x],'"'," on line {}.".format(XXX), sep="")

    x += 1
    y += 1

    if y == z:
        break

Any help is greatly appreciated. Thanks!

2015-04-06 Jack Wright

Is 'He' the same as 'he'? –

No, it is not. –

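I would suggest creating a dictionary in which the keys are the word indices and the values are the index of the line each word is on.

You can generate it from linelist.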
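A minimal sketch of that dictionary idea, assuming the lines have already been read into a list. The function names `build_word_index` and `find_repeats` are made up for illustration and are not part of the question's code; the sample text is the quote from the question.

```python
def build_word_index(lines):
    """Return (words, line_of): a flat word list plus a dict mapping
    each word's position in that list to its 1-based line number."""
    words = []
    line_of = {}
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            line_of[len(words)] = line_number  # index the word is about to get
            words.append(word)
    return words, line_of


def find_repeats(lines):
    """Yield (word, line_number) for each word equal to the one before it."""
    words, line_of = build_word_index(lines)
    for i in range(1, len(words)):
        if words[i] == words[i - 1]:
            # Report the line on which the second copy appears.
            yield words[i], line_of[i]


sample = [
    "He that would make his own liberty liberty secure,\n",
    "must guard even his enemy from oppression;\n",
    "for for if he violates this duty, he\n",
    "he establishes a precedent that will reach to himself.\n",
    "-- Thomas Paine\n",
]
for word, line_number in find_repeats(sample):
    print('Found word: "{}" on line {}.'.format(word, line_number))
```

Because the flat word list is kept, the comparison loop stays close to the one in the question; the dictionary lookup `line_of[i]` is what would stand in for the XXX marker.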

2015-04-06 20:43:07

This is very simple with enumerate:

with open('data.txt') as data:
    lines = [i.split() for i in data]

for i, j in enumerate(lines):
    # compare each word with the one right after it on the same line
    if any(j[h] == j[h + 1] for h, k in enumerate(j[:-1])):
        print(i + 1)  # add one because counting starts at 0

2015-04-06 21:08:06

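This only checks whether a word appears twice. I think the goal is to check for doubled words ("and and", but not "and then and"). – TigerhawkT3

Perhaps, that may be the case. Let me see how I can fix my answer. –

@TigerhawkT3 Fixed the problem you pointed out. –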
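To make the distinction raised in these comments concrete, here is a small sketch contrasting the two checks. Both helper names are hypothetical and used only for illustration.

```python
def has_any_duplicate(words):
    """True if some word occurs more than once anywhere in the list."""
    return len(set(words)) < len(words)


def has_adjacent_duplicate(words):
    """True only if a word is immediately followed by an identical word."""
    return any(a == b for a, b in zip(words, words[1:]))


print(has_any_duplicate("and then and".split()))       # True
print(has_adjacent_duplicate("and then and".split()))  # False
print(has_adjacent_duplicate("and and then".split()))  # True
```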

Instead of looking for repetition in one long list of words, keep the words in a nested list.

# why import math?

with open(input("Enter file name: "), "r") as f: # input() already returns a str
    linelist = [line.split() for line in f.readlines()] # don't need to duplicate this with file_cont

# note: assumes no blank lines; linelist[l][-1] raises IndexError on an empty line
for l in range(len(linelist)-1): # -1 to avoid index out of range
    for w in range(len(linelist[l])-1): # -1 to avoid index out of range
        if linelist[l][w] == linelist[l][w+1]:
            print("Found word: ",'"',linelist[l][w],'"'," on line {}.".format(l+1), sep="")

    if linelist[l][-1] == linelist[l+1][0]: # check repetition between lines
        print("Found word: ",'"',linelist[l][-1],'"'," on line {}.".format(l+2), sep="")

for w in range(len(linelist[-1])-1): # check last line
    if linelist[-1][w] == linelist[-1][w+1]:
        print("Found word: ",'"',linelist[-1][w],'"'," on line {}.".format(len(linelist)), sep="")

File (an extra "guard" was added to show that only consecutive repetitions are checked):

He that would make his own liberty liberty secure, 
must guard even his enemy from guard oppression; 
for for if he violates this duty, he 
he establishes a precedent that will reach to himself. 
-- Thomas Paine

Result:

Found word: "liberty" on line 1. 
Found word: "for" on line 3. 
Found word: "he" on line 4.
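For comparison, a more compact sketch that remembers only the previous word while walking the lines. This is not the code from the answer: the function name `repeated_words` and the `ignore_case` option are assumptions added for illustration, the latter prompted by the 'He' versus 'he' comment on the question. It also skips blank lines rather than indexing into them, which is a design choice, not something the question asked for.

```python
def repeated_words(lines, ignore_case=False):
    """Yield (word, line_number) for every word that repeats the word before it.

    A repeat that straddles a line break is reported on the later line.
    Blank lines contribute no words, so they are effectively skipped.
    """
    previous = None
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            key = word.lower() if ignore_case else word
            if key == previous:
                yield word, line_number
            previous = key


answer_file = [
    "He that would make his own liberty liberty secure,\n",
    "must guard even his enemy from guard oppression;\n",
    "for for if he violates this duty, he\n",
    "he establishes a precedent that will reach to himself.\n",
    "-- Thomas Paine\n",
]
for word, line_number in repeated_words(answer_file):
    print('Found word: "{}" on line {}.'.format(word, line_number))
```

An open file object can be passed in place of the list, since iterating over a file yields its lines. With `ignore_case=True`, a pair such as "The the" would also be reported.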

2015-04-06 21:23:28 TigerhawkT3
