Error: matching words in files

There are two sentences in "test_tweet1.txt":

@francesco_con40 2nd worst QB. DEFINITELY Tony Romo. The man who likes to share the ball with everyone. Including the other team. 
@mariakaykay aga tayo tomorrow ah. :) Good night, Ces. Love you! >:D<

In "Personal.txt":

The Game (rapper) 
The Notorious B.I.G. 
The Undertaker 
Thor 
Tiësto 
Timbaland 
T.I. 
Tom Cruise 
Tony Romo 
Trajan 
Triple H

My code:

import re 
popular_person = open('C:/Users/Personal.txt') 
rpopular_person = popular_person.read() 
file1 = open("C:/Users/test_tweet1.txt").readlines() 
array = [] 
count1 = 0 
for line in file1: 
    array.append(line) 
    count1 = count1 + 1 
    print "\n",count1, line 
    ltext1 = line.split(" ") 
    for i,text in enumerate(ltext1): 
        if text in rpopular_person: 
            print text 
    text2 = ' '.join(ltext1)

The output of the code:

1 @francesco_con40 2nd worst QB. DEFINITELY Tony Romo. The man who likes to share the ball with everyone. Including the other team. 
Tony 
The 
man 
to 
the 
the 

2 @mariakaykay aga tayo tomorrow ah. :) Good night, Ces. Love you! >:D< 
aga

I am trying to match the words in "test_tweet1.txt" against "Personal.txt".

Expected result:

Tony 
Romo

Any suggestions?

2013-06-04 ThanaDaray

You need to split rpopular_person so that it matches whole words rather than substrings:

rpopular_person = open('C:/Users/Personal.txt').read().split()

This gives:

Tony 
The

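The reason Romo doesn't show up is that after splitting the line you have "Romo." with the trailing period. Perhaps you should look for each entry of popular_person in the line, rather than the other way around. Maybe something like this: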

popular_person = open('C:/Users/Personal.txt').read().split("\n") 
file1 = open("C:/Users/test_tweet1.txt") 
array = [] 
for count1, line in enumerate(file1): 
    print "\n", count1, line 
    for person in popular_person: 
        if person in line: 
            print person
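
A minimal Python 3 sketch of the same name-in-line idea, with two guards added. It assumes (this is not confirmed by the thread) that Personal.txt may contain blank lines or trailing spaces after a name. A blank entry is contained in every line, so it would print as an empty match, and "Tony Romo " with a trailing space would not be found in "Tony Romo.". The helper name find_names and the sample strings are hypothetical.

```python
def find_names(names_text, line):
    """Return the names from names_text that occur in line, in file order."""
    found = []
    for raw_name in names_text.splitlines():
        name = raw_name.strip()      # drop spaces around each name
        if name and name in line:    # skip blank entries: "" is in every string
            found.append(name)
    return found


# Assumed sample data: trailing spaces and one blank line on purpose.
names_text = "Thor\nTom Cruise \nTony Romo \n\nTrajan\n"
tweet = "@francesco_con40 2nd worst QB. DEFINITELY Tony Romo. The man who likes to share the ball."

print(find_names(names_text, tweet))   # ['Tony Romo']
```

Returning a list instead of printing inside the loop keeps the matching logic easy to check against each tweet.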

2013-06-04 15:17:16 cmd

I tried it and it didn't work. The result came out blank. – ThanaDaray

@ThanaDaray Try the code I added and see if it gives you what you want. – cmd

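Your problem seems to be that rpopular_person is just one single string. So when you ask something like 'to' in rpopular_person, you get True, because the characters 't', 'o' appear consecutively somewhere in that string. I assume the same is true of 'the' elsewhere in Personal.txt.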
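
The difference between a substring test and a whole-word test can be seen in a small Python 3 sketch. The sample string is an assumed, shortened stand-in for the contents of Personal.txt.

```python
# Assumed, shortened stand-in for Personal.txt read as one string.
names_text = "The Notorious B.I.G.\nThor\nTony Romo\n"

# On a string, `in` is a substring test: "to" occurs inside "Notorious".
print("to" in names_text)     # True

# On a list of words, `in` only matches whole elements.
words = names_text.split()
print("to" in words)          # False
print("Tony" in words)        # True
print("Romo." in words)       # False: the list holds "Romo", without the period
```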

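What you want to do is split Personal.txt into individual words, the same way you split your tweets. You can also turn the resulting list of words into a set, since that makes lookups much faster. Something like this: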

people = set(popular_person.read().split())

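It's also worth noting that I'm calling split() with no arguments. That splits on all whitespace: newlines, tabs, and so on. This way you get every word individually, as you intended. Or, if you don't actually want that (since, going by your edit showing the contents of Personal.txt, it would give you a match on "The" every time), make it: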

people = set(popular_person.read().split('\n'))

This way you split on newlines, so you only look for full-name matches.
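
To make the two splitting choices concrete, here is a short Python 3 sketch. The sample string is an assumption; it has no trailing newline, so split("\n") leaves no empty entry.

```python
# Assumed sample of Personal.txt contents (no trailing newline).
names_text = "The Undertaker\nThor\nTony Romo"

words = set(names_text.split())             # split on any whitespace
full_names = set(names_text.split("\n"))    # split on newlines only

print("The" in words)              # True: every "The ..." entry contributes "The"
print("The" in full_names)         # False: only complete lines are members
print("Tony Romo" in full_names)   # True
print("Tony Romo" in words)        # False: it was broken into "Tony" and "Romo"
```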

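You're not getting "Romo" because that isn't a word in your tweet. The word in your tweet is "Romo." with a period. This is likely to remain a problem for you, so what I would do is rearrange your logic (assuming speed isn't an issue). Rather than looking at each word in your tweet, look at each name in your Personal.txt file and check whether it is in your full tweet. That way you don't have to deal with punctuation and so on. Here's how I would rewrite your function: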

rpopular_person = set(personal.split())

with open("Personal.txt") as p:
    people = p.read().split('\n')  # Get full names rather than partial names
with open("test_tweet1.txt") as tweets:
    for tweet in tweets:
        for person in people:
            if person in tweet:
                print person
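
One caveat of a plain substring test is that a short name such as "Thor" would also be found inside a longer word such as "Thorough". A possible refinement, which is not part of the answer above, swaps in a regular expression with lookarounds. The sketch below is Python 3; the helper name names_in_tweet and the sample data are hypothetical.

```python
import re


def names_in_tweet(people, tweet):
    """Return names that occur in tweet without touching a neighbouring word character."""
    found = []
    for person in people:
        # (?<!\w) and (?!\w) require that no letter, digit, or underscore sits
        # directly before or after the name; re.escape protects ".", "(" and so on.
        pattern = r"(?<!\w)" + re.escape(person) + r"(?!\w)"
        if re.search(pattern, tweet):
            found.append(person)
    return found


people = ["Thor", "T.I.", "Tony Romo"]          # assumed sample names
tweet = "DEFINITELY Tony Romo. Thorough beating."

print("Thor" in tweet)                  # True: plain substring match
print(names_in_tweet(people, tweet))    # ['Tony Romo']
```

Lookarounds are used instead of \b because \b does not behave as expected after names that end in punctuation, such as "T.I.".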

2013-06-04 15:17:35

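My mistake. Yes, as you said, it is "Romo." with a period. It shows up after removing the special characters, but "The" shows up too. Any suggestions? – ThanaDaray

@ThanaDaray "The" is in your Personal.txt file, so it will match. If you only want to match full names, just split Personal.txt on newlines ('\n'). –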
