Search a directory for files containing one or more words

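I want a program, search(file, list), that searches USB stick D for text files containing one or more words. If a file contains a word, the program should put that word in a list and then move on to the next word. For every document in which it finds words, I want it to print a statement of the form "word[0], word[1], word[2] in this file directory". Here is what I have tried so far: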

import os

def search(file, list):
    if list == []:
        return
    else:
        if os.path.isfile(file):
            try:
                infile = open(file, 'r')
                doc = infile.read()
            except:
                return
            infile.close()
            print('Searching {}'.format(file))
            if list[0] in doc:
                print('{} in {}'.format(list[0], file))
        elif os.path.isdir(file):
            for item in os.listdir(file):
                itempath = os.path.join(file, item)
                search(itempath, list)
    return search(file, list[1:])

Source

2017-06-01 calculator2compiler

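For starters, you forgot the return in the recursive call: 'return search(itempath, list)' – karthikr

Thanks, I have it running through the list now, but I forgot an extra step from the prompt, so I have updated the question – calculator2compiler

If you want to look at the words one by one, wouldn't it make more sense to just iterate over the list instead of calling 'return search(file, list[1:])'? –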

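You're not iterating over your list to check the condition (by the way, don't use file and list as variable names; you're shadowing built-in types). You'd have to do something like this: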

found_words = []
for word in list:
    if word in doc:
        found_words.append(word)
if found_words:
    print('{} in {}'.format(", ".join(found_words), file))

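That is what you'd do instead if you want to check all the terms. However, you're making this far more complicated than it needs to be. For starters, you should use os.walk() to go through all subdirectories recursively. Second, reading the whole file into memory is not a good idea: not only will the search be slower on average, but once you hit large files you may start running into memory issues...

I'd do it like this: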

import os

def search(path, terms):
    result = {}  # store our result in the form "file_path": [found terms]
    start_path = os.path.abspath(os.path.realpath(path))  # full path, resolving a symlink
    for root, dirs, files in os.walk(start_path):  # recurse our selected dir
        for source in files:  # loop through each file
            source_path = os.path.join(root, source)  # full path to our file
            try:
                with open(source_path, "r") as f:  # open our current file
                    found_terms = []  # store for our potentially found terms
                    for line in f:  # loop through it line by line
                        for term in terms:  # go through all our terms and check for a match
                            # record a term only once, even if it occurs on several lines
                            if term in line and term not in found_terms:
                                found_terms.append(term)  # add the found term to our store
                    if found_terms:  # if we found any of the terms...
                        result[source_path] = found_terms  # store it in our result
            except IOError:
                pass  # ignore I/O errors, we may optionally store a list of failed files...
    return result

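It returns a dictionary whose keys are your file paths and whose values are the terms found in each file. So, for example, if you want to search the files in the current folder (the folder you run the script from) for words such as "import", you can do: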

search_results = search("./", ["import", "export"])
for key in search_results:
    print("{} in {}".format(", ".join(search_results[key]), key))

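That should print the result you're after. It could also use a check on file extensions/types so you don't waste time trying to read through unreadable/binary files. A codec check would be in order too: depending on your files, reading their lines may raise unicode errors (with the default decoding). Bottom line, there's a lot of room for improvement...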
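As a rough sketch of those two improvements (this is not part of the code above; the name search_text_files, the extensions parameter and its default values are made up for illustration, and UTF-8 is only an assumption about how your files are encoded), something along these lines would skip files by extension and replace undecodable bytes instead of raising:

```python
import os

def search_text_files(path, terms, extensions=(".txt", ".py", ".md")):
    """Hypothetical variant of search(): filters by extension, tolerates bad bytes."""
    extensions = tuple(extensions)  # str.endswith() needs a tuple, not a list
    result = {}
    for root, dirs, files in os.walk(os.path.abspath(path)):
        for source in files:
            # assumed filter: only look at files whose extension suggests text
            if not source.lower().endswith(extensions):
                continue
            source_path = os.path.join(root, source)
            found_terms = []
            try:
                # errors="replace" turns undecodable bytes into U+FFFD
                # instead of raising UnicodeDecodeError
                with open(source_path, "r", encoding="utf-8", errors="replace") as f:
                    for line in f:
                        for term in terms:
                            if term in line and term not in found_terms:
                                found_terms.append(term)
            except OSError:
                continue  # unreadable file, skip it
            if found_terms:
                result[source_path] = found_terms
    return result
```

Filtering by extension is only a heuristic; a file named notes.txt can still hold binary data, which is why the decoding is made tolerant as well.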

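Also, note that you're not really looking for a word, only for the presence of the character sequence you pass in. For example, if you search for cat, it will also return files containing caterpillar. And there are dedicated tools that can do all of this in a fraction of the time.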
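If you do want whole-word matches, one possible approach is a regular expression with \b word boundaries. This is just a sketch: compile_word_pattern and find_words are hypothetical helper names, and it assumes a non-empty list of terms made of ordinary word characters (\b behaves differently around punctuation).

```python
import re

def compile_word_pattern(terms):
    """Build one regex that matches any of the terms as a whole word."""
    escaped = (re.escape(term) for term in terms)
    return re.compile(r"\b(?:{})\b".format("|".join(escaped)))

def find_words(line, pattern):
    """Return the distinct whole-word matches on a line, in order of appearance."""
    found = []
    for match in pattern.finditer(line):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found

pattern = compile_word_pattern(["cat", "dog"])
print(find_words("a caterpillar and a cat", pattern))  # ['cat']
```

Inside the line loop you could then call find_words(line, pattern) in place of the "term in line" checks.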

Source

2017-06-01 02:39:09 zwer
