Best way to search in large text files

I have about 5,000 Word files on a web server, and I need to search all of them for any keyword, for example: "human resources".

So I wrote functions that read a Word file, but my concern is that this processing will exhaust the server's memory.
Sample code:

<?php 
function doc_to_text($input_file){ // for legacy .doc files
    $file_handle = @fopen($input_file, "rb"); // open the file in binary mode
    if ($file_handle === false) {
        return ""; // unreadable file: nothing to search
    }
    $size = filesize($input_file);
    $stream_text = $size > 0 ? (string) fread($file_handle, $size) : "";
    fclose($file_handle);
    $stream_line = explode(chr(0x0D), $stream_text); // split on carriage returns
    $output_text = "";
    foreach($stream_line as $single_line){
        // keep only non-empty lines without NUL bytes (NUL bytes indicate binary data)
        if (strlen($single_line) > 0 && strpos($single_line, chr(0x00)) === false) {
            $output_text .= $single_line." ";
        }
    }
    // drop every character outside a small ASCII whitelist
    $output_text = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $output_text);
    return $output_text;
}


function docx_to_text($input_file){ // for .docx files
    $xml_filename = "word/document.xml"; // main content part inside the archive
    $zip_handle = new ZipArchive;
    $output_text = "";
    if(true === $zip_handle->open($input_file)){
        if(($xml_index = $zip_handle->locateName($xml_filename)) !== false){
            $xml_datas = $zip_handle->getFromIndex($xml_index);
            // loadXML() must be called on an instance; the static call fails on PHP 8
            $xml_handle = new DOMDocument();
            // entity substitution and XInclude are left off so untrusted files cannot pull in external content
            if($xml_handle->loadXML($xml_datas, LIBXML_NOERROR | LIBXML_NOWARNING)){
                $output_text = strip_tags($xml_handle->saveXML());
            }
        }
        $zip_handle->close();
    }
    return $output_text;
}





?>

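Then I will loop over the files and check each one for the keyword with stristr(); if stristr() finds a match (it returns the matched part of the string, or false when there is none), the script prints the file name.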
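The loop described above can be sketched as follows. This is only an illustration under a few assumptions: the function name `find_files_containing` and the `$extract` callback are made up for this example, and `stripos()` is used in place of `stristr()` because it returns an offset rather than a copy of the rest of the text. Only one document's text is held in memory at a time, which addresses the memory concern but not the cost of re-reading every file on every search.

```php
<?php
/**
 * Return the paths whose extracted text contains $keyword (case-insensitive).
 * $extract is any callable that turns a file path into plain text
 * (hypothetical parameter, e.g. a .docx-to-text function).
 */
function find_files_containing(array $paths, string $keyword, callable $extract): array
{
    $matches = [];
    if ($keyword === '') {
        return $matches; // an empty keyword would match everything; treat it as no match
    }
    foreach ($paths as $path) {
        $text = (string) $extract($path); // one document in memory at a time
        // stripos() returns an int offset or false, so compare with !== false
        if (stripos($text, $keyword) !== false) {
            $matches[] = $path;
        }
        unset($text); // release the text before reading the next file
    }
    return $matches;
}
```

With the functions from the question it might be called as `find_files_containing(glob($dir . '/*.docx'), 'human resources', 'docx_to_text')`, where `$dir` is a placeholder for the document folder. Note that `stripos()` folds case for ASCII only; `mb_stripos()` would be the multibyte alternative.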

Is there a better solution?

Reference: stristr()

2014-02-24 Ahmad Samilo

Yes, you can build a search index ahead of time and search that instead. – zerkms

Have you tried using awk or sed? – ziollek

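You need to create a structure called an inverted index, in which each word (or possibly each phrase, if you want phrase search as well) maps to the documents that contain it. The Wikipedia page documents the process well, and it is fairly simple.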
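As a rough in-memory sketch of that structure, assuming the text of each document has already been extracted: the helper names `tokenize`, `build_inverted_index` and `search_index` are invented for this example, and the tokenizer is deliberately naive (ASCII lowercasing, split on anything that is not a letter or digit).

```php
<?php
/** Split text into lowercase word tokens (letters and digits only). */
function tokenize(string $text): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? [] : $words; // preg_split() fails on invalid UTF-8
}

/** Build word => [docId => true] from an array of docId => text. */
function build_inverted_index(array $documents): array
{
    $index = [];
    foreach ($documents as $docId => $text) {
        foreach (tokenize($text) as $word) {
            $index[$word][$docId] = true; // set semantics: each document is listed once per word
        }
    }
    return $index;
}

/** Return the ids of the documents that contain every word of $query. */
function search_index(array $index, string $query): array
{
    $result = null;
    foreach (tokenize($query) as $word) {
        $docs = $index[$word] ?? [];
        // intersect the posting sets so that all query words must be present
        $result = $result === null ? $docs : array_intersect_key($result, $docs);
    }
    return array_keys($result ?? []);
}
```

A query such as `search_index($index, 'human resources')` returns documents that contain both words somewhere, not necessarily next to each other; exact phrase matching would need word positions or word pairs in the index.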

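Then you can store this structure in a database. Building it is a preprocessing step that is done only once, and the index can be updated later when new .doc or .docx files are added.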

When a user enters search words, you do not search the files but the database, which will be much quicker and will make use of the index.
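One possible way to lay this out, as a sketch only: the answer does not name a database, so SQLite through PDO (the `pdo_sqlite` extension) is an assumption here, as are the table name `postings` and the three function names.

```php
<?php
/** Create the postings table: one row per (word, file) pair. */
function create_index_schema(PDO $db): void
{
    // the primary key doubles as the index used for lookups by word
    $db->exec('CREATE TABLE IF NOT EXISTS postings (
        word TEXT NOT NULL,
        file TEXT NOT NULL,
        PRIMARY KEY (word, file)
    )');
}

/** Index one document; run once per file during preprocessing, or when a file is added. */
function index_document(PDO $db, string $file, string $text): void
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $insert = $db->prepare('INSERT OR IGNORE INTO postings (word, file) VALUES (?, ?)');
    $db->beginTransaction(); // one transaction per document keeps bulk inserts fast
    foreach (array_unique($words) as $word) {
        $insert->execute([$word, $file]);
    }
    $db->commit();
}

/** Look up a single word in the database; no document is opened at query time. */
function files_for_word(PDO $db, string $word): array
{
    $select = $db->prepare('SELECT file FROM postings WHERE word = ? ORDER BY file');
    $select->execute([strtolower($word)]);
    return $select->fetchAll(PDO::FETCH_COLUMN);
}
```

For a real deployment the connection would point at a file or a database server rather than memory, for example `new PDO('sqlite:' . $indexPath)` with `$indexPath` as a placeholder, and `index_document()` would be fed the output of the text extraction functions.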

2014-02-24 20:29:25
