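Searching for a UUID in Lucene does not work

I have a UUID field that I add to my documents in the following format: 372d325c-e01b-432f-98bd-bc4c949f15b8. However, when I try to query for a document by its UUID, nothing is returned, no matter how I try to escape the expression. For example: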

+uuid:372d325c-e01b-432f-98bd-bc4c949f15b8 
+uuid:"372d325c-e01b-432f-98bd-bc4c949f15b8" 
+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8 
+uuid:(372d325c-e01b-432f-98bd-bc4c949f15b8) 
+uuid:("372d325c-e01b-432f-98bd-bc4c949f15b8")

I even tried skipping QueryParser entirely and using a TermQuery like this:

new TermQuery(new Term("uuid", uuid.toString()))

Or:

new TermQuery(new Term("uuid", QueryParser.escape(uuid.toString())))

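None of these searches returns a document, but if I search for a part of the UUID, a document is returned. For example, these queries do return something: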

+uuid:372d325c 
+uuid:e01b 
+uuid:432f

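How should I index these documents so that I can pull them back by their UUID? I have considered reformatting the UUID to remove the hyphens, but I haven't implemented that yet.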
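For readers who want to try the hyphen-free idea mentioned above, here is a minimal sketch using only the JDK. The class and method names (UuidKeys, toCompactKey, fromCompactKey) are hypothetical and not part of Lucene; the sketch only shows how a UUID could be turned into a 32-character key before indexing and restored afterwards.

```java
import java.util.UUID;

/** Hypothetical helper (not a Lucene API): converts UUIDs to hyphen-free keys and back. */
final class UuidKeys {
    private UuidKeys() {}

    /** Drops the four hyphens, leaving 32 lowercase hex characters. */
    static String toCompactKey(UUID uuid) {
        return uuid.toString().replace("-", "");
    }

    /** Re-inserts hyphens at the 8-4-4-4-12 boundaries and parses the result. */
    static UUID fromCompactKey(String key) {
        if (key.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex characters, got " + key.length());
        }
        String dashed = key.substring(0, 8) + "-"
                + key.substring(8, 12) + "-"
                + key.substring(12, 16) + "-"
                + key.substring(16, 20) + "-"
                + key.substring(20);
        return UUID.fromString(dashed);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("372d325c-e01b-432f-98bd-bc4c949f15b8");
        String key = toCompactKey(id);
        System.out.println(key);                            // 372d325ce01b432f98bdbc4c949f15b8
        System.out.println(fromCompactKey(key).equals(id)); // true
    }
}
```

The same compact key would have to be used both when indexing and when building the query term, otherwise the lookup would miss again.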

2012-10-05 chubbsondubs

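Have you checked how the field is actually being indexed? Is it possible that the UUID is being pulled apart by the Lucene tokenizer? – jtahlborn

Here is how I currently add the UUID to the index: doc.add(new Field("uuid", id.toString(), Field.Store.YES, Field.Index.NOT_ANALYZED)). I use exactly the same scheme in another project and it works fine, but the difference is that the IDs in that project are not UUIDs and contain no hyphens. – chubbsondubs

If the field is not analyzed (and therefore not tokenized), then the query +uuid:372d325c should return nothing. The general rule is to make sure you use the same analyzer for indexing and searching. Have you confirmed that indexing with Field.Index.NOT_ANALYZED and then searching with new TermQuery(new Term("uuid", uuid.toString())) returns nothing? –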

The only way I got this to work was to use WhitespaceAnalyzer instead of StandardAnalyzer, and then use a TermQuery. The writer is configured like this:

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new WhitespaceAnalyzer(Version.LUCENE_36)) 
      .setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); 
writer = new IndexWriter(directory, config);

Then search:

TopDocs docs = searcher.search(new TermQuery(new Term("uuid", uuid.toString())), 1);

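WhitespaceAnalyzer prevents Lucene from splitting the UUID at the hyphens. Another option would be to remove the dashes from the UUID, but using WhitespaceAnalyzer also works for my purposes.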
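To illustrate why the analyzer choice matters here, the following is a toy model in plain Java. It does not call Lucene; splitAtPunctuation and splitAtWhitespace are hypothetical stand-ins that only imitate the effect of breaking text at hyphens versus breaking only at whitespace, under the assumption that an exact term lookup succeeds only when the query term equals an indexed term.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only: these splitters imitate analysis choices; they are NOT Lucene tokenizers. */
final class UuidTokenizationSketch {
    private UuidTokenizationSketch() {}

    /** Hypothetical stand-in for an analyzer that breaks text at punctuation such as '-'. */
    static List<String> splitAtPunctuation(String text) {
        return Arrays.asList(text.split("[^\\p{Alnum}]+"));
    }

    /** Hypothetical stand-in for whitespace-only tokenization. */
    static List<String> splitAtWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** An exact-term lookup matches only if the query term equals one of the indexed terms. */
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String uuid = "372d325c-e01b-432f-98bd-bc4c949f15b8";

        List<String> fragments = splitAtPunctuation(uuid);
        System.out.println(fragments);                      // [372d325c, e01b, 432f, 98bd, bc4c949f15b8]
        System.out.println(termMatches(fragments, uuid));   // false: the full UUID is not a term
        System.out.println(termMatches(fragments, "e01b")); // true: fragments are terms

        List<String> whole = splitAtWhitespace(uuid);
        System.out.println(termMatches(whole, uuid));       // true: the UUID stays one term
    }
}
```

This mirrors the symptoms in the question: fragment queries match while the full UUID does not, which suggests the indexed value was being split somewhere along the way.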

2012-10-19 04:45:01 chubbsondubs

According to the Lucene Query Syntax rules, the query

+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8

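should work.

My guess is that if it does not, the uuid field is not being populated the way it should be when the document is inserted into the index. Can you check what exactly is being inserted for this field? You can use Luke to browse the index and look up the actual value stored for the uuid field.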

2012-10-05 22:23:48

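I have confirmed in Luke that the value is saved in the field and present in the document. This is further confirmed by the fact that I can pull the document back by searching for a fragment of the UUID, as mentioned in the original question. – chubbsondubs

If you plan to use the UUID field as a lookup key, you need to have Lucene index the whole field as a single string, without tokenizing it. This is done by setting the right FieldType for your UUID field. In Lucene 4+, you can use StringField.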

import java.io.IOException; 
import java.util.UUID; 
import org.junit.Assert; 
import org.apache.lucene.analysis.Analyzer; 
import org.apache.lucene.analysis.standard.StandardAnalyzer; 
import org.apache.lucene.document.Document; 
import org.apache.lucene.document.Field; 
import org.apache.lucene.document.StringField; 
import org.apache.lucene.document.TextField; 
import org.apache.lucene.index.DirectoryReader; 
import org.apache.lucene.index.IndexWriter; 
import org.apache.lucene.index.IndexWriterConfig; 
import org.apache.lucene.index.Term; 
import org.apache.lucene.queryparser.classic.ParseException; 
import org.apache.lucene.queryparser.classic.QueryParser; 
import org.apache.lucene.search.IndexSearcher; 
import org.apache.lucene.search.Query; 
import org.apache.lucene.search.TopDocs; 
import org.apache.lucene.store.Directory; 
import org.apache.lucene.store.RAMDirectory; 
import org.apache.lucene.util.Version; 
import org.junit.Test; 

/** 
* Using Lucene 4.7 on Java 7. 
*/ 
public class LuceneUUIDFieldLookupTest { 

    private Directory directory; 
    private Analyzer analyzer; 

    @Test 
    public void testUsingUUIDAsLookupKey() throws IOException, ParseException { 

     directory = new RAMDirectory(); 
     analyzer = new StandardAnalyzer(Version.LUCENE_47); 

     UUID docUUID = UUID.randomUUID(); 
     String docContentText1 = "Stack Overflow is a question and answer site for professional and enthusiast programmers."; 

     index(docUUID, docContentText1); 

     QueryParser parser = new QueryParser(Version.LUCENE_47, MyIndexedFields.DOC_TEXT_FIELD.name(), analyzer); 
     Query queryForProgrammer = parser.parse("programmers"); 

     IndexSearcher indexSearcher = getIndexSearcher(); 
     TopDocs hits = indexSearcher.search(queryForProgrammer, Integer.MAX_VALUE); 
     Assert.assertTrue(hits.scoreDocs.length == 1); 

     Integer internalDocId1 = hits.scoreDocs[0].doc; 
     Document docRetrieved1 = indexSearcher.doc(internalDocId1); 
     indexSearcher.getIndexReader().close(); 

     String docText1 = docRetrieved1.get(MyIndexedFields.DOC_TEXT_FIELD.name()); 
     Assert.assertEquals(docText1, docContentText1); 

     String docContentText2 = "TechCrunch is a leading technology media property, dedicated to ... according to a new report from the Wall Street Journal confirmed by Google to TechCrunch."; 
     reindex(docUUID, docContentText2); 

     Query queryForTechCrunch = parser.parse("technology"); 
     indexSearcher = getIndexSearcher(); // Reopen the reader: the previous IndexSearcher only sees a snapshot of the directory from when it was opened. 
     hits = indexSearcher.search(queryForTechCrunch, Integer.MAX_VALUE); 
     Assert.assertTrue(hits.scoreDocs.length == 1); 

     Integer internalDocId2 = hits.scoreDocs[0].doc; 
     Document docRetrieved2 = indexSearcher.doc(internalDocId2); 
     indexSearcher.getIndexReader().close(); 

     String docText2 = docRetrieved2.get(MyIndexedFields.DOC_TEXT_FIELD.name()); 
     Assert.assertEquals(docText2, docContentText2); 
    } 

    private void reindex(UUID myUUID, String docContentText) throws IOException { 
     try (IndexWriter indexWriter = new IndexWriter(directory, getIndexWriterConfig())) { 
      Term term = new Term(MyIndexedFields.MY_UUID_FIELD.name(), myUUID.toString()); 
      indexWriter.updateDocument(term, buildDoc(myUUID, docContentText)); 
     }//auto-close 
    } 

    private void index(UUID myUUID, String docContentText) throws IOException { 
     try (IndexWriter indexWriter = new IndexWriter(directory, getIndexWriterConfig())) { 
      indexWriter.addDocument(buildDoc(myUUID, docContentText)); 
     }//auto-close 
    } 

    private IndexWriterConfig getIndexWriterConfig() { 
     return new IndexWriterConfig(Version.LUCENE_47, analyzer).setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); 
    } 

    private Document buildDoc(UUID myUUID, String docContentText) { 
     Document doc = new Document(); 
     doc.add(new Field(
       MyIndexedFields.MY_UUID_FIELD.name(), 
       myUUID.toString(), 
       StringField.TYPE_STORED));//use TYPE_STORED if you want to read it back in search result. 

     doc.add(new Field(
       MyIndexedFields.DOC_TEXT_FIELD.name(), 
       docContentText, 
       TextField.TYPE_STORED)); 

     return doc; 
    } 

    private IndexSearcher getIndexSearcher() throws IOException { 
     DirectoryReader ireader = DirectoryReader.open(directory); 
     IndexSearcher indexSearcher = new IndexSearcher(ireader); 
     return indexSearcher; 
    } 

    enum MyIndexedFields { 

     MY_UUID_FIELD, 
     DOC_TEXT_FIELD 
    } 
}

2014-03-25 00:58:46
