2012-10-05

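Searching for UUID in Lucene not working

I have a UUID field that I add to my documents in the following format: 372d325c-e01b-432f-98bd-bc4c949f15b8. However, when I try to query for documents by UUID, nothing is returned, no matter how I escape the expression. For example: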

+uuid:372d325c-e01b-432f-98bd-bc4c949f15b8 
+uuid:"372d325c-e01b-432f-98bd-bc4c949f15b8" 
+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8 
+uuid:(372d325c-e01b-432f-98bd-bc4c949f15b8) 
+uuid:("372d325c-e01b-432f-98bd-bc4c949f15b8") 

Even skipping the QueryParser entirely and using a TermQuery like this fails:

new TermQuery(new Term("uuid", uuid.toString())) 

Or

new TermQuery(new Term("uuid", QueryParser.escape(uuid.toString()))) 

None of these searches returns a document, but searching for a portion of the UUID does. For example, these all return something:

+uuid:372d325c 
+uuid:e01b 
+uuid:432f 

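How should I index these documents so that I can pull them back by UUID? I have considered reformatting the UUID to remove the hyphens, but I haven't implemented that yet.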
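The hyphen-stripping idea mentioned above could look roughly like the sketch below. `UuidKey`, `toKey`, and `fromKey` are hypothetical names introduced for illustration; they are not part of Lucene or of the original code. The sketch only uses `java.util.UUID` and assumes the key would be written to the index in place of `uuid.toString()`.

```java
import java.util.UUID;

/** Hypothetical helper: converts between a UUID and a hyphen-free 32-character key. */
final class UuidKey {

    /** Returns the UUID as 32 lowercase hex digits with the hyphens removed. */
    static String toKey(UUID uuid) {
        return uuid.toString().replace("-", "");
    }

    /** Rebuilds the UUID from a 32-digit key by reinserting the hyphens (8-4-4-4-12). */
    static UUID fromKey(String key) {
        if (key.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits, got " + key.length());
        }
        String dashed = key.substring(0, 8) + "-" + key.substring(8, 12) + "-"
                + key.substring(12, 16) + "-" + key.substring(16, 20) + "-"
                + key.substring(20);
        return UUID.fromString(dashed);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("372d325c-e01b-432f-98bd-bc4c949f15b8");
        String key = toKey(id);
        System.out.println(key);                    // prints 372d325ce01b432f98bdbc4c949f15b8
        System.out.println(fromKey(key).equals(id)); // prints true
    }
}
```

The resulting key is a single alphanumeric run with no punctuation for a tokenizer to split on, at the cost of converting back and forth whenever the UUID is read from or written to the index.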

+0

Have you checked how the field is getting indexed? Is it possible that the uuid is being pulled apart by the Lucene tokenizer? – jtahlborn

+0

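Here is how I currently add the UUID to the index: doc.add(new Field("uuid", id.toString(), Field.Store.YES, Field.Index.NOT_ANALYZED)). I use exactly the same scheme in another project and it works fine, but the difference is that the IDs in that project are not UUIDs and contain no hyphens. – chubbsondubs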

+0

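If the field is not analyzed (and therefore not tokenized), then the query +uuid:372d325c should return nothing. The general rule is to make sure you use the same analyzer for indexing and searching. Have you confirmed that indexing with Field.Index.NOT_ANALYZED and then searching with new TermQuery(new Term("uuid", uuid.toString())) returns nothing? –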

Answers

1

The only way I got this to work was to use WhitespaceAnalyzer instead of StandardAnalyzer, together with a TermQuery. The writer is configured like this:

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new WhitespaceAnalyzer(Version.LUCENE_36)) 
      .setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); 
writer = new IndexWriter(directory, config); 

Then search:

TopDocs docs = searcher.search(new TermQuery(new Term("uuid", uuid.toString())), 1); 

WhitespaceAnalyzer prevents Lucene from splitting the UUID apart at the hyphens. Another option would be to strip the dashes from the UUID, but WhitespaceAnalyzer works for my purposes.
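To illustrate the explanation above, here is a toy model of the two tokenization behaviours. It is not Lucene code and does not reproduce what StandardAnalyzer or WhitespaceAnalyzer actually do internally; `TokenizationSketch` and its methods are hypothetical stand-ins that only show why an exact-term lookup fails once a value has been split at the hyphens.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of tokenization; NOT Lucene's real analyzers. */
final class TokenizationSketch {

    /** Assumed stand-in for an analyzer that breaks on punctuation. */
    static List<String> splitOnPunctuation(String value) {
        return nonEmpty(value.split("[^A-Za-z0-9]+"));
    }

    /** Assumed stand-in for an analyzer that breaks on whitespace only. */
    static List<String> splitOnWhitespace(String value) {
        return nonEmpty(value.trim().split("\\s+"));
    }

    /** An exact-term lookup succeeds only if the whole term is one of the indexed tokens. */
    static boolean termMatches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    /** Copies the parts into a list, dropping empty strings produced by split(). */
    private static List<String> nonEmpty(String[] parts) {
        List<String> tokens = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String uuid = "372d325c-e01b-432f-98bd-bc4c949f15b8";

        List<String> split = splitOnPunctuation(uuid);
        System.out.println(split);                      // five separate fragments
        System.out.println(termMatches(split, uuid));   // false: the full UUID is not a token
        System.out.println(termMatches(split, "e01b")); // true: fragments match

        List<String> whole = splitOnWhitespace(uuid);
        System.out.println(termMatches(whole, uuid));   // true: the UUID stayed in one piece
    }
}
```

This matches the symptom in the question: fragments such as e01b are found, while the full hyphenated value is not.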

0

According to the Lucene Query Syntax rules, the query

+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8 

should work.

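If it doesn't, my guess is that the uuid field is not being populated as expected when the document is added to the index. Can you check exactly what gets inserted for this field? You can use Luke to browse the index and look up the actual values stored for the uuid field.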
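One detail worth keeping in mind about the escaped form: the backslashes are query-parser syntax, so they belong in a query string handed to QueryParser and not in a Term passed straight to TermQuery, which takes its text literally. The sketch below uses a hypothetical `escape` helper as a simplified stand-in for QueryParser.escape (its character list is an approximation, not the library's implementation) to show that the escaped text is a different string from the indexed value.

```java
/** Simplified, hypothetical stand-in for query-syntax escaping; not Lucene's implementation. */
final class EscapeSketch {

    // Characters treated as special here. This list is an assumption for the sketch;
    // QueryParser.escape is the authoritative implementation.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every special character with a backslash. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "372d325c-e01b-432f-98bd-bc4c949f15b8";
        String escaped = escape(raw);
        System.out.println(escaped);             // prints 372d325c\-e01b\-432f\-98bd\-bc4c949f15b8
        System.out.println(escaped.equals(raw)); // prints false
    }
}
```

Since the escaped string contains literal backslashes, a term built from it cannot equal a term indexed from the plain `uuid.toString()` value; escaping only makes sense for text that a query parser will un-escape again.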

+0

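I have confirmed in Luke that the value is saved in the field and present in the document. This is further confirmed by the fact that I can pull the document back by searching for a fragment of the uuid, as mentioned in the original question. – chubbsondubs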

0

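If you plan to use the UUID field as a lookup key, you need to have Lucene index the whole field as a single string, without tokenizing it. This is done by setting the right FieldType for your UUID field. In Lucene 4+, you can use StringField.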

import java.io.IOException; 
import java.util.UUID; 
import junit.framework.Assert; 
import org.apache.lucene.analysis.Analyzer; 
import org.apache.lucene.analysis.standard.StandardAnalyzer; 
import org.apache.lucene.document.Document; 
import org.apache.lucene.document.Field; 
import org.apache.lucene.document.StringField; 
import org.apache.lucene.document.TextField; 
import org.apache.lucene.index.DirectoryReader; 
import org.apache.lucene.index.IndexWriter; 
import org.apache.lucene.index.IndexWriterConfig; 
import org.apache.lucene.index.Term; 
import org.apache.lucene.queryparser.classic.ParseException; 
import org.apache.lucene.queryparser.classic.QueryParser; 
import org.apache.lucene.search.IndexSearcher; 
import org.apache.lucene.search.Query; 
import org.apache.lucene.search.TopDocs; 
import org.apache.lucene.store.Directory; 
import org.apache.lucene.store.RAMDirectory; 
import org.apache.lucene.util.Version; 
import org.junit.Test; 

/** 
* Using Lucene 4.7 on Java 7. 
*/ 
public class LuceneUUIDFieldLookupTest { 

    private Directory directory; 
    private Analyzer analyzer; 

    @Test 
    public void testUsingUUIDAsLookupKey() throws IOException, ParseException { 

     directory = new RAMDirectory(); 
     analyzer = new StandardAnalyzer(Version.LUCENE_47); 

     UUID docUUID = UUID.randomUUID(); 
     String docContentText1 = "Stack Overflow is a question and answer site for professional and enthusiast programmers."; 

     index(docUUID, docContentText1); 

     QueryParser parser = new QueryParser(Version.LUCENE_47, MyIndexedFields.DOC_TEXT_FIELD.name(), analyzer); 
     Query queryForProgrammer = parser.parse("programmers"); 

     IndexSearcher indexSearcher = getIndexSearcher(); 
     TopDocs hits = indexSearcher.search(queryForProgrammer, Integer.MAX_VALUE); 
     Assert.assertTrue(hits.scoreDocs.length == 1); 

     Integer internalDocId1 = hits.scoreDocs[0].doc; 
     Document docRetrieved1 = indexSearcher.doc(internalDocId1); 
     indexSearcher.getIndexReader().close(); 

     String docText1 = docRetrieved1.get(MyIndexedFields.DOC_TEXT_FIELD.name()); 
     Assert.assertEquals(docText1, docContentText1); 

     String docContentText2 = "TechCrunch is a leading technology media property, dedicated to ... according to a new report from the Wall Street Journal confirmed by Google to TechCrunch."; 
     reindex(docUUID, docContentText2); 

     Query queryForTechCrunch = parser.parse("technology"); 
     indexSearcher = getIndexSearcher(); // reopen the reader: the previous IndexSearcher only sees a point-in-time snapshot of the index
     hits = indexSearcher.search(queryForTechCrunch, Integer.MAX_VALUE); 
     Assert.assertTrue(hits.scoreDocs.length == 1); 

     Integer internalDocId2 = hits.scoreDocs[0].doc; 
     Document docRetrieved2 = indexSearcher.doc(internalDocId2); 
     indexSearcher.getIndexReader().close(); 

     String docText2 = docRetrieved2.get(MyIndexedFields.DOC_TEXT_FIELD.name()); 
     Assert.assertEquals(docText2, docContentText2); 
    } 

    private void reindex(UUID myUUID, String docContentText) throws IOException { 
     try (IndexWriter indexWriter = new IndexWriter(directory, getIndexWriterConfig())) { 
      Term term = new Term(MyIndexedFields.MY_UUID_FIELD.name(), myUUID.toString()); 
      indexWriter.updateDocument(term, buildDoc(myUUID, docContentText)); 
     }//auto-close 
    } 

    private void index(UUID myUUID, String docContentText) throws IOException { 
     try (IndexWriter indexWriter = new IndexWriter(directory, getIndexWriterConfig())) { 
      indexWriter.addDocument(buildDoc(myUUID, docContentText)); 
     }//auto-close 
    } 

    private IndexWriterConfig getIndexWriterConfig() { 
     return new IndexWriterConfig(Version.LUCENE_47, analyzer).setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); 
    } 

    private Document buildDoc(UUID myUUID, String docContentText) { 
     Document doc = new Document(); 
     doc.add(new Field(
       MyIndexedFields.MY_UUID_FIELD.name(), 
       myUUID.toString(), 
       StringField.TYPE_STORED));//use TYPE_STORED if you want to read it back in search result. 

     doc.add(new Field(
       MyIndexedFields.DOC_TEXT_FIELD.name(), 
       docContentText, 
       TextField.TYPE_STORED)); 

     return doc; 
    } 

    private IndexSearcher getIndexSearcher() throws IOException { 
     DirectoryReader ireader = DirectoryReader.open(directory); 
     IndexSearcher indexSearcher = new IndexSearcher(ireader); 
     return indexSearcher; 
    } 

    enum MyIndexedFields { 

     MY_UUID_FIELD, 
     DOC_TEXT_FIELD 
    } 
}