2011-01-06
4

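I'm building a full-text search facility for my website, which is written in ASP.NET MVC with a MySQL database. The site is in a non-English language. I've started looking at Lucene as the text search engine, but I can't find any information on whether it supports Unicode. Does Lucene support Unicode?

Does anyone have any information on whether Lucene supports Unicode? I don't want a nasty surprise.

Links to a beginner's article on implementing Lucene.NET would be greatly appreciated.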

+1

I highly recommend Lucene in Action: http://www.amazon.com/Lucene-Action-Otis-Gospodnetic/dp/1932394281 – 2011-01-06 07:08:11

Answers

8

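Yes. It fully supports Unicode.
For analysis, however, you should explicitly specify an appropriate stemmer and the correct stop words. As for a sample, below is code copied from our last project: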

    // version, directory, analyzer, organizations, orders and sessionFactory
    // are declared elsewhere (not shown in this excerpt).
    directory = new RAMDirectory();
    // An empty Hashtable is passed as the stop-word set.
    analyzer = new StandardAnalyzer(version, new Hashtable());
    var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
    using (var session = sessionFactory.OpenStatelessSession())
    {
        // Index every Organization: Id and type name are stored as exact keys,
        // FullName is analyzed for searching but not stored.
        organizations = session.CreateCriteria(typeof(Organization)).List<Organization>();
        foreach (var organization in organizations)
        {
            var document = new Document();
            document.Add(new Field("Id", organization.ID.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            document.Add(new Field("FullName", organization.FullName, Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
            document.Add(new Field("ObjectTypeInvariantName", typeof(Organization).FullName, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            indexWriter.AddDocument(document);
        }

        var persistentType = typeof(Order);
        var classMetadata = DbContext.SessionFactory.GetClassMetadata(persistentType);

        // Collect the simple (non-collection, non-entity) mapped properties of Order.
        var properties = new List<PropertyInfo>();
        for (int i = 0; i < classMetadata.PropertyTypes.Length; i++)
        {
            var propertyType = classMetadata.PropertyTypes[i];
            if (propertyType.IsCollectionType || propertyType.IsEntityType) continue;
            properties.Add(typeof(Order).GetProperty(classMetadata.PropertyNames[i]));
        }

        orders = session.CreateCriteria(typeof(Order)).List<Order>();
        var idProperty = typeof(Order).GetProperty(classMetadata.IdentifierPropertyName);

        // Index every Order, adding one analyzed field per non-null simple property.
        foreach (var order in orders)
        {
            var document = new Document();
            document.Add(new Field("Id", idProperty.GetValue(order, null).ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            document.Add(new Field("ObjectTypeInvariantName", typeof(Order).FullName, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            foreach (var property in properties)
            {
                var value = property.GetValue(order, null);
                if (value != null)
                {
                    document.Add(new Field(property.Name, value.ToString(), Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
                }
            }
            indexWriter.AddDocument(document);
        }

        indexWriter.Optimize(true);
        indexWriter.Commit();
        return indexWriter.GetReader();
    }

I query the Organization objects from NHibernate and put them into Lucene.NET.

Here is a simple search:

    var searchValue = textEdit1.Text;

    // Parse the user's input against the "FullName" field with the same analyzer used for indexing.
    var parser = new QueryParser(version, "FullName", analyzer);
    parser.SetLocale(new CultureInfo("ru-RU"));
    Query query = parser.Parse(searchValue);
    var indexSearcher = new IndexSearcher(directory, true); // true = read-only

    // Fetch the top 10 hits.
    var docs = indexSearcher.Search(query, 10);
    lblSearchTotal.Text = string.Format(totalPattern, docs.totalHits, organizations.Count() + orders.Count);
    resultPanel.Controls.Clear();
    foreach (var found in docs.scoreDocs)
    {
        // Only the stored fields (Id and type name) can be read back from the index.
        var document = indexSearcher.Doc(found.doc);
        var objectId = document.Get("Id");
        var objectType = document.Get("ObjectTypeInvariantName");

        if (resultPanel.Controls.Count > 0)
        {
            var labelSeparator = CreateSeparatorLabelControl();
            resultPanel.Controls.Add(labelSeparator);
        }
        var labelCard = CreateFoundLabelControl();
        resultPanel.Controls.Add(labelCard);

        var organization = organizations.FirstOrDefault(o => o.ID.ToString() == objectId);
        if (organization != null)
        {
            labelCard.Text = string.Format("<b>{0}</b></br>{1}", organization.AccountNumber, organization.FullName);
            labelCard.Tag = organization;
            //labels[count].Text = string.Format("<b>{0}</b></br>{1}", organization.AccountNumber, organization.FullName);
            //labels[count].Visible = true;
        }
        else
        {
            // Russian UI text: "Found an object of type '{0}' with identifier '{1}'"
            labelCard.Text = string.Format("Найден объект типа '{0}' с идентификатором '{1}'", objectType, objectId);
            labelCard.Tag = mainForm.GetObject(objectType, objectId);
        }
        labelCard.Visible = true;
        //count++;
    }
+0

NHibernate Search can also be used. – 2011-01-06 07:10:21

2

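Lucene does support Unicode, but there are limitations. For example, some document readers do not support Unicode. Also, Lucene can match the plural and singular forms of a word, but when you work with a foreign language, some of that goes away.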

5

Yes, Lucene supports Unicode, because it stores strings in UTF-8 format.

http://lucene.apache.org/java/3_0_3/fileformats.html

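Chars

Lucene writes Unicode character sequences as UTF-8 encoded bytes.

String

Lucene writes strings as UTF-8 encoded bytes. First the length, in bytes, is written as a VInt, followed by the bytes.

String --> VInt, Chars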
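
To make the quoted string layout concrete, here is a minimal sketch in plain C# (no Lucene.NET reference). It assumes the VInt layout described in the Lucene file-format documentation: seven data bits per byte, low-order group first, with the high bit set on every byte except the last. The names `LuceneStringSketch`, `EncodeVInt`, `EncodeString` and `DecodeString` are made up for illustration and are not part of the Lucene.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only: mimics the "String --> VInt, Chars" layout, not Lucene.NET's own classes.
public static class LuceneStringSketch
{
    // Encode a non-negative integer as a VInt: 7 bits per byte, low-order group first,
    // high bit set on every byte except the last one.
    public static byte[] EncodeVInt(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        var bytes = new List<byte>();
        uint remaining = (uint)value;
        while (remaining >= 0x80)
        {
            bytes.Add((byte)((remaining & 0x7F) | 0x80));
            remaining >>= 7;
        }
        bytes.Add((byte)remaining);
        return bytes.ToArray();
    }

    // Write the UTF-8 byte length as a VInt, followed by the UTF-8 bytes themselves.
    public static byte[] EncodeString(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] prefix = EncodeVInt(utf8.Length);
        var result = new byte[prefix.Length + utf8.Length];
        Buffer.BlockCopy(prefix, 0, result, 0, prefix.Length);
        Buffer.BlockCopy(utf8, 0, result, prefix.Length, utf8.Length);
        return result;
    }

    // Read the VInt length prefix, then decode that many bytes as UTF-8.
    public static string DecodeString(byte[] data)
    {
        int length = 0, shift = 0, offset = 0;
        byte current;
        do
        {
            current = data[offset++];
            length |= (current & 0x7F) << shift;
            shift += 7;
        } while ((current & 0x80) != 0);
        return Encoding.UTF8.GetString(data, offset, length);
    }
}
```

For example, `LuceneStringSketch.EncodeString("Привет")` produces 13 bytes: a single length byte with the value 12, followed by the 12 UTF-8 bytes of the six Cyrillic letters. Because the length prefix counts bytes rather than characters, non-ASCII text round-trips without loss.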