Lucene.net HTML document example using an HTML string instead of a file?

I'm building a web crawler, and I'd like to index pages with Lucene while the download stream is in progress or once it has completed.

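I've seen the Lucene.net HTML demo, and it works well. However, I don't want to keep the downloaded pages on disk. What I want is to index a page while it is downloading, or perhaps to index the HTML content from a string.

Is there an example of using the Lucene.net HTML indexer with a memory stream or a string?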

2011-09-09 Juidan Ho

Something like this?

    // namespaces used below (this targets the Lucene.Net 2.9-era API)
    using System.IO;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    // create writer to index
    IndexWriter iw = new IndexWriter(new FileInfo("C:\\example\\"), new StandardAnalyzer());

    // create a document to index
    Document d = new Document();

    // create a field that the document will contain
    Field aField = new Field("test", "", Field.Store.YES, Field.Index.ANALYZED);
    // add the field to the document
    d.Add(aField);

    // index some data (4 documents); the same Document and Field
    // instances are reused, only the field value changes
    aField.SetValue("Example 1");
    iw.AddDocument(d);
    aField.SetValue("Example 2");
    iw.AddDocument(d);
    aField.SetValue("Example 3");
    iw.AddDocument(d);

    aField.SetValue("Example 4");
    // a field with Store.NO can be set with a TextReader
    Field notStored = new Field("test2", "", Field.Store.NO, Field.Index.ANALYZED);
    notStored.SetValue(new StringReader("Example 4 - From TextReader"));
    // add new field to a 4th document
    d.Add(notStored);
    iw.AddDocument(d);

    // closing writer commits changes to disk
    iw.Close();
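
Applied to the crawler case in the question, the same idea might look like the sketch below: the downloaded HTML is held in a string, the markup is stripped, and the remaining text is handed to Lucene through a `StringReader`, so nothing is written to disk. This is only a rough sketch against the Lucene.Net 2.9-era API. `StripTags`, the field names `url` and `contents`, and the sample URL and HTML are made up for illustration, and the regex-based stripping is a crude stand-in for a real HTML parser.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

static class HtmlStringIndexer
{
    // Hypothetical helper (not part of Lucene.Net): crude tag stripping.
    // A real crawler would likely use a proper HTML parser instead.
    static string StripTags(string html)
    {
        // Drop script/style blocks, then any remaining tags, then collapse whitespace.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
                                    RegexOptions.IgnoreCase | RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    static void Main()
    {
        // Stand-ins for a page the crawler has just downloaded (assumed values).
        string url = "http://example.com/";
        string html = "<html><head><title>Hi</title></head>"
                    + "<body><p>Hello <b>Lucene</b></p></body></html>";

        // RAMDirectory keeps the whole index in memory.
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        var iw = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

        var doc = new Document();
        // Store the URL verbatim so a search hit can be mapped back to its page.
        doc.Add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // A Field built from a TextReader is tokenized and indexed, but not stored.
        doc.Add(new Field("contents", new StringReader(StripTags(html))));
        iw.AddDocument(doc);

        // Closing the writer commits the document to the directory.
        iw.Close();
    }
}
```

Since the `contents` field accepts any `TextReader`, a `StreamReader` wrapped around the download stream could be passed in the same way. In that case the markup would reach the analyzer unstripped, so tag removal would have to happen somewhere else.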


2011-09-13 14:41:49
