这样做没有直接的方法,但我重新考虑了Jared的建议,并能够得到解决方案的工作。
我记录在这里,以防万一别人有同样的问题。
创建一个实现org.apache.lucene.search.highlight.Formatter
public class HitPositionCollector implements Formatter
{
// MatchOffset is a simple DTO
private List<MatchOffset> matchList;
public HitPositionCollector(
{
matchList = new ArrayList<MatchOffset>();
}
// this ie where the term start and end offset as well as the actual term is captured
@Override
public String highlightTerm(String originalText, TokenGroup tokenGroup)
{
if (tokenGroup.getTotalScore() <= 0)
{
}
else
{
MatchOffset mo= new MatchOffset(tokenGroup.getToken(0).toString(), tokenGroup.getStartOffset(),tokenGroup.getEndOffset());
getMatchList().add(mo);
}
return originalText;
}
/**
* @return the matchList
*/
public List<MatchOffset> getMatchList()
{
return matchList;
}
}
主代码
public void testHitsWithHitPositionCollector() throws Exception
{
System.out.println(" .... testHitsWithHitPositionCollector");
String fuzzyStr = "bro*";
QueryParser parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
Query fzyQry = parser.parse(fuzzyStr);
TopDocs hits = searcher.search(fzyQry, 10);
QueryScorer scorer = new QueryScorer(fzyQry, "f");
HitPositionCollector myFormatter= new HitPositionCollector();
//Highlighter(Formatter formatter, Scorer fragmentScorer)
Highlighter highlighter = new Highlighter(myFormatter,scorer);
highlighter.setTextFragmenter(
new SimpleSpanFragmenter(scorer)
);
Analyzer analyzer2 = new SimpleAnalyzer();
int loopIndex=0;
//for (ScoreDoc sd : hits.scoreDocs) {
Document doc = searcher.doc(hits.scoreDocs[0].doc);
String title = doc.get("f");
TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
hits.scoreDocs[0].doc,
"f",
doc,
analyzer2);
String fragment = highlighter.getBestFragment(stream, title);
System.out.println(fragment);
assertEquals("the quick brown fox jumps over the lazy dog", fragment);
MatchOffset mo= myFormatter.getMatchList().get(loopIndex++);
assertTrue(mo.getEndPos()==15);
assertTrue(mo.getStartPos()==10);
assertTrue(mo.getToken().equals("brown"));
}
您是否在寻找东西沿着这些路线的一类? http://lucene.apache.org/java/3_0_0/api/contrib-highlighter/index.html – Jared 2011-05-03 19:05:57
不,我不希望高亮文本;我需要做进一步的文本处理。在做进一步的文本处理之前,我需要找出匹配的术语是“模糊”还是“模糊”等,因为这是模糊匹配。 – user193116 2011-05-03 19:22:39