2013-03-29 60 views
0

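Implementing feedback in Lucene

Can someone give me a hint on applying pseudo-feedback in Lucene? I couldn't find much help on Google. I'm using the Similarity class. Is there any class in Lucene that I can extend to implement feedback? Thanks.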

+1

Can you go into a bit more detail? What are you trying to do? – javanna

+1

Yes, please define "pseudo-feedback" – phani

+0

I want to use feedback to expand my query. I need some way to expand my query (any approach should be fine) – j10

Answer

1

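Assuming you mean this relevance feedback method: once you have the TopDocs for the original query, iterate over as many of the results as you want (say we want the top 25 terms from the top 25 documents of the original query), and call IndexReader.getTermVectors(int) on each one, which will get you the information you need. Iterating through each of those and storing the term frequencies in a hash map is the implementation that immediately occurs to me.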

Something like:

// Written against the Lucene 4.x API (an assumption; later versions changed
// Terms.iterator, Query.setBoost and the BooleanQuery constructor).
// Term vectors must have been stored at index time.

// Get the original results
TopDocs docs = indexsearcher.search(query, 25);
Map<String, ScorePair> map = new HashMap<String, ScorePair>();
for (ScoreDoc sd : docs.scoreDocs) {
    // Iterate the fields of each result; null means no term vectors were stored
    Fields fields = indexreader.getTermVectors(sd.doc);
    if (fields == null) continue;
    for (String fieldname : fields) {
        // For each field, iterate its terms
        TermsEnum terms = fields.terms(fieldname).iterator(null);
        BytesRef text;
        while ((text = terms.next()) != null) {
            // and store it, using the index-wide document frequency
            // (the term vector itself only describes this one document)
            String term = text.utf8ToString();
            int docFreq = indexreader.docFreq(new Term(fieldname, term));
            putTermInMap(fieldname, term, docFreq, map);
        }
    }
}

// ScorePair.compareTo puts the highest score first
List<ScorePair> byScore = new ArrayList<ScorePair>(map.values());
Collections.sort(byScore);

BooleanQuery bq = new BooleanQuery();
// Perhaps we want to give the original query a bit of a boost
query.setBoost(5);
bq.add(query, BooleanClause.Occur.SHOULD);
// Add the top 25 found terms (or fewer, if we found fewer) to the final query
for (int i = 0; i < Math.min(25, byScore.size()); i++) {
    ScorePair pair = byScore.get(i);
    bq.add(new TermQuery(new Term(pair.field, pair.term)), BooleanClause.Occur.SHOULD);
}

// Say, we want to score based on tf/idf
void putTermInMap(String field, String term, int docFreq, Map<String, ScorePair> map) {
    String key = field + ":" + term;
    ScorePair pair = map.get(key);
    if (pair != null)
        pair.increment();
    else
        map.put(key, new ScorePair(docFreq, field, term));
}

private class ScorePair implements Comparable<ScorePair> {
    int count = 0; // number of feedback documents containing the term
    double idf;
    String field;
    String term;

    ScorePair(int docfreq, String field, String term) {
        count++;
        // Standard Lucene idf calculation, squared. This is calculated once per field:term
        // (Java has no power operator; ^ is XOR, so multiply instead)
        double rawIdf = 1 + Math.log(indexreader.numDocs() / ((double) docfreq + 1));
        idf = rawIdf * rawIdf;
        this.field = field;
        this.term = term;
    }

    void increment() { count++; }

    // Standard Lucene TF/IDF calculation, if I'm not mistaken about it.
    double score() {
        return Math.sqrt(count) * idf;
    }

    // Descending order: higher scores sort first
    @Override
    public int compareTo(ScorePair pair) {
        return Double.compare(pair.score(), this.score());
    }
}

(I make no claim that this is functional code in its current state.)
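
To try the term-selection step on its own, without an index, here is a minimal plain-Java sketch. It is an illustration under stated assumptions, not Lucene code: the class PseudoFeedbackSketch and its methods are hypothetical names, each feedback document is represented as a plain list of terms standing in for a term vector, and the collection-wide document frequencies are passed in as a map. The weight is the same sqrt(count) * idf^2 idea as above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical, index-free sketch of pseudo-relevance feedback term selection.
final class PseudoFeedbackSketch {

    // Classic-Lucene-style idf, squared: (1 + ln(numDocs / (docFreq + 1)))^2
    static double idfSquared(int numDocs, int docFreq) {
        double idf = 1 + Math.log(numDocs / (docFreq + 1.0));
        return idf * idf;
    }

    // Returns up to k expansion terms, best first.
    // feedbackDocs: terms of each top-ranked document (stand-in for term vectors)
    // docFreq: collection-wide document frequency per term
    // numDocs: number of documents in the collection
    static List<String> topTerms(List<List<String>> feedbackDocs,
                                 Map<String, Integer> docFreq, int numDocs, int k) {
        // Count in how many feedback documents each term occurs.
        Map<String, Integer> count = new HashMap<>();
        for (List<String> doc : feedbackDocs) {
            for (String term : new HashSet<>(doc)) {
                count.merge(term, 1, Integer::sum);
            }
        }
        // Weight each term as sqrt(count) * idf^2.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : count.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            score.put(e.getKey(), Math.sqrt(e.getValue()) * idfSquared(numDocs, df));
        }
        // Sort by descending weight, breaking ties alphabetically.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    // Builds a query-parser-style string: boosted original query plus optional terms.
    static String expand(String originalQuery, List<String> terms) {
        StringBuilder sb = new StringBuilder("(" + originalQuery + ")^5");
        for (String t : terms) {
            sb.append(' ').append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "the"),
                Arrays.asList("lucene", "search", "the"),
                Arrays.asList("lucene", "index", "the"));
        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 9);
        df.put("index", 99);
        df.put("search", 199);
        df.put("the", 999);
        System.out.println(expand("apache", topTerms(docs, df, 1000, 2)));
    }
}
```

In this toy data, a term such as "the" appears in every feedback document but also in almost every document of the collection, so its idf pulls it to the bottom of the ranking. That is the reason for weighting by idf rather than by raw counts alone.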

+0

Thanks, I'll give it a try. – j10