Full-text search across multiple columns with PostgreSQL

I have just started using PostgreSQL for fuzzy text matching. I have two columns: job_title and company_name.

A typical full-text search would concatenate job_title and company_name, then return results for the search text ordered by a single rank.

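However, treating matches in the two columns as interchangeable is a problem in my case. For example, Search Engineer at Google Co. should not be considered equal to Google Search at Engineer Co.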

I know I can assign a different weight to each column. However, I have no reason to weigh one column as more important than the other.

How can I match my keywords against each column separately and get back some kind of "match score" for each keyword?

Something like:

Jobs.where("(to_tsvector('english', position) @@ plainto_tsquery(:title_q)) AND
  (to_tsvector('english', company) @@ plainto_tsquery(:company_q))",
  title_q: "Search Engineer", company_q: "Google")

Source

2013-06-28 AdamNYC

As you noted, you can concatenate tsvectors:

# select to_tsvector('job description') || 
     to_tsvector('company as keyword') || 
     to_tsvector('job description as body') as vector; 
          vector       
----------------------------------------------------------- 
'bodi':9 'compani':3 'descript':2,7 'job':1,6 'keyword':5 
(1 row)

And you can also give them weights:

# select (setweight(to_tsvector('job description'), 'A') || 
     setweight(to_tsvector('company as keyword'), 'B') || 
     setweight(to_tsvector('job description as body'), 'D')) as vector; 
          vector        
--------------------------------------------------------------- 
'bodi':9 'compani':3B 'descript':2A,7 'job':1A,6 'keyword':5B 
(1 row)
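Applied to the columns from the question, the weight labels can also be used on the query side: a lexeme in a tsquery may carry a weight letter, which restricts it to matching only tsvector lexemes with that label. The following is a minimal sketch, assuming a hypothetical table jobs(id, job_title, company_name); the table name and the search terms are illustrative, not taken from the question.

```sql
-- Assumption: a hypothetical table jobs(id, job_title, company_name).
-- Label job_title lexemes 'A' and company_name lexemes 'B', then restrict
-- each query term to one label so it can only match in that column.
SELECT id, job_title, company_name
FROM jobs
WHERE (
        setweight(to_tsvector('english', coalesce(job_title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(company_name, '')), 'B')
      )
      @@ to_tsquery('english', 'search:A & engineer:A & google:B');
```

Here the labels act only as column tags for matching. Whether they also influence the score depends on the weights passed to the ranking function.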

You can also play around with ts_rank_cd(). In particular, you can change the way the scores are normalized.

http://www.postgresql.org/docs/current/static/textsearch-controls.html
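As a rough illustration of the normalization options, ts_rank_cd() accepts an optional integer flag as its last argument. The sketch below assumes a hypothetical table jobs(id, job_title) and an example search phrase; only the function and its flags come from PostgreSQL itself.

```sql
-- Assumption: a hypothetical table jobs(id, job_title).
SELECT id,
       ts_rank_cd(vec, q)     AS raw_rank,
       ts_rank_cd(vec, q, 1)  AS length_dampened,  -- divides by 1 + log(document length)
       ts_rank_cd(vec, q, 32) AS scaled            -- rank / (rank + 1), always below 1
FROM (
    SELECT id, to_tsvector('english', coalesce(job_title, '')) AS vec
    FROM jobs
) AS docs,
plainto_tsquery('english', 'search engineer') AS q
WHERE vec @@ q
ORDER BY scaled DESC;
```

Flag 32 may be convenient when scores from different columns are to be compared, since it maps every rank into the same range.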

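In your case, though, it seems you want two separate queries rather than a concatenation. An ugly but possibly adequate solution might look like this: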

select sum(rank) as rank, ... 
from (
    select ... 
    union all 
    select ... 
    ) as sub 
group by ... 
order by sum(rank) desc 
limit 10

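As you can see, it's not very pretty. It's also a boulevard towards aggregating a potentially large number of matching rows. Imho, you're better off sticking with the built-in tsvector arithmetic and tweaking the weights if needed.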
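For reference, one way the union-all shape could be written out is sketched below. It assumes a hypothetical table jobs(id, job_title, company_name) and example search terms; it illustrates the approach and is not necessarily the exact query intended above.

```sql
-- Assumption: a hypothetical table jobs(id, job_title, company_name).
-- Each branch scores one column against its own keywords; the outer query
-- sums the per-column ranks for each row.
SELECT id, sum(rank) AS rank
FROM (
    SELECT id,
           ts_rank_cd(to_tsvector('english', job_title),
                      plainto_tsquery('english', 'search engineer')) AS rank
    FROM jobs
    WHERE to_tsvector('english', job_title)
          @@ plainto_tsquery('english', 'search engineer')
    UNION ALL
    SELECT id,
           ts_rank_cd(to_tsvector('english', company_name),
                      plainto_tsquery('english', 'google')) AS rank
    FROM jobs
    WHERE to_tsvector('english', company_name)
          @@ plainto_tsquery('english', 'google')
) AS sub
GROUP BY id
ORDER BY sum(rank) DESC
LIMIT 10;
```

As written, a row that matches in only one column still appears in the result. Adding HAVING count(*) = 2 after the GROUP BY would keep only rows that match in both.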

Source

2013-06-28 11:12:41

Thanks, Denis, for the excellent answer! – AdamNYC
