2013-06-23 44 views
0

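Converting a MySQL LIKE query to a full-text query

OK, so I have a nice little query that returns scored results. The query is currently LIKE-based, and I would like to convert it to full-text search, as everyone keeps telling me to. I would like to get the same result order back, even if the scores are not the same. The only way I have been able to get anything close is by unrolling my cross joins...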

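  • I want to be able to set the score for specific word combinations
  • I want to be able to set the weight based on where the term was found
  • I don't want to search on the power set of the words in the search. That is, if the user enters "a railway employee", I never want to search for "a employee". I only want to search for groupings of consecutive terms from the query.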
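The third requirement (consecutive groupings only, not the power set) can be sketched outside the database. The following is a minimal Python sketch, not the asker's code: the function names, the stopword set, and the rule "score = number of words, stopwords score 0" are assumptions inferred from the sample term table in the queries below.

```python
def contiguous_phrases(query):
    """Return every run of adjacent words in `query`, longest first."""
    words = query.split()
    n = len(words)
    phrases = []
    for length in range(n, 0, -1):            # longest groupings first
        for start in range(n - length + 1):   # slide a window over the words
            phrases.append(" ".join(words[start:start + length]))
    return phrases


def score_terms(query, stopwords=frozenset({"a", "an", "the"})):
    """Pair each contiguous phrase with a score.

    Assumed rule (hypothetical): a phrase scores its word count,
    except that a lone stopword scores 0.
    """
    terms = []
    for phrase in contiguous_phrases(query):
        score = 0 if phrase in stopwords else len(phrase.split())
        terms.append((score, phrase))
    return terms


terms = score_terms("a railway employee")
```

Because only sliding windows over the word list are generated, a non-adjacent combination such as "a employee" never appears, and the resulting pairs match the six rows of the `terms` derived table in the original query.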

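How can I make my original query full-text based while still keeping it relatively small and organized?

You can view both queries on SQLFiddle.

Original query - nice and small, with the scores and search terms all in one place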

SELECT 
    sum(score * multiplier) score, 
    a.id, 
    a.title 
FROM 
(
    SELECT 3 score, 'a railway employee' term UNION ALL 
    SELECT 2 score, 'railway employee' term UNION ALL 
    SELECT 2 score, 'a railway' term UNION ALL 
    SELECT 1 score, 'employee' term UNION ALL 
    SELECT 1 score, 'railway' term UNION ALL 
    SELECT 0 score, 'a' term 
) terms 
CROSS JOIN 
(
    SELECT 'T' TYPE, 1 multiplier 
    UNION ALL SELECT 'S', 1.1 
    UNION ALL SELECT 'C', 1.5 
) x 
INNER JOIN 
(
    SELECT id, 'T' TYPE, title SEARCH FROM articles 
    UNION ALL 
    SELECT id, 'S' TYPE, summary SEARCH FROM articles WHERE summary <> '' 
    UNION ALL 
    SELECT artId, 'C' TYPE, content SEARCH FROM articleSections 
) s ON s.TYPE = x.TYPE AND SEARCH LIKE concat('%', terms.term, '%') 
INNER JOIN articles a ON a.id = s.id 
WHERE score > 0 
GROUP BY id, title 
ORDER BY score DESC, title;

Full text - messy and large, with the scores and search terms all over the place

SELECT 
    sum(score * multiplier) score, 
    id, 
    title 
FROM 
(
SELECT 
    3 score, 
    1 multiplier, 
    'T' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(title) AGAINST ('"a railway employee"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    2 score, 
    1 multiplier, 
    'T' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(title) AGAINST ('"railway employee"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    2 score, 
    1 multiplier, 
    'T' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(title) AGAINST ('"a railway"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    1 score, 
    1 multiplier, 
    'T' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(title) AGAINST ('railway' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    1 score, 
    1 multiplier, 
    'T' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(title) AGAINST ('employee' IN BOOLEAN MODE) 
UNION ALL 


SELECT 
    3 score, 
    1.1 multiplier, 
    'S' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(summary) AGAINST ('"a railway employee"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    2 score, 
    1.1 multiplier, 
    'S' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(summary) AGAINST ('"railway employee"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    2 score, 
    1.1 multiplier, 
    'S' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(summary) AGAINST ('"a railway"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    1 score, 
    1.1 multiplier, 
    'S' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(summary) AGAINST ('railway' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    1 score, 
    1.1 multiplier, 
    'S' AS loc, 
    id, 
    title 
FROM articles 
WHERE MATCH(summary) AGAINST ('employee' IN BOOLEAN MODE) 
UNION ALL 


SELECT 
    3 score, 
    1.5 multiplier, 
    'C' AS loc, 
    id, 
    title 
FROM articleSections 
INNER JOIN articles a ON a.id = artId 
WHERE MATCH(content) AGAINST ('"a railway employee"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    2 score, 
    1.5 multiplier, 
    'C' AS loc, 
    id, 
    title 
FROM articleSections 
INNER JOIN articles a ON a.id = artId 
WHERE MATCH(content) AGAINST ('"railway employee"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    2 score, 
    1.5 multiplier, 
    'C' AS loc, 
    id, 
    title 
FROM articleSections 
INNER JOIN articles a ON a.id = artId 
WHERE MATCH(content) AGAINST ('"a railway"' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    1 score, 
    1.5 multiplier, 
    'C' AS loc, 
    id, 
    title 
FROM articleSections 
INNER JOIN articles a ON a.id = artId 
WHERE MATCH(content) AGAINST ('railway' IN BOOLEAN MODE) 
UNION ALL 
SELECT 
    1 score, 
    1.5 multiplier, 
    'C' AS loc, 
    id, 
    title 
FROM articleSections 
INNER JOIN articles a ON a.id = artId 
WHERE MATCH(content) AGAINST ('employee' IN BOOLEAN MODE) 

) t 
WHERE score > 0 
GROUP BY id, title 
ORDER BY score DESC, title;
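One way to keep the unrolled full-text query small at the source level is to generate it from two short tables, so the scores and terms live in one place again. This is only a sketch: the function name and table layout are hypothetical, the `%s` placeholders assume a MySQL driver with that parameter style, and it has not been run against a MySQL server. The table and column names are taken from the queries above.

```python
# Scores and phrases, kept in one place (values from the question).
TERMS = [
    (3, "a railway employee"),
    (2, "railway employee"),
    (2, "a railway"),
    (1, "railway"),
    (1, "employee"),
]

# (location code, multiplier, searched column, FROM clause)
LOCATIONS = [
    ("T", 1.0, "title", "articles a"),
    ("S", 1.1, "summary", "articles a"),
    ("C", 1.5, "content",
     "articleSections s INNER JOIN articles a ON a.id = s.artId"),
]


def build_fulltext_query(terms, locations):
    """Return (sql, params) for one UNION ALL branch per term and location."""
    branches, params = [], []
    for _loc, multiplier, column, source in locations:
        for score, term in terms:
            branches.append(
                "SELECT %s AS score, %s AS multiplier, a.id, a.title "
                f"FROM {source} "
                f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE)"
            )
            # Multi-word terms are quoted so boolean mode treats them as phrases.
            phrase = f'"{term}"' if " " in term else term
            params.extend([score, multiplier, phrase])
    sql = (
        "SELECT SUM(score * multiplier) AS score, id, title FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n) t GROUP BY id, title ORDER BY score DESC, title"
    )
    return sql, params


sql, params = build_fulltext_query(TERMS, LOCATIONS)
```

The column and table names are interpolated from a fixed list, never from user input; only the scores, multipliers, and search phrases travel as parameters.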
+0

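You have a flawed set of requirements. The "requirements" you list are artificial and limit the kinds of solutions you can pursue. Requirements should constrain solutions, not specify them. Please reconsider what you want to get out of the search, and edit. –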

+0

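@LieRyan - I want to be able to determine how and why a title shows up in the results... To do that, I want to determine how the scoring is done and what the scores are... If I didn't care what results I got back, I would just stick a WHERE MATCH at the end of a simple SELECT and be done with it. – Justin808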

+2

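@Justin808 . . . Given your scoring needs, you may not want to use full-text search. Alternatively, you may want to use full-text search to find the rows that contain the keywords, and then use 'like' and 'join' to accumulate the scores. –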

Answer

0

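This is too long for a comment.

Clearly, you have very specific scoring needs that fit neither the natural language mode nor the boolean mode of full-text search. I wonder whether there is some hidden mechanism in MySQL that would give you the list of keyword matches for a search, which you could then use for scoring. I don't know of one.

If you have a large corpus and relatively rare words (meaning that the words you are looking for occur in relatively few documents), then you can use boolean mode to reduce the search space. Such a query would look something like this: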

select t.id, sum(terms.score * wherefactor.factor) as score
-- t: candidate rows, narrowed by the full-text index
from (select t.* 
     . . . 
     where MATCH(title, summary, content) AGAINST ('railway employee' IN BOOLEAN MODE) 
    ) t left outer join 
    -- every (term, location) pair, joined to t by the LIKE test below
    (
     (SELECT 3 score, 'a railway employee' term UNION ALL 
      SELECT 2 score, 'railway employee' term UNION ALL 
      SELECT 2 score, 'a railway' term UNION ALL 
      SELECT 1 score, 'employee' term UNION ALL 
      SELECT 1 score, 'railway' term UNION ALL 
      SELECT 0 score, 'a' term 
     ) terms cross join 
     (SELECT 'T' as which, 1.0 as factor UNION ALL 
      SELECT 'S', 1.1 UNION ALL 
      SELECT 'C', 1.5 
     ) wherefactor 
    ) 
    on (case when wherefactor.which = 'T' then t.title 
             when wherefactor.which = 'S' then t.summary 
             when wherefactor.which = 'C' then t.content 
        end) like concat('%', terms.term, '%') 
group by t.id; 

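This should give you the performance of full-text search along with the specifics of your scoring algorithm.

If you have a known lexicon, another possibility is to build a document-term table. Such a table would have one row for each document and each term in that document that you care about (the set of terms you care about is called the "lexicon"). With such a data structure, you are free to implement whatever scoring mechanism you choose.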
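The document-term table idea can be sketched with Python's built-in `sqlite3` module. Everything here is illustrative: the table names (`doc_terms`, `terms`, `weights`), the sample documents, and the substring-based indexing step are assumptions, and the sketch treats each article as having a single content field rather than several sections.

```python
import sqlite3

# Lexicon and location weights, copied from the question's query.
TERMS = [(3, "a railway employee"), (2, "railway employee"), (2, "a railway"),
         (1, "employee"), (1, "railway"), (0, "a")]
WEIGHTS = [("T", 1.0), ("S", 1.1), ("C", 1.5)]

# Made-up sample documents: (id, title, summary, content).
DOCS = [
    (1, "A railway employee", "", "The employee worked on the railway."),
    (2, "Trains", "Railway history", "Nothing relevant"),
]


def build_index(conn, docs, terms, weights):
    """Create the lookup tables and one doc_terms row per (doc, location, term)."""
    conn.executescript("""
        CREATE TABLE terms (score REAL, term TEXT);
        CREATE TABLE weights (loc TEXT, multiplier REAL);
        CREATE TABLE doc_terms (doc_id INTEGER, loc TEXT, term TEXT);
    """)
    conn.executemany("INSERT INTO terms VALUES (?, ?)", terms)
    conn.executemany("INSERT INTO weights VALUES (?, ?)", weights)
    rows = []
    for doc_id, title, summary, content in docs:
        for loc, text in (("T", title), ("S", summary), ("C", content)):
            for _score, term in terms:
                # Substring test, mirroring LIKE '%term%' in the original query.
                if term in text.lower():
                    rows.append((doc_id, loc, term))
    conn.executemany("INSERT INTO doc_terms VALUES (?, ?, ?)", rows)


def ranked(conn):
    """Score each document as SUM(term score * location multiplier)."""
    return conn.execute("""
        SELECT d.doc_id, SUM(t.score * w.multiplier) AS score
        FROM doc_terms d
        JOIN terms t ON t.term = d.term
        JOIN weights w ON w.loc = d.loc
        WHERE t.score > 0
        GROUP BY d.doc_id
        ORDER BY score DESC
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, DOCS, TERMS, WEIGHTS)
results = ranked(conn)
```

The expensive matching happens once, at indexing time; the ranking query is then a plain equality join, so changing scores or multipliers only means updating the two small lookup tables.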