2012-09-10

MySQL FULLTEXT query produces unexpected ranking; why?

I'm trying to run a full-text search on tags, but it isn't working properly for me. Please check the attached image.

The query is:

SELECT *, 
     MATCH(tags) AGAINST ('tag3 tag6 tag4') AS score 
    FROM items 
ORDER BY score DESC 

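Why aren't the rows sorted by score in the correct order? If you check, the second row has all the tags I searched for, while the first row doesn't have the tag3 keyword.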

I mean the id order should be 5, 1, 2, ... and NOT 1, 5, 2, ...

Where is my mistake?

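Next, I want to search the tags field first. If that returns no results, I want to search the description field for the same keywords, also with FULLTEXT. That way the user searches both tags and description, and the description is used when the tags don't match. Is this possible in a single query, or do I need two separate queries?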

If the code you're asking about is shown as an image instead of text, it's harder for answerers. It means we have to retype it, which is a pain in the neck. –

Answers

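This document, http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html, says: "For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results."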

If your items table is small (a sample table, for example), you may be running into this problem and getting a "bizarre" result.

Try running this query IN BOOLEAN MODE and see whether the results match what you expect:

SELECT *, 
      MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS score 
     FROM items 
    ORDER BY score DESC 

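Boolean mode disables word-distribution ranking. You should understand the difference between natural-language mode and boolean mode. Once your table is a decent size, choose carefully between them. If you are looking for the kind of tags a blog has, boolean mode is probably the way to go.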
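The Python sketch below is a rough illustration of what boolean mode does with these tags. It models the BOOLEAN MODE scores that appear in the MyISAM session output later in this thread. With no operators, a row's score equals the number of query words it contains. With `+` on every word (`'+tag3 +tag6 +tag4'`), only a row that contains all of them scores 1. This is a simplification inferred from that output, not MySQL's actual relevance code. `ITEMS` and `boolean_score` are hypothetical names.

```python
# Sample rows from the items table used in this thread (id -> tags column).
ITEMS = {
    1: "tag1 tag3 tag4",
    2: "tag5 tag1 tag2",
    3: "tag5 tag1 tag9",
    4: "tag5 tag6 tag2",
    5: "tag4 tag3 tag6",
    6: "tag2 tag3 tag6",
}


def boolean_score(tags: str, query: str, require_all: bool = False) -> int:
    """Hypothetical model of the MyISAM BOOLEAN MODE scores seen in this thread.

    require_all=False ('tag3 tag6 tag4'): number of query words found in the row.
    require_all=True ('+tag3 +tag6 +tag4'): 1 if every query word is found, else 0.
    """
    row_words = set(tags.split())
    query_words = query.split()
    hits = sum(1 for word in query_words if word in row_words)
    if require_all:
        return 1 if hits == len(query_words) else 0
    return hits


if __name__ == "__main__":
    for item_id, tags in ITEMS.items():
        print(item_id,
              boolean_score(tags, "tag3 tag6 tag4"),
              boolean_score(tags, "tag3 tag6 tag4", require_all=True))
```

The per-row values match the `IN BOOLEAN MODE` results in the session output: 3 for row 5, 2 for rows 1 and 6, 1 for row 4, and 0 otherwise. With `+` on every word, only row 5 scores 1.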

woooooooooow, fixed! – sbaaaang

Now, what if I need to search both the tags and description fields and give priority to the tags field? :D – sbaaaang

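Please read the reference manual documentation on full-text search; it is a fairly complex, sophisticated kind of search. If you need both boolean, tag-style search and natural-language search on descriptions, you may have to normalize the tags column into a new table with one tag per row. Then use FULLTEXT only for the description search. The problem with natural-language search is that it de-emphasizes very common words in its ranking. So if some of your tags are very common, they may not be as effective in searches as you'd like. –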
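Here is a minimal sketch of that normalization, assuming a hypothetical `item_tags` table with one `(item_id, tag)` row per tag. It uses Python's built-in `sqlite3` module as a self-contained stand-in for MySQL. The `GROUP BY`/`COUNT(*)` query is plain SQL and should carry over to MySQL, where `description` could keep its own FULLTEXT index. `rank_by_tags` is an illustrative helper, not part of any library.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: the point is
# the schema and query shape, not the engine).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        description TEXT
    );
    -- Hypothetical normalized table: one row per (item, tag) pair.
    CREATE TABLE item_tags (
        item_id INTEGER NOT NULL REFERENCES items(id),
        tag TEXT NOT NULL,
        PRIMARY KEY (item_id, tag)
    );
""")

SAMPLE = {
    1: ("the first", "tag1 tag3 tag4"),
    2: ("the second", "tag5 tag1 tag2"),
    3: ("the third", "tag5 tag1 tag9"),
    4: ("the fourth", "tag5 tag6 tag2"),
    5: ("the fifth", "tag4 tag3 tag6"),
    6: ("the sixth", "tag2 tag3 tag6"),
}
conn.executemany("INSERT INTO items (id, description) VALUES (?, ?)",
                 [(item_id, desc) for item_id, (desc, _) in SAMPLE.items()])
# Split the old space-separated tags column into one row per tag.
conn.executemany("INSERT INTO item_tags (item_id, tag) VALUES (?, ?)",
                 [(item_id, tag)
                  for item_id, (_, tags) in SAMPLE.items()
                  for tag in tags.split()])


def rank_by_tags(search_tags):
    """Return (item_id, matched_tag_count) pairs, most matched tags first."""
    placeholders = ", ".join("?" for _ in search_tags)
    sql = f"""
        SELECT item_id, COUNT(*) AS matched
        FROM item_tags
        WHERE tag IN ({placeholders})
        GROUP BY item_id
        ORDER BY matched DESC, item_id
    """
    return conn.execute(sql, list(search_tags)).fetchall()


print(rank_by_tags(["tag3", "tag6", "tag4"]))
```

For `['tag3', 'tag6', 'tag4']` this returns `[(5, 3), (1, 2), (6, 2), (4, 1)]`. Row 5 matches all three tags and ranks first no matter how common each tag is, which is the ordering the question asked for.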

Change the ordering to ORDER BY score DESC, id DESC.
When two rows have the same score, the row with the higher id is shown first.
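To see exactly what the extra sort key does, here is a small Python sketch that mimics `ORDER BY score DESC, id DESC`. It uses the natural-language scores from the session output later in this thread. The `scores` dict is copied from that output; everything else is illustrative.

```python
# Natural-language scores copied from the session output in this thread
# (id -> score); rows 1 and 5 tie, the rest are 0.
scores = {1: 0.6700310707092285, 2: 0.0, 3: 0.0, 4: 0.0,
          5: 0.6700310707092285, 6: 0.0}

# ORDER BY score DESC, id DESC: sort on the (score, id) pair, largest first,
# so id is only consulted when two scores are equal.
ordered = sorted(scores, key=lambda item_id: (scores[item_id], item_id),
                 reverse=True)
print(ordered)
```

This yields `[5, 1, 6, 4, 3, 2]`. Row 5 now comes first, but only because 5 is greater than 1. The tie in `score` itself is unchanged, so the `id` key does not address why the scores are equal.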

That's right. What I can't understand is why 1 and 5 have the same score, since 5 matches all the keywords while 1 matches only 2 of them :/ – sbaaaang

I don't think ordering by id is a good fix either. – sbaaaang

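When you specify 2 columns to sort by, the data is shown in that order. So with my code, **score** is compared first. If the scores are equal, **id** is compared, and the row with the higher id is shown first. –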

First, here is your sample data loaded into MySQL 5.5.12 on my Windows 7 machine:

mysql> DROP DATABASE IF EXISTS lspuk; 
Query OK, 1 row affected (0.00 sec) 

mysql> CREATE DATABASE lspuk; 
Query OK, 1 row affected (0.00 sec) 

mysql> USE lspuk 
Database changed 
mysql> CREATE TABLE items 
    -> (
    ->  id int not null auto_increment, 
    ->  description VARCHAR(30), 
    ->  tags VARCHAR(30), 
    ->  primary key (id), 
    ->  FULLTEXT tags_ftndx (tags) 
    ->) ENGINE=MyISAM; 
Query OK, 0 rows affected (0.04 sec) 

mysql> INSERT INTO items (description,tags) VALUES 
    -> ('the first' ,'tag1 tag3 tag4'), 
    -> ('the second','tag5 tag1 tag2'), 
    -> ('the third' ,'tag5 tag1 tag9'), 
    -> ('the fourth','tag5 tag6 tag2'), 
    -> ('the fifth' ,'tag4 tag3 tag6'), 
    -> ('the sixth' ,'tag2 tag3 tag6'); 
Query OK, 6 rows affected (0.00 sec) 
Records: 6 Duplicates: 0 Warnings: 0 

mysql> 

Here is how often each tag occurs in the table:

mysql> SELECT 'tag1',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag1%' UNION 
    -> SELECT 'tag2',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag2%' UNION 
    -> SELECT 'tag3',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag3%' UNION 
    -> SELECT 'tag4',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag4%' UNION 
    -> SELECT 'tag5',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag5%' UNION 
    -> SELECT 'tag6',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag6%' UNION 
    -> SELECT 'tag9',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag9%'; 
+------+-----------+ 
| tag1 | tag_count | 
+------+-----------+ 
| tag1 |   3 | 
| tag2 |   3 | 
| tag3 |   3 | 
| tag4 |   2 | 
| tag5 |   3 | 
| tag6 |   3 | 
| tag9 |   1 | 
+------+-----------+ 
7 rows in set (0.00 sec) 

mysql> 

Looking closely, note the following:

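  1. Every row has exactly 3 tags.
  2. The order in which the tags are requested, and how many rows contain each tag, seem to control the score.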
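The MySQL reference manual documents a rule for MyISAM natural-language searches that fits these observations: a word present in 50% or more of the rows is treated as common and does not match. In the six sample rows, tag3 and tag6 each appear in 3 of 6 rows (exactly 50%), so only tag4 can match. That would explain why `'tag3 tag6'` scores 0 for every row, and why rows 1 and 5 (the only rows containing tag4) tie. The Python sketch below models only that threshold, not MySQL's full relevance formula; the helper names are hypothetical.

```python
# Sample rows from the items table in this thread (id -> tags column).
ITEMS = {
    1: "tag1 tag3 tag4",
    2: "tag5 tag1 tag2",
    3: "tag5 tag1 tag9",
    4: "tag5 tag6 tag2",
    5: "tag4 tag3 tag6",
    6: "tag2 tag3 tag6",
}


def document_frequency(items):
    """Count how many rows contain each whole-word tag."""
    freq = {}
    for tags in items.values():
        for word in set(tags.split()):
            freq[word] = freq.get(word, 0) + 1
    return freq


def effective_query_words(query, items):
    """Drop query words found in 50% or more of the rows (the MyISAM
    natural-language rule) and return the words that can still match."""
    freq = document_frequency(items)
    total = len(items)
    return [word for word in query.split() if freq.get(word, 0) * 2 < total]


def matching_rows(query, items):
    """Ids of rows containing at least one surviving query word."""
    words = set(effective_query_words(query, items))
    return sorted(item_id for item_id, tags in items.items()
                  if words & set(tags.split()))


print(effective_query_words("tag3 tag6 tag4", ITEMS))
print(effective_query_words("tag3 tag6", ITEMS))
print(matching_rows("tag3 tag6 tag4", ITEMS))
```

`effective_query_words('tag3 tag6 tag4', ITEMS)` returns `['tag4']`, and `matching_rows` returns `[1, 5]`. These are the same two rows that get a nonzero natural-language score in the session output.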

If you remove tag4 and run the query, every row scores 0:

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6') as score FROM items ORDER BY score DESC; 
+----+-------------+----------------+-------+ 
| id | description | tags   | score | 
+----+-------------+----------------+-------+ 
| 1 | the first | tag1 tag3 tag4 |  0 | 
| 2 | the second | tag5 tag1 tag2 |  0 | 
| 3 | the third | tag5 tag1 tag9 |  0 | 
| 4 | the fourth | tag5 tag6 tag2 |  0 | 
| 5 | the fifth | tag4 tag3 tag6 |  0 | 
| 6 | the sixth | tag2 tag3 tag6 |  0 | 
+----+-------------+----------------+-------+ 
6 rows in set (0.00 sec) 

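The score appears to be based on the average number of tokens in the field. The presence or absence of particular values, in a particular order, also affects it. If you compare the different scoring modes and ways of specifying the tags, notice how the scores vary: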

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4') as score FROM items ORDER BY score DESC; 
+----+-------------+----------------+--------------------+ 
| id | description | tags   | score    | 
+----+-------------+----------------+--------------------+ 
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 | 
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 | 
| 2 | the second | tag5 tag1 tag2 |     0 | 
| 3 | the third | tag5 tag1 tag9 |     0 | 
| 4 | the fourth | tag5 tag6 tag2 |     0 | 
| 6 | the sixth | tag2 tag3 tag6 |     0 | 
+----+-------------+----------------+--------------------+ 
6 rows in set (0.00 sec) 

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC; 
+----+-------------+----------------+-------+ 
| id | description | tags   | score | 
+----+-------------+----------------+-------+ 
| 5 | the fifth | tag4 tag3 tag6 |  3 | 
| 1 | the first | tag1 tag3 tag4 |  2 | 
| 6 | the sixth | tag2 tag3 tag6 |  2 | 
| 4 | the fourth | tag5 tag6 tag2 |  1 | 
| 2 | the second | tag5 tag1 tag2 |  0 | 
| 3 | the third | tag5 tag1 tag9 |  0 | 
+----+-------------+----------------+-------+ 
6 rows in set (0.00 sec) 

mysql> SELECT *,MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC; 
+----+-------------+----------------+-------+ 
| id | description | tags   | score | 
+----+-------------+----------------+-------+ 
| 5 | the fifth | tag4 tag3 tag6 |  1 | 
| 1 | the first | tag1 tag3 tag4 |  0 | 
| 2 | the second | tag5 tag1 tag2 |  0 | 
| 3 | the third | tag5 tag1 tag9 |  0 | 
| 4 | the fourth | tag5 tag6 tag2 |  0 | 
| 6 | the sixth | tag2 tag3 tag6 |  0 | 
+----+-------------+----------------+-------+ 
6 rows in set (0.00 sec) 

mysql> 

The solution seems to be to compute a BOOLEAN MODE score and a non-boolean (natural-language) score, and sort on both, as follows:

SELECT *, 
MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1, 
MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2 
FROM items ORDER BY score2 DESC, score1 DESC; 
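The effect of the two sort keys can be sketched in Python using the `(id, score1, score2)` values from the session output that follows. Only the ordering logic is illustrated here, and the variable names are hypothetical.

```python
# (id, score1, score2) copied from the '+tag3 +tag6 +tag4' session output:
# score1 = natural-language MATCH, score2 = BOOLEAN MODE MATCH.
rows = [
    (1, 0.6700310707092285, 0),
    (2, 0.0, 0),
    (3, 0.0, 0),
    (4, 0.0, 0),
    (5, 0.6700310707092285, 1),
    (6, 0.0, 0),
]

# ORDER BY score2 DESC, score1 DESC: score2 decides first, score1 breaks ties.
ranked = sorted(rows, key=lambda row: (row[2], row[1]), reverse=True)
print([row[0] for row in ranked])
```

Python's sort is stable, so rows that tie on both keys keep their original order. The sketch therefore prints `[5, 1, 2, 3, 4, 6]`, matching the MySQL result. MySQL itself does not guarantee any particular order for rows that tie on every ORDER BY column.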

Here is the result on your sample data:

mysql> SELECT *, 
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1, 
    -> MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2 
    -> FROM items ORDER BY score2 DESC, score1 DESC; 
+----+-------------+----------------+--------------------+--------+ 
| id | description | tags   | score1    | score2 | 
+----+-------------+----------------+--------------------+--------+ 
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 |  1 | 
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 |  0 | 
| 2 | the second | tag5 tag1 tag2 |     0 |  0 | 
| 3 | the third | tag5 tag1 tag9 |     0 |  0 | 
| 4 | the fourth | tag5 tag6 tag2 |     0 |  0 | 
| 6 | the sixth | tag2 tag3 tag6 |     0 |  0 | 
+----+-------------+----------------+--------------------+--------+ 
6 rows in set (0.00 sec) 

mysql> 

Or you can try it without the plus signs:

mysql> SELECT *, 
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1, 
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score2 
    -> FROM items ORDER BY score2 DESC, score1 DESC; 
+----+-------------+----------------+--------------------+--------+ 
| id | description | tags   | score1    | score2 | 
+----+-------------+----------------+--------------------+--------+ 
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 |  3 | 
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 |  2 | 
| 6 | the sixth | tag2 tag3 tag6 |     0 |  2 | 
| 4 | the fourth | tag5 tag6 tag2 |     0 |  1 | 
| 2 | the second | tag5 tag1 tag2 |     0 |  0 | 
| 3 | the third | tag5 tag1 tag9 |     0 |  0 | 
+----+-------------+----------------+--------------------+--------+ 
6 rows in set (0.00 sec) 

mysql> 

Either way, you need to include both a BOOLEAN MODE score and a natural-language (non-BOOLEAN MODE) score.
