Compare data sets and return the best match

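In MySQL, I use a "join table" to assign tags to items. I would like to see which items have tags most similar to those of the item being viewed.

For example, suppose the item of interest has been tagged "cool", "car" and "red". I want to search for other items with those tags. I want to see items that have been tagged "car", but I want items tagged both "car" and "red" to rank above items tagged only "car". Items with exactly the same tags should be at the top of the results.

Is there some way to compare one data set (a subquery) against another data set (a subquery)? Or is there a trick with GROUP BY and GROUP_CONCAT() that would let me evaluate them as comma-separated lists?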

2009-09-02 seans

It would help if you told us your table structure, so that I could be more specific.

I am assuming you have a structure similar to this:

Table item: (id, itemname) 
1 item1 
2 item2 
3 item3 
4 item4 
5 item5 

Table tag: (id, tagname) 
1 cool 
2 red 
3 car 

Table itemtag: (id, itemid, tagid) 
1 1 2 (=item1, red) 
2 2 1 (=item2, cool) 
3 2 3 (=item2, car) 
4 3 1 (=item3, cool) 
5 3 2 (=item3, red) 
6 3 3 (=item3, car) 
7 4 3 (=item4, car) 
8 5 3 (=item5, car)

My general approach is to start by counting how often each individual tag is used.

-- make a list of how often a tag was used: 
select tagid, count(*) as `tagscore` from itemtag group by tagid

This returns one row per tag, with the number of items that tag is assigned to.

In our example, that would be:

tag tagscore 
1 2   (cool, 2x) 
2 2   (red, 2x) 
3 4   (car, 4x) 
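
To check the counts above, here is a minimal sketch that loads the assumed itemtag rows and runs the same GROUP BY query. It uses Python's sqlite3 module with an in-memory SQLite database standing in for MySQL, which is an assumption made only so the example can be run anywhere.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
# Table and column names follow the assumed structure shown earlier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE itemtag (id INTEGER PRIMARY KEY, itemid INTEGER, tagid INTEGER);
    INSERT INTO itemtag (id, itemid, tagid) VALUES
        (1, 1, 2), (2, 2, 1), (3, 2, 3), (4, 3, 1),
        (5, 3, 2), (6, 3, 3), (7, 4, 3), (8, 5, 3);
""")

# Count how many items carry each tag.
tag_scores = conn.execute(
    "SELECT tagid, COUNT(*) AS tagscore FROM itemtag GROUP BY tagid ORDER BY tagid"
).fetchall()

print(tag_scores)  # [(1, 2), (2, 2), (3, 4)]
```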


set @ItemOfInterest=2; 

select 
    item.itemname, 
    sum(TagScores.tagscore) as `totaltagscore`, 
    GROUP_CONCAT(tag.tagname) as `tags` 
from 
    itemtag 
join item on itemtag.itemid=item.id 
/* join the tag table to get readable tag names */ 
join tag on itemtag.tagid=tag.id 
join 
    /* join the query from above (scores per tag) */ 
    (select tagid, count(*) as `tagscore` from itemtag group by tagid) as `TagScores` 
    on `TagScores`.tagid=itemtag.tagid 
where 
    itemtag.itemid<>@ItemOfInterest and 
    /* get the taglist of the current item */ 
    itemtag.tagid in (select distinct tagid from itemtag where itemid=@ItemOfInterest) 
group by 
    itemtag.itemid, item.itemname 
order by 
    2 desc

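Explanation: the query has two subqueries. One gets the list of tags of the item of interest; we only want to work with those. The other subquery generates a score for each tag.

So in the end, every other item that shares at least one tag with the item of interest gets a list of tag scores. These scores are added up with sum(tagscore), and that number is used to order the results (highest score first).

To show the list of matching tags, I use GROUP_CONCAT.

The query will result in something like this (I made up the actual data here):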

Item TagsScore Tags 
item3 15   red,cool,car 
item4 7   red,car 
item5 7   red 
item1 5   car 
item6 5   car
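
As a rough check of the approach, the sketch below runs an equivalent query against the small example tables assumed at the top of this answer, with item 2 (cool, car) as the item of interest. SQLite via Python's sqlite3 module is used here as a stand-in for MySQL, and the named parameter :item replaces the @ItemOfInterest variable; both are assumptions made for the sake of a runnable example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the data is the
# example structure assumed earlier in this answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, itemname TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tagname TEXT);
    CREATE TABLE itemtag (id INTEGER PRIMARY KEY, itemid INTEGER, tagid INTEGER);
    INSERT INTO item VALUES (1,'item1'),(2,'item2'),(3,'item3'),(4,'item4'),(5,'item5');
    INSERT INTO tag VALUES (1,'cool'),(2,'red'),(3,'car');
    INSERT INTO itemtag VALUES
        (1,1,2),(2,2,1),(3,2,3),(4,3,1),(5,3,2),(6,3,3),(7,4,3),(8,5,3);
""")

query = """
    SELECT item.itemname,
           SUM(tagscores.tagscore) AS totaltagscore,
           GROUP_CONCAT(tag.tagname) AS tags
    FROM itemtag
    JOIN item ON itemtag.itemid = item.id
    JOIN tag ON itemtag.tagid = tag.id
    JOIN (SELECT tagid, COUNT(*) AS tagscore
          FROM itemtag GROUP BY tagid) AS tagscores
      ON tagscores.tagid = itemtag.tagid
    WHERE itemtag.itemid <> :item
      AND itemtag.tagid IN (SELECT tagid FROM itemtag WHERE itemid = :item)
    GROUP BY itemtag.itemid, item.itemname
    ORDER BY totaltagscore DESC, item.itemname
"""

# Item 2 is tagged cool and car, so only those two tags are scored.
rows = conn.execute(query, {"item": 2}).fetchall()
for itemname, score, tags in rows:
    print(itemname, score, tags)
```

On this data, item3 shares both tags with item 2 and ranks first with a score of 6 (2 for cool plus 4 for car), item4 and item5 share only car and score 4 each, and item1 shares no tag and is not returned.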


2009-09-02 23:42:13

Both replies are on the right track and have led me to a short-term solution. As for how to scale this routine, I am still looking! – seans 2009-09-03 23:00:03

How about:

SELECT post, SUM(IF(tag IN ('cool', 'cars', 'red'), 1, 0)) AS number_matching 
FROM tags 
GROUP BY post 
ORDER BY number_matching DESC

The list of terms here can be filled into the SQL by your application, if you already have it handy, or it can be generated by a subquery.


2009-09-02 23:00:49 VoteyDisciple

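This will work for sorting, but you would have to generate that query dynamically, because each item can have a different set of tags. The hard-coded list can be replaced with a subquery to solve this. – 2009-09-02 23:44:05

That is what I had in mind. Edited to clarify. – VoteyDisciple 2009-09-03 01:23:05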
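
The subquery variant discussed in these comments could look roughly like the sketch below. It keeps the tags (post, tag) table from the answer, but the rows are hypothetical sample data, SQLite via Python's sqlite3 module stands in for MySQL, and CASE WHEN replaces MySQL's IF() because SQLite has no IF() function. The extra WHERE clause that excludes the post of interest from its own results is also an assumption, not part of the original query.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows below are
# hypothetical sample data for the tags (post, tag) table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (post INTEGER, tag TEXT);
    INSERT INTO tags (post, tag) VALUES
        (1, 'cool'), (1, 'car'), (1, 'red'),
        (2, 'car'),  (2, 'red'),
        (3, 'car'),
        (4, 'blue');
""")

# The hard-coded tag list is replaced by a subquery that looks up the
# tags of the post of interest (:post).
query = """
    SELECT post,
           SUM(CASE WHEN tag IN (SELECT tag FROM tags WHERE post = :post)
                    THEN 1 ELSE 0 END) AS number_matching
    FROM tags
    WHERE post <> :post
    GROUP BY post
    ORDER BY number_matching DESC, post
"""

rows = conn.execute(query, {"post": 1}).fetchall()
print(rows)  # [(2, 2), (3, 1), (4, 0)]
```

Unlike the weighted score in the other answer, this counts each shared tag once, so posts are ranked purely by how many tags they have in common with the post of interest.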
