2016-12-05

MySQL Bayesian ranking by star rating

Say I have two tables: businesses, and reviews for those businesses.

businesses table:

+----+-------+ 
| id | title | 
+----+-------+ 

reviews table:

+----+-------------+---------+------+ 
| id | business_id | message | rate | 
+----+-------------+---------+------+ 

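Each review has a rate (1 to 5 stars).
I want to sort the businesses by their review ratings using a Bayesian ranking, with the condition that a business has at least 2 reviews.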
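The score expression in the query below is a weighted average of a business's own mean rating and a fixed prior. Reading the constants straight from the query, the prior weight is 2 and the prior mean is 4; calling them a "prior weight" and "prior mean" is an interpretation, not something the question states. With n = COUNT(r.rate) and \bar{r} = AVG(r.rate):

$$
\text{score} = \frac{n}{n + C}\,\bar{r} + \frac{C}{n + C}\,m, \qquad C = 2,\quad m = 4
$$

As n grows, the weight n/(n + C) approaches 1, so businesses with many reviews are ranked mostly by their own average, while businesses with few reviews are pulled toward 4.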

Here is my query:

SELECT b.id, 
(SELECT COUNT(r.rate) as rr FROM reviews r WHERE r.business_id = b.id) as rr, 
(SELECT 
     ((COUNT(r.rate)/(COUNT(r.rate) + 2)) * AVG(r.rate) +
     (2 /(COUNT(r.rate) + 2)) * 4)
    FROM reviews r where r.business_id = b.id AND rr > 2 
) as score 
FROM businesses b 
order by score desc 
LIMIT 4 

This outputs:

+------+----+------------+
| id   | rr | score      |
+------+----+------------+
| 992  | 14 | 4.31250000 |
| 237  | 3  | 4.2000000  |
| 19   | 5  | 4.0000000  |
| 1009 | 12 | 3.9285142  |
+------+----+------------+
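As a sanity check on the formula, the row for business 19 (rr = 5, score 4.0) implies an average rating of exactly 4, since the score there is a plain weighted average with weights 5/7 and 2/7:

$$
\frac{5}{7}\,\bar{r} + \frac{2}{7}\cdot 4 = 4
\;\Longrightarrow\; 5\bar{r} + 8 = 28
\;\Longrightarrow\; \bar{r} = 4
$$

By the same arithmetic, business 237 (rr = 3, score 4.2) implies \bar{r} = 13/3 \approx 4.33: with only 3 reviews, its average is pulled down toward the prior of 4.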

I have two questions:

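  1. As you can see, in ((COUNT(r.rate)/(COUNT(r.rate) + 2)) * AVG(r.rate) + (2 /(COUNT(r.rate) + 2)) * 4) FROM reviews r where r.business_id = b.id AND rr > 2), some functions, such as COUNT and AVG, are called more than once. Are they actually run only once behind the scenes, with the result cached, or are they run on every call?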

  2. Is there an equivalent query that is better optimized?

Thanks in advance.


Do you even get the "correct" answer? I don't think rr should be visible to the second subquery. –

Answer


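I would expect MySQL to optimize the repeated counts, but I can't be sure.

However, you can restructure your query to join against a subquery instead. That way you don't execute 2 subqueries for every row.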

SELECT b.id, 
     sub0.rr, 
     sub0.score 
FROM businesses b 
INNER JOIN 
(
    SELECT r.business_id, 
      COUNT(r.rate) AS rr , 
      ((COUNT(r.rate)/(COUNT(r.rate) + 2)) * AVG(r.rate) + (2 /(COUNT(r.rate) + 2)) * 4) AS score
    FROM reviews r 
    GROUP BY r.business_id 
    HAVING rr > 2 
) sub0 
ON sub0.business_id = b.id 
ORDER BY score DESC 
LIMIT 4 

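Note that the results here are very slightly different: this version excludes businesses with 2 or fewer reviews, whereas your query still returns them, just with a NULL score. The score formula also needs multiplication operators (*) before AVG(r.rate) and before 4; these appeared to be missing from the query as originally posted.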

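Building on the same idea, you can rewrite it so the subquery returns only the count and the average rate, and the outer query computes the score from those returned columns.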

SELECT b.id, 
     sub0.rr, 
     ((rr/(rr + 2)) * arr + (2 /(rr + 2)) * 4) AS score
FROM businesses b 
INNER JOIN 
(
    SELECT r.business_id, 
      COUNT(r.rate) AS rr , 
      AVG(r.rate) AS arr 
    FROM reviews r 
    GROUP BY r.business_id 
    HAVING rr > 2 
) sub0 
ON sub0.business_id = b.id 
ORDER BY score DESC 
LIMIT 4 
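To see what the join-against-a-subquery version actually returns, here is a small self-contained script with hypothetical sample data (none of it comes from the question). It uses TEMPORARY tables so nothing permanent is created, but be aware that they shadow any real tables with the same names for the rest of the session, so run it in a scratch schema. Two small deviations from the answer, both assumptions for portability: HAVING repeats COUNT(r.rate) instead of the rr alias, and the 2.0 literals guard against engines with integer division (MySQL's / already returns a non-integer result).

```sql
-- Hypothetical sample data for illustration only.
CREATE TEMPORARY TABLE businesses (
  id    INT PRIMARY KEY,
  title VARCHAR(100)
);

CREATE TEMPORARY TABLE reviews (
  id          INT PRIMARY KEY,
  business_id INT,
  message     VARCHAR(255),
  rate        INT              -- 1 to 5 stars
);

INSERT INTO businesses (id, title) VALUES
  (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');

INSERT INTO reviews (id, business_id, message, rate) VALUES
  (1, 1, 'x', 5), (2, 1, 'x', 5), (3, 1, 'x', 5),    -- business 1: 3 reviews, avg 5
  (4, 2, 'x', 4), (5, 2, 'x', 4), (6, 2, 'x', 4),    -- business 2: 3 reviews, avg 4
  (7, 3, 'x', 5), (8, 3, 'x', 5),                    -- business 3: only 2 reviews
  (9, 5, 'x', 1), (10, 5, 'x', 2), (11, 5, 'x', 3);  -- business 5: 3 reviews, avg 2
                                                     -- business 4: no reviews at all

-- Same shape as the answer's second query: aggregate once per business,
-- then compute the Bayesian score from the aggregated columns.
SELECT b.id,
       sub0.rr,
       ((sub0.rr / (sub0.rr + 2.0)) * sub0.arr + (2.0 / (sub0.rr + 2)) * 4) AS score
FROM businesses b
INNER JOIN
(
    SELECT r.business_id,
           COUNT(r.rate) AS rr,
           AVG(r.rate)   AS arr
    FROM reviews r
    GROUP BY r.business_id
    HAVING COUNT(r.rate) > 2      -- drops businesses with 2 or fewer reviews
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4;
```

With this data, business 1 should come first (3/5 * 5 + 2/5 * 4 = 4.6), then business 2 (4.0), then business 5 (2.8); the exact number of displayed decimals depends on the engine. Business 3 is removed by HAVING and business 4 has no reviews, so the INNER JOIN never sees it.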

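Thanks for your reply. I tried to run your second query, but it fails on line 12 with an "unknown b.id" error in the WHERE clause of the inner SELECT. So I changed it to this: https://codetidy.com/9750/, but now the subquery fetches every row with rr > 2. – Pars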


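@Pars - Oops, fixed. The subquery will fetch everything with a count greater than 2, but that data is then joined against the businesses table, and only then is the score calculated. So the join against the subquery excludes businesses with 2 or fewer reviews, while the ORDER/LIMIT in the main query restricts the result to 4 rows. – Kickstart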
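One further option for question 2, not part of the answer above: since the subquery only groups by business_id and aggregates rate, a composite index on (business_id, rate) may let MySQL compute it from the index alone instead of reading full rows. This is a sketch that assumes the reviews table from the question; the index name is hypothetical, and whether the optimizer actually uses the index depends on the data and the MySQL version, so check the plan with EXPLAIN.

```sql
-- Hypothetical index name; assumes the reviews table from the question.
-- (business_id, rate) covers both the GROUP BY column and the aggregated column.
CREATE INDEX idx_reviews_business_rate ON reviews (business_id, rate);

-- Inspect the plan for the grouped subquery. "Using index" in the Extra
-- column would indicate that the index alone satisfied the query.
EXPLAIN
SELECT r.business_id, COUNT(r.rate) AS rr, AVG(r.rate) AS arr
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2;
```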