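Calculating a weighted (Bayesian) average score/index in a stored procedure?

I have an MS SQL Server 2008 database that stores places serving food (cafés, restaurants, diners, etc.). On the website connected to this database, people can rate the places on a scale from 1 to 3.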

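The website has a page where people can view a top list of the 25 best-rated places in a given city. The database structure looks something like this (the tables store more information, but this is the relevant part): Cities -> Places -> Votes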

A place is located in a city, and a vote is cast on a place.

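Until now I have simply calculated the average vote score for each place by dividing the sum of all votes for the place by the number of votes for the place, like this (pseudocode):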

vote_count = total number of votes for the place 
vote_sum = total sum of all the votes for the place 

vote_score = vote_sum/vote_count 

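If a place has no votes, I also have to handle division by zero. All of this is done in a stored procedure that also fetches the other data I want to show in the top list. Here is the current stored procedure, which fetches the 25 places with the highest vote score: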

ALTER PROCEDURE [dbo].[GetTopListByCity] 
    (
    @city_id Int 
    ) 
AS 
    SELECT TOP 25 dbo.Places.place_id, 
      dbo.Places.city_id, 
      dbo.Places.place_name, 
      dbo.Places.place_alias, 
      dbo.Places.place_street_address, 
      dbo.Places.place_street_number, 
      dbo.Places.place_zip_code, 
      dbo.Cities.city_name, 
      dbo.Cities.city_alias, 
      dbo.Places.place_phone, 
      dbo.Places.place_lat, 
      dbo.Places.place_lng, 
      ISNULL(SUM(dbo.Votes.vote_score),0) AS vote_sum, 
      (SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id) AS vote_count, 
      COALESCE((CONVERT(FLOAT,SUM(dbo.Votes.vote_score))/(CONVERT(FLOAT,(SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id)))),0) AS vote_score 

    FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
    LEFT OUTER JOIN dbo.Votes ON dbo.Places.place_id = dbo.Votes.place_id 
    WHERE dbo.Places.city_id = @city_id 
    AND dbo.Places.hidden = 0 
    GROUP BY dbo.Places.place_id, 
      dbo.Places.city_id, 
      dbo.Places.place_name, 
      dbo.Places.place_alias, 
      dbo.Places.place_street_address, 
      dbo.Places.place_street_number, 
      dbo.Places.place_zip_code, 
      dbo.Cities.city_name, 
      dbo.Cities.city_alias, 
      dbo.Places.place_phone, 
      dbo.Places.place_lat, 
      dbo.Places.place_lng 
    ORDER BY vote_score DESC, vote_count DESC, place_name ASC 

    RETURN 

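As you can see, it fetches more than just the vote score: I also need data about the place, the city it is located in, and so on. This works fine, but there is one big problem: the vote score is too simplistic because it does not take the number of votes into account. With the simple calculation, a place with a single vote of 3 ends up higher in the list than a place with fourteen votes of 3 and one vote of 2: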

3/1 = 3 
(14*3 + 1*2)/15 = 44/15 = 2.933333333333 

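To solve this I have been looking into using some form of weighted average/weighted index. I found an example of a "true Bayesian estimate" that looks promising. It looks like this: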

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C 

where: 

R = average for the place (mean) = (Rating) 
v = number of votes for the place = (votes) 
m = minimum number of votes required to be listed in the Top 25 (unsure how many, but somewhere between 2-5 seems realistic) 
C = the mean vote across the whole database 
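To make the formula concrete, here is a minimal Python sketch applied to the two places from the example above (Python is used only for illustration; the real calculation stays in the stored procedure). The values m = 3 and C = 2.5 are assumptions picked for the example, not numbers from the database, and weighted_rating is a hypothetical helper name.

```python
def weighted_rating(vote_sum, vote_count, m, c):
    """Weighted rating: (v / (v + m)) * R + (m / (v + m)) * C."""
    if vote_count == 0:
        # With no votes the first term vanishes and the formula reduces to C.
        return c
    r = vote_sum / vote_count  # R, the plain mean for the place
    return (vote_count / (vote_count + m)) * r + (m / (vote_count + m)) * c


# Assumed, illustrative values (not taken from the real database).
M = 3    # hypothetical weight given to the global mean
C = 2.5  # hypothetical mean vote across all places

single_vote = weighted_rating(vote_sum=3, vote_count=1, m=M, c=C)
many_votes = weighted_rating(vote_sum=44, vote_count=15, m=M, c=C)

print(round(single_vote, 4))  # 2.625
print(round(many_votes, 4))   # 2.8611
```

Under these assumed values the place with fifteen votes now ranks above the place with a single vote of 3, which is the behaviour the simple average could not give.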

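The problems started when I tried to implement this weighted rating in the stored procedure: it quickly got complicated, I got tangled up in parentheses, and I more or less lost track of what the stored procedure was doing.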

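Now I need help with two questions:

Is this an appropriate method for calculating a weighted index for my site?

What would this (or another suitable calculation method) look like when implemented in a stored procedure?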

Answers

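I can't see anything wrong with your calculation. But I can see that you are doing the same thing several times. My suggestion is to do the aggregation in one place, which makes the select simple.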

;WITH CTE AS 
(
    SELECT 
     SUM(dbo.Votes.vote_score) AS SumOfVoteScore, 
     COUNT(*) AS CountOfVotes, 
     Votes.place_id 
    FROM 
     Votes 
    GROUP BY 
     Votes.place_id 
) 
SELECT TOP 25 
    dbo.Places.place_id, 
    dbo.Places.city_id, 
    dbo.Places.place_name, 
    dbo.Places.place_alias, 
    dbo.Places.place_street_address, 
    dbo.Places.place_street_number, 
    dbo.Places.place_zip_code, 
    dbo.Cities.city_name, 
    dbo.Cities.city_alias, 
    dbo.Places.place_phone, 
    dbo.Places.place_lat, 
    dbo.Places.place_lng, 
    ISNULL(CTE.SumOfVoteScore,0) AS vote_sum, 
    CTE.CountOfVotes AS vote_count, 
    COALESCE((CONVERT(FLOAT,CTE.SumOfVoteScore)/ 
    (CONVERT(FLOAT,CTE.CountOfVotes))),0) AS vote_score 

FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
LEFT JOIN CTE ON dbo.Places.place_id=CTE.place_id 
WHERE dbo.Places.city_id = @city_id 
AND dbo.Places.hidden = 0 
GROUP BY dbo.Places.place_id, 
     dbo.Places.city_id, 
     dbo.Places.place_name, 
     dbo.Places.place_alias, 
     dbo.Places.place_street_address, 
     dbo.Places.place_street_number, 
     dbo.Places.place_zip_code, 
     dbo.Cities.city_name, 
     dbo.Cities.city_alias, 
     dbo.Places.place_phone, 
     dbo.Places.place_lat, 
     dbo.Places.place_lng, 
     CTE.SumOfVoteScore, 
     CTE.CountOfVotes 
ORDER BY vote_score DESC, vote_count DESC, place_name ASC 

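The CTE lets us reuse the calculation, so we don't have to repeat SUM(vote_score) and SELECT COUNT(*) FROM Votes WHERE... several times. The select that uses the calculated values then becomes easy to follow.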
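As a runnable illustration of the "aggregate once, then join" idea, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, so the dialect differs slightly (CAST ... AS REAL instead of CONVERT(FLOAT, ...), and no TOP), and the tiny tables and sample rows are made up for the example.

```python
import sqlite3

# In-memory database with a cut-down, hypothetical version of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Places (place_id INTEGER PRIMARY KEY, place_name TEXT);
    CREATE TABLE Votes  (place_id INTEGER, vote_score INTEGER);
    INSERT INTO Places VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Votes  VALUES (1, 3), (1, 2), (2, 3);
""")

rows = conn.execute("""
    WITH CTE AS (
        -- Aggregate the votes once per place.
        SELECT place_id,
               SUM(vote_score) AS SumOfVoteScore,
               COUNT(*)        AS CountOfVotes
        FROM Votes
        GROUP BY place_id
    )
    SELECT p.place_name,
           COALESCE(c.SumOfVoteScore, 0) AS vote_sum,
           COALESCE(c.CountOfVotes, 0)   AS vote_count,
           -- A place without votes has no CTE row, so the division yields NULL.
           COALESCE(CAST(c.SumOfVoteScore AS REAL) / c.CountOfVotes, 0) AS vote_score
    FROM Places AS p
    LEFT JOIN CTE AS c ON p.place_id = c.place_id
    ORDER BY vote_score DESC, vote_count DESC, p.place_name ASC
""").fetchall()

for row in rows:
    print(row)
```

Place C has no votes, so the LEFT JOIN leaves its CTE columns NULL and the COALESCE calls turn them into zeros, with no division-by-zero error to handle.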

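I hope this helps

Edit

You don't have to define the column list on the CTE. CTE (SumOfVoteScore, CountOfVotes, place_id) AS works just as well as CTE AS. An explicit column list is more typical with a recursive CTE, because there you are doing a UNION with the other part.

For reference, you can read more about the CTE feature here and here.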

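Thanks for the info!

I had been looking at CTEs, but I just didn't realize they were what I was looking for! It's always good to learn something new, and I know I will use CTEs in other projects. When I implemented your CTE in the stored procedure, I ended up with this code: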

ALTER PROCEDURE dbo.GetTopListByCityCTE 
    (
    @city_id Int 
    ) 
AS 

;WITH CTE (SumOfVoteScore, CountOfVotes, place_id) AS 
(
    SELECT 
     SUM(dbo.Votes.vote_score) AS SumOfVoteScore, 
     COUNT(*) AS CountOfVotes, 
     Votes.place_id 
    FROM 
     Votes 
    GROUP BY 
     Votes.place_id 

) 

SELECT TOP 25 
    dbo.Places.place_id, 
    dbo.Places.city_id, 
    dbo.Places.place_name, 
    dbo.Places.place_alias, 
    dbo.Places.place_street_address, 
    dbo.Places.place_street_number, 
    dbo.Places.place_zip_code, 
    dbo.Cities.city_name, 
    dbo.Cities.city_alias, 
    dbo.Places.place_phone, 
    dbo.Places.place_lat, 
    dbo.Places.place_lng, 
    ISNULL(CTE.SumOfVoteScore,0) AS vote_sum, 
    CTE.CountOfVotes AS vote_count, 
    COALESCE((CONVERT(FLOAT,CTE.SumOfVoteScore)/ 
    (CONVERT(FLOAT,CTE.CountOfVotes))),0) AS vote_score 

FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
LEFT JOIN CTE ON dbo.Places.place_id = CTE.place_id 
WHERE dbo.Places.city_id = @city_id 
AND dbo.Places.hidden = 0 
GROUP BY dbo.Places.place_id, 
     dbo.Places.city_id, 
     dbo.Places.place_name, 
     dbo.Places.place_alias, 
     dbo.Places.place_street_address, 
     dbo.Places.place_street_number, 
     dbo.Places.place_zip_code, 
     dbo.Cities.city_name, 
     dbo.Cities.city_alias, 
     dbo.Places.place_phone, 
     dbo.Places.place_lat, 
     dbo.Places.place_lng, 
     CTE.SumOfVoteScore, 
     CTE.CountOfVotes 
ORDER BY vote_score DESC, vote_count DESC, place_name ASC 

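A quick check shows that it returns the same results as the previous code, but it is much easier to read and follow, and hopefully more efficient.

Now I will have to experiment with replacing the old (simple) rating calculation with a new one that takes the number of votes into account.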

Do that.. glad to help. If you are happy with my answer, maybe you could consider accepting it? – Arion 2012-04-02 10:33:38

Also, if you look at my answer, I have updated it – Arion 2012-04-02 10:44:06

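I just want to make sure that the CTE helps me solve the original problem (implementing a more sophisticated score index) before I mark the answer as the solution. I'm working on the new stored procedure... – tkahn 2012-04-02 10:47:42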

OK, so here is the stored procedure I came up with:

ALTER PROCEDURE dbo.GetTopListByCityCTE 
    (
    @city_id Int 
    ) 
AS 

DECLARE @MinimumNumber float; 
DECLARE @TotalNumberOfVotes int; 
DECLARE @AverageRating float; 
DECLARE @AverageNumberOfVotes float; 

/* MINIMUM NUMBER */ 
SET @MinimumNumber = 1; 

/* TOTAL NUMBER OF VOTES -- ALL PLACES */ 
SET @TotalNumberOfVotes = (
    SELECT COUNT(*) FROM Votes 
); 

/* AVERAGE RATING -- ALL PLACES */ 
SET @AverageRating = (
    SELECT 
     CONVERT(FLOAT,(SUM(dbo.Votes.vote_score)))/CONVERT(FLOAT,COUNT(*)) AS AverageRating 
    FROM 
     Votes); 

/* AVERAGE NUMBER OF VOTES -- ALL PLACES */ 
/* CURRENTLY NOT USED IN INDEX - KEPT FOR REFERENCE */ 
SET @AverageNumberOfVotes = (
    SELECT AVG(CONVERT(FLOAT,NumberOfVotes)) FROM (SELECT COUNT(*) AS NumberOfVotes FROM Votes GROUP BY place_id) AS AverageNumberOfVotes 

); 
/* SUM OF ALL VOTE SCORES AND COUNT OF ALL VOTES -- INDIVIDUAL PLACES */ 
WITH CTE AS (
    SELECT 
     CONVERT(FLOAT, SUM(dbo.Votes.vote_score)) AS SumVotesForPlace, 
     CONVERT(FLOAT, COUNT(*)) AS CountVotesForPlace, 
     Votes.place_id 
    FROM 
     Votes 
    GROUP BY 
     Votes.place_id 
) 

SELECT 
    dbo.Places.place_id, 
    dbo.Places.city_id, 
    dbo.Places.place_name, 
    dbo.Places.place_alias, 
    dbo.Places.place_street_address, 
    dbo.Places.place_street_number, 
    dbo.Places.place_zip_code, 
    dbo.Cities.city_name, 
    dbo.Cities.city_alias, 
    dbo.Places.place_phone, 
    dbo.Places.place_lat, 
    dbo.Places.place_lng, 
    ISNULL(CTE.SumVotesForPlace,0) AS vote_sum, 
    ISNULL(CTE.CountVotesForPlace,0) AS vote_count, 
    COALESCE((CTE.SumVotesForPlace/ 
    CTE.CountVotesForPlace),0) AS vote_score, 
    ISNULL((CTE.CountVotesForPlace/(CTE.CountVotesForPlace + @MinimumNumber)) * (COALESCE((CTE.SumVotesForPlace/CTE.CountVotesForPlace),0)) + (@MinimumNumber/(CTE.CountVotesForPlace + @MinimumNumber)) * @AverageRating,0) AS WeightedIndex 

FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
LEFT JOIN CTE ON dbo.Places.place_id = CTE.place_id 
WHERE dbo.Places.city_id = @city_id 
AND dbo.Places.hidden = 0 
GROUP BY dbo.Places.place_id, 
     dbo.Places.city_id, 
     dbo.Places.place_name, 
     dbo.Places.place_alias, 
     dbo.Places.place_street_address, 
     dbo.Places.place_street_number, 
     dbo.Places.place_zip_code, 
     dbo.Cities.city_name, 
     dbo.Cities.city_alias, 
     dbo.Places.place_phone, 
     dbo.Places.place_lat, 
     dbo.Places.place_lng, 
     CTE.SumVotesForPlace, 
     CTE.CountVotesForPlace 
ORDER BY WeightedIndex DESC, vote_count DESC, place_name ASC 

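There is a variable called @AverageNumberOfVotes that is not used in the calculation, but I kept it for reference in case it is needed later.

Running this against my data gives results that differ slightly from what I had before, but the change is not revolutionary and it is not what I need. Here are the first 10 rows returned when I execute the SP above: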

vote_sum  vote_count  vote_score        WeightedIndex 
1110      409         2,71393643031785  2,7140960047496 
807       310         2,60322580645161  2,60449697749787 
38        15          2,53333333333333  2,56708633093525 
25        10          2,5               2,55442722744881 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 

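The problem here seems to be: why does a place with only one vote, with a score of 2, get a weighted index of 2,55188848920863?

The formula for the index is taken from IMDB (http://www.imdb.com/chart/top), and I wonder whether I have done something wrong, or whether the data in my database (the number of votes or the rating scale) is simply not comparable to the data IMDB has?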
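One way to check whether the formula was applied as intended is to solve it backwards from the result rows. The Python sketch below does that under the assumption that WeightedIndex = (vote_sum + m*C) / (vote_count + m), which is the same formula written over a common denominator; solve_m_and_c is a hypothetical helper, and the decimal commas from the output have been written as decimal points.

```python
def weighted_rating(vote_sum, vote_count, m, c):
    """(v / (v + m)) * R + (m / (v + m)) * C, for vote_count > 0."""
    r = vote_sum / vote_count
    return (vote_count / (vote_count + m)) * r + (m / (vote_count + m)) * c


# (vote_sum, vote_count, WeightedIndex) copied from the result rows above.
rows = [
    (1110, 409, 2.7140960047496),
    (807, 310, 2.60449697749787),
    (38, 15, 2.56708633093525),
    (25, 10, 2.55442722744881),
    (2, 1, 2.55188848920863),
]


def solve_m_and_c(row_a, row_b):
    """Solve WR * (v + m) = vote_sum + m * C for m and C from two rows."""
    (s1, v1, w1), (s2, v2, w2) = row_a, row_b
    # Subtracting the two equations eliminates the shared m * C term.
    m = ((s1 - s2) - (w1 * v1 - w2 * v2)) / (w1 - w2)
    c = (w1 * (v1 + m) - s1) / m
    return m, c


m, c = solve_m_and_c(rows[2], rows[4])
print(round(m, 6))  # 3.0
print(round(c, 4))  # 2.7359

# Every listed row is reproduced by m = 3 and the solved C.
max_error = max(abs(weighted_rating(s, v, 3, c) - w) for s, v, w in rows)
print(max_error < 1e-9)  # True
```

The rows are consistent with m = 3 and C of roughly 2.7359, which may mean the output was produced with @MinimumNumber set to 3 rather than the 1 shown in the procedure; that is an inference from the numbers only. With those values a single vote of 2 gives (1*2 + 3*2.7359) / 4, about 2.5519, so the index is behaving as the formula prescribes: a place with very few votes is pulled most of the way toward the global mean C.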

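Edit

Is there a way to adjust this function so that it works better for me? Is there a different function/method that would work better? I still need to do the calculation in the stored procedure.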
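To get a feel for what adjusting m does, here is a small Python sketch that compares the 15-vote place (sum 38) with a single-vote place (sum 2) from the results for a few values of m. The value C = 2.7359 is an assumption: it is a global mean that reproduces the WeightedIndex column when m = 3, and the m values tried are arbitrary.

```python
def weighted_rating(vote_sum, vote_count, m, c):
    # Same formula as (v / (v + m)) * R + (m / (v + m)) * C,
    # written over the common denominator v + m.
    return (vote_sum + m * c) / (vote_count + m)


C = 2.7359  # assumed global mean vote

results = {}
for m in (1, 3, 10):
    fifteen_votes = weighted_rating(38, 15, m, C)
    single_vote = weighted_rating(2, 1, m, C)
    results[m] = (fifteen_votes, single_vote)
    print(m, round(fifteen_votes, 3), round(single_vote, 3))
```

Both places have a mean below C, so the place with fewer votes is pulled toward C faster. Under these assumed numbers the 15-vote place stays ahead for m = 1 and m = 3, while for m = 10 the single-vote place overtakes it, so a larger m is not automatically an improvement for this data.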

I'm not sure this formula (what IMDB calls a "true Bayesian estimate") is what I need, and it has been criticized: http://en.wikipedia.org/wiki/Bayes_estimator#Practical_example_of_misapplication_of_Bayes_estimators – tkahn 2012-04-02 13:50:29