2013-01-14 28 views
4

MySQL: calculate the ranking of teams from different rows

I'm trying to build a kind of team ranking. For that I have the following tables:

CREATE TABLE `teams` (
    `id` int(11) NOT NULL AUTO_INCREMENT, 
    `creator_id` int(11) NOT NULL, 
    `friend_id` int(11) DEFAULT NULL, 
    `team_name` varchar(128) NOT NULL, 
    PRIMARY KEY (`id`) 
); 

team_log

CREATE TABLE IF NOT EXISTS `progress_tracker` (
    `id` int(8) NOT NULL AUTO_INCREMENT, 
    `user_id` int(8) NOT NULL, 
    `team_id` int(11) NOT NULL, 
    `date` date NOT NULL, 
    `clues_found` int(11) NOT NULL, 
    `clues_to_find` int(11) NOT NULL, 
    PRIMARY KEY (`id`) 
); 
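
  1. Each team consists of two users;
  2. Each user starts with a variable number of clues to find;
  3. clues_found can increase or decrease. There is no guarantee that the highest number is the most recent one;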

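I need to rank the teams by the average progress of their two users since joining, expressed as a percentage. A user's progress is clues_found on the row with the maximum date minus clues_found on the row with the lowest date, divided by clues_to_find.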

For example, given the following data in each table:

teams table data

+----+------------+-----------+-----------+
| id | creator_id | friend_id | team_name |
+----+------------+-----------+-----------+
|  1 |         25 |        28 | Test1     |
|  2 |         31 |         5 | Test2     |
+----+------------+-----------+-----------+

team_log table data

+----+---------+---------+------------+-------------+---------------+
| id | user_id | team_id | date       | clues_found | clues_to_find |
+----+---------+---------+------------+-------------+---------------+
|  1 |      25 |       1 | 2013-01-06 |           3 |            24 |
|  2 |      25 |       1 | 2013-01-08 |           7 |            24 |
|  3 |      25 |       1 | 2013-01-10 |          10 |            24 |
|  4 |      28 |       1 | 2013-01-08 |           5 |            30 |
|  5 |      28 |       1 | 2013-01-14 |          20 |            30 |
|  6 |      31 |       2 | 2013-01-11 |           6 |            14 |
|  7 |       5 |       2 | 2013-01-09 |           2 |            20 |
|  8 |       5 |       2 | 2013-01-10 |          10 |            20 |
|  9 |       5 |       2 | 2013-01-12 |          14 |            20 |
+----+---------+---------+------------+-------------+---------------+

Desired result

+---------+-----------------+
| team_id | team_percentage |
+---------+-----------------+
|       1 |     39,58333333 |
|       2 |              30 |
+---------+-----------------+

For reference, here is an intermediate representation that may help in understanding:

+---------+---------+---------------------+
| user_id | team_id | percentage_per_user |
+---------+---------+---------------------+
|      25 |       1 |         29,16666667 |
|      28 |       1 |                  50 |
|      31 |       2 |                   0 |
|       5 |       2 |                  60 |
+---------+---------+---------------------+
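
The arithmetic behind these figures can be checked with a minimal Python sketch. It assumes the sample rows above (with zero-padded dates so that string sorting is chronological); the variable names are illustrative only, and this is a plain restatement of the formula rather than a suggested solution.

```python
from collections import defaultdict

# Sample rows from the question: (user_id, team_id, date, clues_found, clues_to_find).
# Dates are zero-padded so that plain string comparison sorts them chronologically.
rows = [
    (25, 1, "2013-01-06", 3, 24),
    (25, 1, "2013-01-08", 7, 24),
    (25, 1, "2013-01-10", 10, 24),
    (28, 1, "2013-01-08", 5, 30),
    (28, 1, "2013-01-14", 20, 30),
    (31, 2, "2013-01-11", 6, 14),
    (5, 2, "2013-01-09", 2, 20),
    (5, 2, "2013-01-10", 10, 20),
    (5, 2, "2013-01-12", 14, 20),
]

# Collect each user's log entries under (team_id, user_id).
by_user = defaultdict(list)
for user_id, team_id, date, found, to_find in rows:
    by_user[(team_id, user_id)].append((date, found, to_find))

# Per-user progress: clues_found on the latest date minus clues_found on the
# earliest date, as a percentage of clues_to_find.
per_user = {}
for key, entries in by_user.items():
    entries.sort()  # tuples sort by their first element, the date
    first, last = entries[0], entries[-1]
    per_user[key] = (last[1] - first[1]) * 100 / last[2]

# Team score: the mean of its users' percentages.
by_team = defaultdict(list)
for (team_id, _user_id), pct in per_user.items():
    by_team[team_id].append(pct)
team_pct = {team: sum(p) / len(p) for team, p in by_team.items()}

# Print the teams ranked from highest to lowest percentage.
for team, pct in sorted(team_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(team, round(pct, 4))
```

Team 1 averages 29.1667 and 50, team 2 averages 0 and 60, which matches the desired result.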

So far I have the following SQL:

SELECT STRAIGHT_JOIN 
     tl2.team_id, (tl2.weight - tl1.weight)*100/tl2.clues_to_find 
    from 
     (select 
       team_id,user_id,clues_found 
      FROM 
       `team_log` 
      where 1 

      group by 
       team_id, user_id 
      order by 
       `date`) base 
     join (select team_id, user_id, clues_found, clues_to_find from `team_log` where user_id = base.user_id and team_id = base.team_id group by team_id, user_id order by `date` desc) tl2 

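But this returns an error: I'm not allowed to use base.user_id inside the second subquery. I'm also not really sure I'm heading in the right direction.

Can anyone help?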

+1

Please add a line break after the "team_log table data" heading so we can see it properly – Roy

+0

Done. Thanks for the heads-up – jribeiro

+2

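Could you explain this sentence: "I need to get the ranking of teams based on the average number (expressed as a percentage) of clues the users found since joining (for both users in the team) - clues_found on the row with the max date minus clues_found on the record with the lowest date)." When I apply it to team 2 I get -12: the total clues found is 32 and the average is 8. Then I subtract 14 and 6. –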

Answers

1

Please take a look at this and comment:

Team PCT:

select z.team_id, avg(z.pct) as teampct 
from (
select x.user_id, y.team_id, x.mndate, 
y.mxdate, x.mnclues_found, 
y.mxclues_found, 
(((y.mxclues_found - x.mnclues_found)*100) 
/y.mxclues_tofind) pct 
from 
(select user_id, team_id, min(date) mndate, 
min(clues_found) as mnclues_found 
from team_log 
group by user_id, team_id) x 
left join 
(select user_id, team_id, max(date) mxdate, 
max(clues_found) as mxclues_found, 
max(clues_to_find) as mxclues_tofind 
from team_log 
group by user_id, team_id) y 
on x.user_id = y.user_id and 
x.team_id = y.team_id) z 
group by z.team_id 
; 

Result 1:

| USER_ID | TEAM_ID | MNDATE   | MXDATE   | MNCLUES_FOUND | MXCLUES_FOUND |     PCT |
-------------------------------------------------------------------------------------
|       5 |       2 | 13-01-09 | 13-01-12 |             2 |            14 |      60 |
|      25 |       1 | 13-01-06 | 13-01-10 |             3 |            10 | 29.1667 |
|      28 |       1 | 13-01-08 | 13-01-14 |             5 |            20 |      50 |
|      31 |       2 | 13-01-11 | 13-01-11 |             6 |             6 |       0 |

Final result:

| TEAM_ID |  TEAMPCT |
----------------------
|       1 | 39.58335 |
|       2 |       30 |
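
One caveat about the query above: it takes MIN(clues_found) and MAX(clues_found) per user, which are the extremes of the value itself, not the values on the earliest and latest dates. The two agree on the sample data only because every user's clues_found rises over time. A minimal Python sketch with a hypothetical log (these rows are made up for illustration and are not from the question) shows how they can diverge:

```python
# Hypothetical log for one user (not from the question): (date, clues_found).
log = [("2013-01-01", 8), ("2013-01-05", 3), ("2013-01-09", 6)]
clues_to_find = 20  # hypothetical total for this user

values = [found for _date, found in log]

# What MIN(clues_found) / MAX(clues_found) per group computes:
# the spread between the smallest and largest value, ignoring dates.
by_value = (max(values) - min(values)) * 100 / clues_to_find

# What the question asks for: the value on the latest date minus the
# value on the earliest date. ISO dates sort chronologically as strings.
ordered = sorted(log)
by_date = (ordered[-1][1] - ordered[0][1]) * 100 / clues_to_find

print(by_value, by_date)
```

Here the value-based figure is positive while the date-based figure is negative, because the user ended with fewer clues than they started with.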
+0

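I think the values differ because you used the wrong formula. It should be something like (maxDateRow.clues_found - minDateRow.clues_found) * 100 / clues_to_find. Still, the little fiddle is worth the world. ;) Thanks a lot for your help. – jribeiro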

+1

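@jribeiro I'm glad you have an answer :) and glad you used the sqlfiddle. It is a good, clearly stated question, +1. I updated the query with the correct formula, but could not match the results in your question ;) The 'explain plan' shown in sqlfiddle will help you decide which answer is best. – bonCodigo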

+1

@jribeiro You are absolutely right, one formula needed to change. I hadn't noticed that you were dividing by the maximum 'clues to find' ;) Good luck! – bonCodigo

1

SQLFiddle

SELECT `team_id`, 
    (SUM(CASE WHEN b.`date` IS NULL THEN 0 ELSE `clues_found` * 100/`clues_to_find` END) - 
    SUM(CASE WHEN c.`date` IS NULL THEN 0 ELSE `clues_found` * 100/`clues_to_find` END))/2 
FROM `team_log` a 
    LEFT JOIN (
    SELECT `team_id`, `user_id`, MAX(date) AS `date` 
    FROM `team_log` 
    GROUP BY `team_id`, `user_id`) b 
    USING (`team_id`, `user_id`, `date`) 
    LEFT JOIN (
    SELECT `team_id`, `user_id`, MIN(date) AS `date` 
    FROM `team_log` 
    GROUP BY `team_id`, `user_id`) c 
    USING (`team_id`, `user_id`, `date`) 
    GROUP BY `team_id` 

Since you said a team always has two members, I used /2. For teams of variable size this would be slightly more complicated.

+0

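Thanks for your answer. The thing is, clues_found can increase or decrease. There is no guarantee that the highest number is the most recent one. Also, won't SUM(clues_found) add up the values of every row? For example, wouldn't it return 20 (clues_found) for user_id = 25? – jribeiro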

+0

OK... I completely misunderstood the question... I get it now... I'll give it another shot – ic3b3rg

+0

@jribeiro "clues_found can increase or decrease." How do you un-find a clue? – 2013-01-14 20:16:37

1

This is a bit ugly, but it should work:

select
    team_id,
    AVG(percentage_per_user) as team_percentage
from (select
    team_id,
    user_id,
    -- clues_found on the latest date minus clues_found on the earliest date
    ((select clues_found from progress_tracker as x
     where x.user_id = m.user_id order by x.date desc limit 0, 1)
    - (select clues_found from progress_tracker as y
     where y.user_id = m.user_id order by y.date asc limit 0, 1))
    * 100 / MAX(clues_to_find) -- times 100 so the ratio reads as a percentage
    as percentage_per_user
from progress_tracker as m
group by team_id, user_id
) as userScore
group by team_id
order by team_percentage desc;

Note that running the inner query on its own will produce your intermediate "per user" result.
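
A minimal sketch of that inner query, run from Python against an in-memory SQLite database as a stand-in for MySQL. Several things here are assumptions rather than part of the answer: the dates are stored as zero-padded text so they sort chronologically, the subqueries also filter on team_id, and the ratio is multiplied by 100.0 so that SQLite performs floating-point division and returns a percentage.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE progress_tracker ("
    " id INTEGER PRIMARY KEY, user_id INT, team_id INT,"
    " date TEXT, clues_found INT, clues_to_find INT)"
)
conn.executemany(
    "INSERT INTO progress_tracker VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 25, 1, "2013-01-06", 3, 24),
        (2, 25, 1, "2013-01-08", 7, 24),
        (3, 25, 1, "2013-01-10", 10, 24),
        (4, 28, 1, "2013-01-08", 5, 30),
        (5, 28, 1, "2013-01-14", 20, 30),
        (6, 31, 2, "2013-01-11", 6, 14),
        (7, 5, 2, "2013-01-09", 2, 20),
        (8, 5, 2, "2013-01-10", 10, 20),
        (9, 5, 2, "2013-01-12", 14, 20),
    ],
)

# Per-user query: the latest clues_found minus the earliest, as a percentage.
per_user_sql = """
SELECT m.team_id, m.user_id,
       ((SELECT x.clues_found FROM progress_tracker AS x
          WHERE x.user_id = m.user_id AND x.team_id = m.team_id
          ORDER BY x.date DESC LIMIT 1)
      - (SELECT y.clues_found FROM progress_tracker AS y
          WHERE y.user_id = m.user_id AND y.team_id = m.team_id
          ORDER BY y.date ASC LIMIT 1))
       * 100.0 / MAX(m.clues_to_find) AS percentage_per_user
FROM progress_tracker AS m
GROUP BY m.team_id, m.user_id
ORDER BY m.user_id
"""
per_user = conn.execute(per_user_sql).fetchall()
for row in per_user:
    print(row)
```

Each printed row is (team_id, user_id, percentage_per_user); wrapping this query in an outer AVG(...) grouped by team_id gives the team-level figure.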

+0

Thanks @ebyrob, it is a bit ugly, but the results are spot on. I really appreciate the help! – jribeiro

2

Here is another query that will produce the correct result:

SELECT calc.team_id, AVG((calc.end_clues - calc.start_clues)/calc.total_clues*100) as team_percentage 
FROM 
    (SELECT log1.user_id, log1.team_id, log1.clues_found as start_clues, log2.clues_found as end_clues, log2.clues_to_find as total_clues FROM team_log log1 
    JOIN 
    (SELECT MIN(id) as start_id, MAX(id) as end_id FROM team_log GROUP BY user_id) ids 
    ON ids.start_id = log1.id 
    JOIN team_log log2 ON ids.end_id = log2.id) calc 
GROUP BY team_id 
ORDER BY team_id; 

And the SQL Fiddle-link...

+0

I'll give it a try and pick the best answer wisely. Thanks for your help – jribeiro

+1

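There are two things I really like about this query. 1) Using MAX(id) resolves any problem with two entries that share the same date. 2) Getting the start and end in a single grouped query really simplifies the remaining joins. It also makes the percentage equation easy to see. – 2013-01-14 21:20:33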

+1

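As long as min(id) and max(id) correctly correspond to the start and end dates, @Marty, yours is a simpler way to get the result... :) As you can see, nothing guarantees that ids and dates follow the same sequence, although with this sample data you got lucky. What if the row dated the 14th had been the 3rd id for team 1? ;) – bonCodigo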
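
The concern in the last comment can be illustrated with a minimal Python sketch. The rows are hypothetical (not from the question): the entry with id 3 is assumed to have been recorded late, so id order and date order disagree.

```python
# Hypothetical rows for one user: (id, date, clues_found).
# The row with id 3 was entered late, so it carries an earlier date than id 2.
rows = [(1, "2013-01-06", 3), (2, "2013-01-10", 10), (3, "2013-01-08", 7)]
clues_to_find = 24  # hypothetical total for this user

by_id = sorted(rows, key=lambda r: r[0])    # what MIN(id) / MAX(id) relies on
by_date = sorted(rows, key=lambda r: r[1])  # what the question actually asks for

# Progress measured between the first and last row under each ordering.
pct_by_id = (by_id[-1][2] - by_id[0][2]) * 100 / clues_to_find
pct_by_date = (by_date[-1][2] - by_date[0][2]) * 100 / clues_to_find

print(round(pct_by_id, 2), round(pct_by_date, 2))
```

When rows are always inserted in date order the two figures are identical, which is why the id-based query matches the expected output on the sample data.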