2015-06-02 143 views
3

Optimizing a SQL query with joins

I have a table named errors with the following structure:

errors

| id | UserID | CrashDump | ErrorCode | Timestamp           |
| 1  | user1  | Crash 1   | 100       | 2015-04-08 21:00:00 |
| 2  | user2  | Crash 2   | 102       | 2015-04-10 22:00:00 |
| 3  | user3  | Crash 4   | 105       | 2015-05-08 12:00:00 |
| 4  | user4  | Crash 4   | 105       | 2015-06-02 21:22:00 |
| 5  | user4  | Crash 4   | 105       | 2015-06-03 04:16:00 |

I would like to get the following result set:

Expected result set

| CrashDump | Error Count | Affected Users |
| Crash 4   | 3           | 2              |
| Crash 2   | 1           | 1              |
| Crash 1   | 1           | 1              |

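For each crash dump, the result set reports the number of errors as Error Count and the number of Affected Users (the distinct users who received that error).

I have been able to get the desired result with the query below, but it has proven to be very resource intensive, and MySQL crashes on huge data sets. Could you guide me on how to optimize my current query, or point me to a better way of implementing its logic? Any help would be greatly appreciated.

Current query: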

select B.CrashDump as CrashDump, B.B_UID as AffectedUsers, C.C_UID as ErrorCount 
from 
(
    Select count(A.UserID) as B_UID, A.CrashDump, (A.timestamp) as timestmp, 
    (a.errorcode) as errorCde, (a.ID) as uniqueId 
    from 
    ( 
     select UserID , CrashDump, timestamp,errorcode,id 
     from errors 
     where Timestamp >='2015-04-08 21:00:00' and Timestamp <='2015-06-10 08:18:15' 
     group by userID,CrashDump 
    ) as A 
    group by A.CrashDump 
) as B 

left outer join 
(
    select CrashDump , count(UserID) as C_UID 
    from errors 
    where Timestamp >='2015-04-08 21:00:00' and Timestamp <='2015-06-10 08:18:15' 
    group by CrashDump 
) as C 

On B.CrashDump = C.CrashDump 

order by ErrorCount desc limit 0,10 
+0

Your problem is a classic one that is solved with `GROUP BY` and the [`GROUP BY` aggregate functions](http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html). This [answer](http://stackoverflow.com/a/30591063/4265352) shows you the solution. – axiac

Answers

1

This is the working solution:

Select A.CrashDump, sum(A.ErrorCount) as ErrorC, count(A.AffectedUsers) 
From 
(
SELECT 
    CrashDump, 
    COUNT(ErrorCode) AS ErrorCount, 
    COUNT(DISTINCT UserID) AS AffectedUsers, UserID 
FROM 
    errors 
WHERE 
    Timestamp >='2015-05-13 10:00:00' and Timestamp <='2015-05-14 03:07:00' 

GROUP BY 
    CrashDump, userID 
) AS A 
group by A.CrashDump 

order by ErrorC desc limit 0,10 

Thanks to everyone for helping me reach the desired result.
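As a rough check, the two-level query above can be run against the five sample rows from the question and compared with a single-pass `GROUP BY CrashDump`. The sketch below makes several assumptions: it uses Python's built-in `sqlite3` module instead of MySQL so that it is self-contained, it uses the date range from the question so that all sample rows match, and it adds a `CrashDump DESC` tie-breaker to the ordering so the output is deterministic. It says nothing about performance or about behaviour on the real data set.

```python
import sqlite3

# Sample rows copied from the errors table in the question.
ROWS = [
    (1, "user1", "Crash 1", 100, "2015-04-08 21:00:00"),
    (2, "user2", "Crash 2", 102, "2015-04-10 22:00:00"),
    (3, "user3", "Crash 4", 105, "2015-05-08 12:00:00"),
    (4, "user4", "Crash 4", 105, "2015-06-02 21:22:00"),
    (5, "user4", "Crash 4", 105, "2015-06-03 04:16:00"),
]

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE errors (id INTEGER PRIMARY KEY, UserID TEXT, "
    "CrashDump TEXT, ErrorCode INTEGER, Timestamp TEXT)"
)
conn.executemany("INSERT INTO errors VALUES (?, ?, ?, ?, ?)", ROWS)

# Date range taken from the question, so every sample row is included.
WINDOW = ("2015-04-08 21:00:00", "2015-06-10 08:18:15")

# Two-level aggregation: one row per (CrashDump, UserID), then per CrashDump.
NESTED = """
SELECT A.CrashDump, SUM(A.ErrorCount) AS ErrorC, COUNT(A.AffectedUsers) AS Users
FROM (
    SELECT CrashDump,
           COUNT(ErrorCode) AS ErrorCount,
           COUNT(DISTINCT UserID) AS AffectedUsers,
           UserID
    FROM errors
    WHERE Timestamp >= ? AND Timestamp <= ?
    GROUP BY CrashDump, UserID
) AS A
GROUP BY A.CrashDump
ORDER BY ErrorC DESC, A.CrashDump DESC
LIMIT 10
"""

# Single-pass aggregation, for comparison.
SINGLE = """
SELECT CrashDump,
       COUNT(ErrorCode) AS ErrorCount,
       COUNT(DISTINCT UserID) AS AffectedUsers
FROM errors
WHERE Timestamp >= ? AND Timestamp <= ?
GROUP BY CrashDump
ORDER BY ErrorCount DESC, CrashDump DESC
LIMIT 10
"""

nested_rows = conn.execute(NESTED, WINDOW).fetchall()
single_rows = conn.execute(SINGLE, WINDOW).fetchall()

for row in nested_rows:
    print(row)
print("same result:", nested_rows == single_rows)
```

On this sample both queries return the expected result set from the question; whether they also agree on the production data depends on that data.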

2

Can't you just do this?

SELECT 
    CrashDump, 
    COUNT(ErrorCode) AS ErrorCount, 
    COUNT(DISTINCT UserID) AS AffectedUsers 
FROM 
    Errors 
WHERE 
    Timestamp >='2015-04-08 21:00:00' and Timestamp <='2015-06-10 08:18:15' 
GROUP BY 
    CrashDump 
+0

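I had already tried this query before you posted it. The problem I ran into with this implementation is that the query returns inconsistent data. Please check my final solution, which I arrived at by combining the queries. – Mubarak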

3

Try

SELECT CrashDump, COUNT(ErrorCode) AS ErrorCount, COUNT(DISTINCT UserID) AS AffectedUser 
FROM errors 
WHERE Timestamp >='2015-04-08 21:00:00' AND Timestamp <='2015-06-10 08:18:15' 
GROUP BY CrashDump 
+0

Doing COUNT(DISTINCT UserID) counts each user only once, even if that user has hit the same crash dump several times. – jarlh

+0

Yes, you are correct – tning
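To illustrate the point made in the comments, here is a minimal sketch comparing `COUNT(UserID)` with `COUNT(DISTINCT UserID)` on the sample rows from the question. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL and keeps only the two columns that matter for this comparison.

```python
import sqlite3

# (UserID, CrashDump) pairs copied from the sample table in the question.
ROWS = [
    ("user1", "Crash 1"),
    ("user2", "Crash 2"),
    ("user3", "Crash 4"),
    ("user4", "Crash 4"),
    ("user4", "Crash 4"),  # user4 hit Crash 4 twice
]

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE errors (UserID TEXT, CrashDump TEXT)")
conn.executemany("INSERT INTO errors VALUES (?, ?)", ROWS)

query = """
SELECT CrashDump,
       COUNT(UserID)          AS all_rows,       -- one per error row
       COUNT(DISTINCT UserID) AS distinct_users  -- one per affected user
FROM errors
GROUP BY CrashDump
ORDER BY CrashDump
"""

# Map each crash dump to (row count, distinct user count).
counts = {crash: (n_rows, n_users) for crash, n_rows, n_users in conn.execute(query)}

for crash, pair in counts.items():
    print(crash, pair)
```

For Crash 4 the plain count is 3 while the distinct count is 2, because user4 appears twice.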

1
SELECT CrashDump, SUM(e) AS "Error Count", COUNT(u) AS "Affected Users" 
FROM(
SELECT crashdump, count(errorcode) as e, count(userid) as u 
FROM errors 
WHERE Time_stamp BETWEEN '2015-04-08 21:00:00' and '2015-06-10 08:18:15' 
GROUP BY crashdump, userid) a 
GROUP BY crashdump 
ORDER BY crashdump DESC 

Output

crashdump Error Count Affected Users 
Crash 4  3   2 
Crash 2  1   1 
Crash 1  1   1 

SQL FIDDLE:http://sqlfiddle.com/#!9/13eab/1/0

+0

Thank you, I used your query to derive the final solution. Cheers! – Mubarak