2013-06-28

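Timeout running SQL query

I am trying to use the Django ORM's aggregation features to run a query on an MSSQL 2008R2 database, but I keep getting a timeout error. The failing query (generated by Django) is below. I tried running it directly in SQL Management Studio and it works, but it takes 3.5 minutes.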

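It looks like it is grouping by a bunch of fields that it doesn't need to, but I wouldn't have thought that would make it take this long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210 and tickets_ticket_watchers has 1876. Is there something I'm missing?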

SELECT 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined], 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count], 
    COUNT(T3.[id]) AS [assigned_tickets__count], 
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count] 
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined] 
HAVING 
    (COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0) 

Edit:

Here are the relevant indexes (excluding those not used in the query):

auth_user.id      (PK) 
auth_user.username     (Unique) 
tickets_ticket.id     (PK) 
tickets_ticket.capturer_id 
tickets_ticket.responsible_id 
tickets_ticket_watchers.id   (PK) 
tickets_ticket_watchers.user_id 
tickets_ticket_watchers.ticket_id 

Edit 2:

After a bit of experimentation, I found that the following is the minimal query that still runs slowly:

SELECT 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count], 
    COUNT(T3.[id]) AS [assigned_tickets__count], 
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count] 
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id] 

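Strangely, if I comment out any two lines in the above, it runs in less than 1 second, and it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the corresponding SELECT line).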

Edit 3:

The Python code that generates this query is:

User.objects.annotate(
    Count('tickets_captured'), 
    Count('assigned_tickets'), 
    Count('tickets_watched') 
) 

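A look at the execution plan shows that SQL Server first performs a cross join across all the tables, producing about 280 million rows and 6 GB of data. I assume this is where the problem lies, but why does it happen?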

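What kind of indexes do you have on the tables?

@NenadZivkovic Whatever Django adds, but good point, I'll check them. – aquavitae

You're right, it does take a long time. I would get the execution plan and look at it, and turn on statistics for IO and time to see what is holding it up. Could you also post the code that leads to this query? The query doesn't make much sense to me.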

Answers

1

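SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need a distinct count rather than a plain count: Django annotate() multiple times causes wrong answers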

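As for why the query behaves this way: it says to join four tables together. So if a user has 2 captured tickets, 3 assigned tickets and 4 watched tickets, the join returns 2 * 3 * 4 = 24 rows, one for each combination of tickets. The distinct part then removes all the duplicates.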
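
The fan-out described above can be reproduced on a toy dataset. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server; the table and column names mirror the question, but the sample rows and the short aliases (u, t1, t3, w) are made up for illustration. It runs the same three joins twice, once with plain COUNT and once with COUNT(DISTINCT ...).

```python
import sqlite3

# In-memory database with a cut-down version of the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);

    INSERT INTO auth_user (id) VALUES (1);
    -- Hypothetical data: user 1 captured tickets 1 and 2 ...
    INSERT INTO tickets_ticket VALUES (1, 1, NULL), (2, 1, NULL);
    -- ... is responsible for tickets 3, 4 and 5 ...
    INSERT INTO tickets_ticket VALUES (3, NULL, 1), (4, NULL, 1), (5, NULL, 1);
    -- ... and watches tickets 1 to 4.
    INSERT INTO tickets_ticket_watchers (ticket_id, user_id)
        VALUES (1, 1), (2, 1), (3, 1), (4, 1);
""")

# Same join shape as the generated query; {d} is either empty or DISTINCT.
QUERY = """
    SELECT
        COUNT({d} t1.id),
        COUNT({d} t3.id),
        COUNT({d} w.ticket_id)
    FROM auth_user u
    LEFT OUTER JOIN tickets_ticket t1 ON u.id = t1.capturer_id
    LEFT OUTER JOIN tickets_ticket t3 ON u.id = t3.responsible_id
    LEFT OUTER JOIN tickets_ticket_watchers w ON u.id = w.user_id
    GROUP BY u.id
"""

plain_counts = conn.execute(QUERY.format(d="")).fetchone()
distinct_counts = conn.execute(QUERY.format(d="DISTINCT")).fetchone()

print(plain_counts)     # (24, 24, 24): every count sees all 2 * 3 * 4 joined rows
print(distinct_counts)  # (2, 3, 4): duplicates collapsed per column
```

In Django, the distinct variant corresponds to passing distinct=True to each Count, for example Count('tickets_captured', distinct=True). That corrects the numbers, but the database may still have to build the large intermediate join before de-duplicating, so it does not necessarily fix the timeout.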

0

How about this?

SELECT auth_user.*, 
    C1.tickets_captured__count, 
    C2.assigned_tickets__count, 
    C3.tickets_watched__count 

FROM 
auth_user 
LEFT JOIN 
(SELECT capturer_id, COUNT(*) AS tickets_captured__count 
    FROM tickets_ticket GROUP BY capturer_id) AS C1 ON auth_user.id = C1.capturer_id 
LEFT JOIN 
(SELECT responsible_id, COUNT(*) AS assigned_tickets__count 
    FROM tickets_ticket GROUP BY responsible_id) AS C2 ON auth_user.id = C2.responsible_id 
LEFT JOIN 
(SELECT user_id, COUNT(*) AS tickets_watched__count 
    FROM tickets_ticket_watchers GROUP BY user_id) AS C3 ON auth_user.id = C3.user_id 

WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0 
--WHERE C1.tickets_captured__count IS NOT NULL OR C2.assigned_tickets__count IS NOT NULL -- also works (I think with better performance)
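
To check that pre-aggregating each table avoids the row multiplication, here is a small sketch of the same query shape run against SQLite through Python's sqlite3 module. The usernames and rows are invented sample data, and SQLite stands in for SQL Server only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);

    -- Hypothetical sample data.
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 1, 1);
    INSERT INTO tickets_ticket_watchers (ticket_id, user_id)
        VALUES (1, 1), (2, 3);
""")

# Each subquery aggregates one table on its own, so every join adds
# at most one row per user and the counts cannot multiply each other.
rows = conn.execute("""
    SELECT u.username,
           C1.tickets_captured__count,
           C2.assigned_tickets__count,
           C3.tickets_watched__count
    FROM auth_user u
    LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
               FROM tickets_ticket GROUP BY capturer_id) AS C1
           ON u.id = C1.capturer_id
    LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
               FROM tickets_ticket GROUP BY responsible_id) AS C2
           ON u.id = C2.responsible_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
               FROM tickets_ticket_watchers GROUP BY user_id) AS C3
           ON u.id = C3.user_id
    WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
    ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
# ('alice', 3, 1, 1)
# ('bob', None, 2, None)   <- missing counts come back as NULL, not 0
```

carol only watches a ticket, so the WHERE clause filters her out, matching the HAVING clause in the original query. If zeros are preferred over NULLs, each count column could be wrapped in COALESCE(..., 0).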