1

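Is there a better approach than multiple SELECT statements?

I'm creating a web app that displays a pie chart. To get all the data for the chart from a PostgreSQL 9.3 database in a single HTTP request, I'm combining multiple SELECT statements with UNION ALL. Here is a part of it: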

SELECT 'spf' as type, COUNT(*) 
    FROM (SELECT cai.id 
      FROM common_activityinstance cai 
      JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id 
      JOIN common_activitysetting cas ON cas.id = cais.id 
      JOIN quizzes_quiz q ON q.id = cai.activity_id 
      WHERE cai.end_time::date = '2015-09-12' 
      AND q.name != 'Exit Ticket Quiz' 
      AND cai.activity_type = 'QZ' 
      AND (cas.key = 'disable_student_nav' AND cas.value = 'True' 
      OR cas.key = 'pacing' AND cas.value = 'student') 
      GROUP BY cai.id 
      HAVING COUNT(cai.id) = 2) sub 
UNION ALL 
SELECT 'spn' as type, COUNT(*) 
    FROM common_activityinstance cai 
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id 
    JOIN common_activitysetting cas ON cas.id = cais.id 
    WHERE cai.end_time::date = '2015-09-12' 
    AND cai.activity_type = 'QZ' 
    AND cas.key = 'disable_student_nav' 
    AND cas.value = 'False' 
UNION ALL 
SELECT 'tp' as type, COUNT(*) 
    FROM (SELECT cai.id 
      FROM common_activityinstance cai 
      JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id 
      JOIN common_activitysetting cas ON cas.id = cais.id 
      WHERE cai.end_time::date = '2015-09-12' 
      AND cai.activity_type = 'QZ' 
      AND cas.key = 'pacing' AND cas.value = 'teacher') sub; 

This produces a nice, small response to send back to the client:

 type | count 
------+--------
 spf  | 100153 
 spn  |  96402 
 tp   |  84211 

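I don't know whether my query can be made more efficient. Each SELECT statement uses mostly the same JOIN operations. Is there a way to avoid repeating the JOINs for each new SELECT?
I would actually prefer a single row with 3 columns. Or, in general, is there some entirely different but better approach than mine?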

A common table expression might be the way to go for you. – Ginden

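If you don't provide table definitions, as in your case, you *must* at least table-qualify *all* columns in the query, or we have no way of knowing where each column comes from. Do you need the result as 3 rows, or can it be a single row with three columns? –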

Thanks. They are qualified now ('cas.key' and 'cas.value'). –

Answers

2

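You can bundle most of the cost into a single main query in a CTE and reuse the result several times.
This returns a single row with three columns, each named after its type (as requested in the comment):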

WITH cte AS (
    SELECT cai.id, cai.activity_id, cas.key, cas.value 
    FROM common_activityinstance cai 
    JOIN common_activityinstance_settings s ON s.activityinstance_id = cai.id 
    JOIN common_activitysetting cas ON cas.id = s.id 
    WHERE cai.end_time::date = '2015-09-12' -- problem? 
    AND cai.activity_type = 'QZ' 
    AND (cas.key = 'disable_student_nav' AND cas.value IN ('True', 'False') OR 
      cas.key = 'pacing' AND cas.value IN ('student', 'teacher')) 
    ) 
SELECT * 
FROM (
    SELECT count(*) AS spf 
    FROM (
     SELECT c.id 
     FROM cte c 
     JOIN quizzes_quiz q ON q.id = c.activity_id 
     WHERE q.name <> 'Exit Ticket Quiz' 
     AND (c.key, c.value) IN (('disable_student_nav', 'True') 
           , ('pacing', 'student')) 
     GROUP BY 1 
     HAVING count(*) = 2 
    ) sub 
    ) spf 
, (
    SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn 
     , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp 
    FROM cte 
    ) spn_tp; 

This should work in Postgres 9.3. In Postgres 9.4 you can use the new aggregate FILTER clause:

count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn 
, count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp 

Details on both syntax variants:
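As a rough illustration (not part of the original answer), the `count(condition OR NULL)` form can be tried out on a toy data set. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Postgres, and the table `cte` with its six rows is made up for the example. A false condition turns into NULL through `OR NULL`, and `count()` skips NULLs, so only rows where the condition holds are counted. A `CASE`-based sum is included for comparison; the `FILTER` clause is left out because it needs Postgres 9.4 or a recent SQLite.

```python
import sqlite3

# Hypothetical, simplified stand-in for the CTE result:
# one row per (activity instance, setting key, setting value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cte (id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO cte VALUES (?, ?, ?)",
    [
        (1, "disable_student_nav", "False"),
        (1, "pacing", "teacher"),
        (2, "disable_student_nav", "True"),
        (2, "pacing", "student"),
        (3, "disable_student_nav", "False"),
        (3, "pacing", "student"),
    ],
)

# Variant from the answer: "cond OR NULL" is NULL when cond is false,
# and count() ignores NULLs.
or_null = conn.execute(
    """
    SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
         , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
    FROM cte
    """
).fetchone()

# Equivalent formulation with CASE, for comparison.
case_sum = conn.execute(
    """
    SELECT sum(CASE WHEN key = 'disable_student_nav' AND value = 'False' THEN 1 ELSE 0 END)
         , sum(CASE WHEN key = 'pacing' AND value = 'teacher' THEN 1 ELSE 0 END)
    FROM cte
    """
).fetchone()

print(or_null, case_sum)
```

Both queries scan the rows once and return the two counts side by side, which is the same idea the answer relies on for `spn` and `tp`.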

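The condition marked `problem?` may be a big performance problem, depending on the data type of cai.end_time. For one, it is not sargable: the cast is applied to the column, so a plain index on end_time cannot be used. And if the column is of type timestamptz, the expression is hard to index, because the result depends on the current time zone setting of the session, which can also produce different results when the query is executed in different time zones.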

Compare:

You just need to name the time zone that is supposed to define your date. Taking my time zone, Vienna, as an example:

WHERE cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' 
AND cai.end_time < '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' 

You can provide simple timestamptz values as well. You could even just write:

WHERE cai.end_time >= '2015-09-12'::date 
AND cai.end_time < '2015-09-12'::date + 1 

But the first variant does not depend on the current time zone setting.
Detailed explanation in the links above.

Now the query can use your index, and it should be much faster if your table contains many different days.
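A minimal sketch of the rewrite, again using Python's `sqlite3` in place of Postgres: the table `activity`, its index and its five rows are invented for the illustration, timestamps are stored as ISO-8601 text, and time zones are ignored entirely. It only shows that a half-open range on the bare column (lower bound inclusive, upper bound exclusive) selects the same rows as an expression applied to the column, while leaving the column free for an index lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; ISO-8601 text timestamps sort chronologically.
conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, end_time TEXT)")
conn.execute("CREATE INDEX activity_end_time_idx ON activity (end_time)")
conn.executemany(
    "INSERT INTO activity (end_time) VALUES (?)",
    [
        ("2015-09-11 23:59:59",),
        ("2015-09-12 00:00:00",),
        ("2015-09-12 13:30:00",),
        ("2015-09-12 23:59:59",),
        ("2015-09-13 00:00:00",),
    ],
)

# Expression on the column: it has to be computed for every row.
by_expression = conn.execute(
    "SELECT count(*) FROM activity WHERE date(end_time) = '2015-09-12'"
).fetchone()[0]

# Half-open range on the bare column: >= start of day, < start of next day.
by_range = conn.execute(
    "SELECT count(*) FROM activity"
    " WHERE end_time >= '2015-09-12' AND end_time < '2015-09-13'"
).fetchone()[0]

print(by_expression, by_range)
```

The exclusive upper bound is what makes the range safe: no row at exactly midnight of the next day slips in, and no fractional-second value late in the day is lost.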

Thanks for the very thorough response; however, this returns 0 for each column. –

@AlienBishop: I think I found the problem. I misread 'cais.id' as 'cai.id'. –

I also had to change 's' to 'cais' in the first 'JOIN'. It is working now, and on average performance improved about 3x. –

0

This is a partial answer. The second and third groups can be combined into a single query:

SELECT (case when key = 'disable_student_nav' then 'spn' 
      when key = 'pacing' then 'tp' 
     end) as type, COUNT(*) 
FROM common_activityinstance cai JOIN 
    common_activityinstance_settings cais 
    ON cai.id = cais.activityinstance_id JOIN 
    common_activitysetting cas 
    ON cas.id = cais.id 
WHERE cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ' AND 
     (key, value) in (('disable_student_nav', 'False'), ('pacing', 'teacher')) 
GROUP BY type 

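I don't know whether there is a way to put the first group into similar logic. For example, if the QZ condition could be applied to all three groups, then adding the first group would be easy.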
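One possible way to fold in the first group, sketched here under stated assumptions rather than taken from this answer, is to aggregate in two steps: first one row per instance, then a sum over those rows. The example runs on Python's `sqlite3` as a stand-in for Postgres, the table `settings` and its rows are hypothetical, and the 'Exit Ticket Quiz' filter on the quiz name is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened join result: one row per (instance, key, value).
conn.execute("CREATE TABLE settings (id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO settings VALUES (?, ?, ?)",
    [
        (1, "disable_student_nav", "True"),   # both settings match: spf
        (1, "pacing", "student"),
        (2, "disable_student_nav", "True"),   # only one matches: not spf
        (2, "pacing", "teacher"),
        (3, "disable_student_nav", "False"),
        (3, "pacing", "student"),             # only one matches: not spf
    ],
)

row = conn.execute(
    """
    SELECT sum(is_spf) AS spf, sum(spn) AS spn, sum(tp) AS tp
    FROM (
        -- Step 1: per instance, count matching rows for each group.
        SELECT id
             , CASE WHEN count(key = 'disable_student_nav' AND value = 'True'
                            OR key = 'pacing' AND value = 'student' OR NULL) = 2
                    THEN 1 ELSE 0 END AS is_spf
             , count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
             , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
        FROM settings
        GROUP BY id
    ) per_instance
    -- Step 2 (outer query): add up the per-instance results.
    """
).fetchone()

print(row)
```

The inner query plays the role of `GROUP BY cai.id HAVING COUNT(cai.id) = 2` from the question, except that instances failing the test are kept with `is_spf = 0` so their `spn` and `tp` rows still count.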

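Thanks. This looks like it could be a good option. Regarding the first group: my original query applies the 'QZ' condition to all three, so how would you add the first group? –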

0

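You can use a case expression with the conditions from the where clause of each type. However, the having condition of the first query would not be met.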

select type, count(*) as count 
from 
(
SELECT cai.id, 
case when q.name!= 'Exit Ticket Quiz' and key = 'disable_student_nav' 
AND value = 'True' OR key = 'pacing' AND value = 'student' then 'spf' 
    when key = 'disable_student_nav' AND value = 'False' then 'spn' 
    when key = 'pacing' AND value = 'teacher' then 'tp' 
end as type 
     FROM common_activityinstance cai 
     JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id 
     JOIN common_activitysetting cas ON cas.id = cais.id 
     JOIN quizzes_quiz q ON q.id = cai.activity_id 
     WHERE cai.end_time::date = '2015-09-12' 
     AND q.name != 'Exit Ticket Quiz' 
     AND cai.activity_type = 'QZ' 
) t 
group by type 

-1

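There is no way to make this query more efficient, no. You could set up a view or whatever, but it will always have to go through the data three times. You can, however, solve the problem with some post-processing in PHP or PL/SQL or whatever. Start with a simple query, like this: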

SELECT COUNT(*), cai.id, q.name, key, value 
FROM common_activityinstance cai 
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id 
JOIN common_activitysetting cas ON cas.id = cais.id 
JOIN quizzes_quiz q ON q.id = cai.activity_id 
WHERE cai.end_time::date = '2015-09-12' 
GROUP BY cai.id, q.name, key, value

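... It is not clear to me from your explanation whether this would result in a reasonable number of output rows. But assuming it does, write some code to massage them into the shape you want.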

My original query took about 8 seconds. The query in the accepted answer executes in a fraction of a second. –

That is clever. I was wrong. – AngularNewbie

1

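This is just a sketch of a completely different approach: build a boolean "hypercube" of all the conditions you need in your "crosstab". The logic for selecting or aggregating subsets can be applied later (such as suppressing the exit tickets, whose business logic is unclear to me).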


SELECT DISTINCT not_exit, disabled, pacing 
    , COUNT(*) AS the_count 
    FROM (SELECT DISTINCT cai.id 
      , EXISTS (SELECT * 
      FROM quizzes_quiz q 
      WHERE q.id = cai.activity_id AND q.name != 'Exit Ticket Quiz' 
      ) AS not_exit 
      , EXISTS (SELECT * 
      FROM common_activityinstance_settings cais 
      JOIN common_activitysetting cas ON cas.id = cais.id 
      WHERE cai.id = cais.activityinstance_id 
      AND cas.key = 'disable_student_nav' AND cas.value = 'True' 
      ) AS disabled 
      , EXISTS (SELECT * 
      FROM common_activityinstance_settings cais 
      JOIN common_activitysetting cas ON cas.id = cais.id 
      WHERE cai.id = cais.activityinstance_id 
      AND cas.key = 'pacing' AND cas.value = 'student' 
      ) AS pacing 
      FROM common_activityinstance cai 
      WHERE cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ' 
    ) my_cube 
GROUP BY 1,2,3 
ORDER BY 1,2,3 
    ; 

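Final note: this approach is based on my assumption that the underlying data model is actually an EAV model, and that each attribute can occur at most once per student.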

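'EXISTS' differs slightly from 'JOIN' in that it collapses multiple matches in 'quizzes_quiz' for the same 'cai.activity_id' into one. Depending on the undisclosed data model and the intent behind the query, that may be wrong, irrelevant, or even what the OP really wants. In this case the column name 'id' implies 1:1, so 'EXISTS' basically works. But it looks like you missed the special case of 'HAVING COUNT(cai.id) = 2', which requires *two-step* aggregation. –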

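There is not enough information about the data model. My (premature) conclusion was that this is an EAV-like model and that the OP wants to count the occurrences (number of persons) with the attributes (disabled = 'false') or (pacing = 'student') or both. (IMHO the *both* case is the intent of COUNT(*) = 2 in the original query.) – wildplasser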

@ErwinBrandstetter: that is. – wildplasser