2014-08-27
0

Data normalization

I have an SQL query A (described in detail below) that returns the following table:

cluster brand amount 
0   bos  600 
0   phi  300 
0   har  100 
1   pro 2500 
1   wal 1500 
1   ash 1000 
2   dil 4200 
2   sor  500 
2   van  300 
... 

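However, I do not want to display the absolute amount, but the fraction of that amount relative to the total amount in the cluster, as in the following table: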

cluster brand amount 
0   bos 0.60 
0   phi 0.30 
0   har 0.10 
1   pro 0.50 
1   wal 0.30 
1   ash 0.20 
2   dil 0.84 
2   sor 0.10 
2   van 0.06 
... 

How should I change my SQL so that I can access the sum of all amounts within a cluster while still keeping multiple rows with the same cluster?

**Details**

SQL server: MySQL, accessed through the Python MySQL connector interface.

The current SQL query that produces the first table:

SELECT c.cluster, brand, COUNT(o.id) AS brand_amount 
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id 
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35' 
GROUP BY cluster, brand 
HAVING brand_amount > 100 
ORDER BY c.cluster ASC, brand_amount DESC; 

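orders (primary key id) links persons (foreign key pid) with articles (foreign key aid). Articles have a brand (foreign key brand_id), whose name is stored in the table brands.

The total amount of articles per cluster can be retrieved with the following SQL query: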

SELECT c.cluster, COUNT(o.pid) AS amount 
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35' 
GROUP BY cluster 
ORDER BY c.cluster ASC, amount DESC; 

Result:

cluster amount 
0  1000 
1  5000 
2  5000 

However, I cannot seem to combine the two SQL queries.

+2

Data is not normalized in SQL queries! :) – NoobEditor 2014-08-27 13:03:57

Answers

1

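You can join against a subquery that sums the amounts per cluster. Here Table1 stands for the result of your first query, with columns cluster, brand and amount:

SELECT t1.cluster, t1.brand, t1.amount / s.sumAmount AS amount 
FROM Table1 t1 
JOIN (SELECT cluster, SUM(amount) AS sumAmount 
      FROM Table1 
      GROUP BY cluster) s 
ON t1.cluster = s.cluster; 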

See SqlFiddle.
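
For readers who cannot open the fiddle, the join-on-subquery idea can also be tried locally. The sketch below is not the original fiddle; it assumes the nine sample rows from the question are stored in a table called Table1 and uses Python's built-in sqlite3 module as a stand-in for MySQL. The `1.0 *` factor is only needed because SQLite divides integers as integers; in MySQL the `/` operator already returns a decimal. The function name normalized_rows is made up for this illustration.

```python
import sqlite3

# Sample rows copied from the first table in the question.
ROWS = [
    (0, "bos", 600), (0, "phi", 300), (0, "har", 100),
    (1, "pro", 2500), (1, "wal", 1500), (1, "ash", 1000),
    (2, "dil", 4200), (2, "sor", 500), (2, "van", 300),
]

NORMALIZE_SQL = """
SELECT t1.cluster, t1.brand,
       1.0 * t1.amount / s.sumAmount AS fraction  -- 1.0 * avoids SQLite integer division
FROM Table1 AS t1
JOIN (SELECT cluster, SUM(amount) AS sumAmount
      FROM Table1
      GROUP BY cluster) AS s
  ON t1.cluster = s.cluster
ORDER BY t1.cluster ASC, fraction DESC
"""


def normalized_rows():
    """Return (cluster, brand, fraction) rows from an in-memory stand-in database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Table1 (cluster INTEGER, brand TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ROWS)
        return conn.execute(NORMALIZE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for cluster, brand, fraction in normalized_rows():
        print(cluster, brand, round(fraction, 2))
```

Every row keeps its own cluster and brand, while the subquery contributes one total per cluster, which is why several rows per cluster survive the join.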

Edit

SELECT 
    c.cluster, 
    brand, 
    -- NULLIF turns a zero total into NULL, so the result is NULL instead of a division by zero 
    COUNT(o.id) / NULLIF(s.sumBrandAmount, 0) AS brand_amount 
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id 
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id 
LEFT JOIN (SELECT c1.cluster, COUNT(o1.id) AS sumBrandAmount 
      FROM nyon_all.clustering AS c1 
      LEFT JOIN nyon_all.persons AS p1 ON p1.id = c1.pid 
      LEFT JOIN nyon_all.orders AS o1 ON o1.pid = p1.id 
      -- same filter as the main query, so the totals cover the same clustering round 
      WHERE c1.cluster_round = 'Org_2014-08-27_10:45:35' 
      GROUP BY c1.cluster) AS s 
           ON s.cluster = c.cluster 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35' 
GROUP BY c.cluster, brand, s.sumBrandAmount 
HAVING COUNT(o.id) > 100 -- filter on the raw count; brand_amount is now a fraction 
ORDER BY c.cluster ASC, brand_amount DESC; 
+0

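Thanks for your answer, but I don't understand it. Should I replace Table1 with my big query? If I try that, I get error code 1054: Unknown column 'amount' in 'field list'. I am not familiar with SqlFiddle. The link shows me two empty boxes. What should I do with it? – physicalattraction 2014-08-27 13:28:25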

+0

@physicalattraction There seem to be some problems with SqlFiddle (it does not always work)... I will try to edit my answer to use your query. – 2014-08-27 13:32:20

+0

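@physicalattraction See the edited answer. Of course, to make things easier to read, you could create a view based on the query in your question and use that instead of rewriting everything... – 2014-08-27 13:38:13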
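
The view suggestion in the last comment could look roughly like the sketch below. It is only an illustration under assumptions: the table order_lines, the view brand_counts and the function fractions_via_view are hypothetical names, the row counts are made up, and Python's built-in sqlite3 module stands in for MySQL. In the real schema the view body would be the counting query from the question. The point is that the view plays the role of Table1 and must expose a column named amount, which is what the error 1054 above was complaining about.

```python
import sqlite3


def fractions_via_view():
    """Wrap a counting query in a view, then normalize its counts per cluster."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical flattened table: one row per ordered article (made-up counts).
        conn.execute("CREATE TABLE order_lines (cluster INTEGER, brand TEXT)")
        lines = [(0, "bos")] * 6 + [(0, "phi")] * 3 + [(0, "har")] * 1
        lines += [(1, "pro")] * 5 + [(1, "wal")] * 3 + [(1, "ash")] * 2
        conn.executemany("INSERT INTO order_lines VALUES (?, ?)", lines)

        # The view stands in for Table1; note the column alias "amount".
        conn.execute("""
            CREATE VIEW brand_counts AS
            SELECT cluster, brand, COUNT(*) AS amount
            FROM order_lines
            GROUP BY cluster, brand
        """)

        # Same join-on-subquery as in the answer, reading from the view twice.
        return conn.execute("""
            SELECT t1.cluster, t1.brand,
                   1.0 * t1.amount / s.sumAmount AS fraction
            FROM brand_counts AS t1
            JOIN (SELECT cluster, SUM(amount) AS sumAmount
                  FROM brand_counts
                  GROUP BY cluster) AS s
              ON t1.cluster = s.cluster
            ORDER BY t1.cluster ASC, fraction DESC
        """).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in fractions_via_view():
        print(row)
```

With a view in place, the long chain of joins is written once, and the normalization query stays as short as the Table1 version in the answer.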