2016-06-07 98 views
How do I group-concatenate multiple columns?

Assume this table:

PurchaseID | Customer | Product  | Method
-----------|----------|----------|--------
1          | John     | Computer | Credit
2          | John     | Mouse    | Cash
3          | Will     | Computer | Credit
4          | Will     | Mouse    | Cash
5          | Will     | Speaker  | Cash
6          | Todd     | Computer | Credit

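I want to produce a report of what each customer bought and which payment methods they used.
But I want the report to have one row per customer, like this: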

Customer | Products                 | Methods
---------|--------------------------|--------------
John     | Computer, Mouse          | Credit, Cash
Will     | Computer, Mouse, Speaker | Credit, Cash
Todd     | Computer                 | Credit

What I have found so far is the grouped concatenation pattern using the XML PATH method, for example:

SELECT 
    p.Customer, 
    -- STUFF(..., 1, 2, '') strips the leading ', '
    STUFF((
     SELECT ', ' + xp.Product 
     FROM Purchases xp 
     WHERE xp.Customer = p.Customer 
     FOR XML PATH('')), 1, 2, '') AS Products, 
    STUFF((
     SELECT ', ' + xp.Method 
     FROM Purchases xp 
     WHERE xp.Customer = p.Customer 
     FOR XML PATH('')), 1, 2, '') AS Methods 
FROM Purchases p 
GROUP BY p.Customer  -- one row per customer

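That gives me the result I want, but I'm concerned about speed.
At first glance there are three separate SELECTs here, and two of them are correlated subqueries whose cost grows with the number of purchase rows. Eventually this will get slow.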

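So, is there a way to do this with better performance?
I want to add more columns to aggregate. Should I write one of these STUFF() blocks for each column? That doesn't sound fast enough.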

Suggestions?

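Well, you're denormalizing your data to do this, so performance is always going to be a potential challenge. The XML method is the best way to denormalize data into a delimited list. –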

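Be careful when using `for xml path`: if your data contains characters such as `&`, it can surprise you. Aaron Bertrand did a [comparison](http://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation) of the different methods that you might want to look at. –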
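A minimal sketch of the usual workaround for the `&` problem mentioned in that comment, assuming a hypothetical `#Purchases` temp table with the question's data (one product renamed to `Mouse & Pad` so the issue actually shows up). Adding `, TYPE` keeps the subquery result as `xml`, and `.value('.', 'NVARCHAR(MAX)')` converts it back to plain text, so `&` comes back as `&` instead of `&amp;`. The outer query is driven from the distinct customers so each customer appears once; `#Report` is also hypothetical and only exists so the result can be inspected afterwards.

```sql
-- Hypothetical temp tables for illustration only.
IF OBJECT_ID('tempdb..#Purchases') IS NOT NULL DROP TABLE #Purchases;
IF OBJECT_ID('tempdb..#Report') IS NOT NULL DROP TABLE #Report;

CREATE TABLE #Purchases (
    PurchaseID INT PRIMARY KEY,
    Customer   NVARCHAR(50) NOT NULL,
    Product    NVARCHAR(50) NOT NULL,
    Method     NVARCHAR(50) NOT NULL
);

INSERT INTO #Purchases (PurchaseID, Customer, Product, Method)
VALUES (1, N'John', N'Computer',    N'Credit'),
       (2, N'John', N'Mouse & Pad', N'Cash'),
       (3, N'Will', N'Computer',    N'Credit'),
       (4, N'Will', N'Mouse',       N'Cash'),
       (5, N'Will', N'Speaker',     N'Cash'),
       (6, N'Todd', N'Computer',    N'Credit');

-- TYPE returns xml; .value('.', ...) reads it back as text, undoing
-- the entitization that plain FOR XML PATH('') would leave in place.
SELECT c.Customer,
       Products = STUFF((SELECT N', ' + p.Product
                         FROM #Purchases AS p
                         WHERE p.Customer = c.Customer
                         ORDER BY p.PurchaseID
                         FOR XML PATH(''), TYPE
                        ).value('.', 'NVARCHAR(MAX)'), 1, 2, N''),
       -- DISTINCT drops repeated methods (e.g. Will's two Cash rows)
       Methods  = STUFF((SELECT DISTINCT N', ' + p.Method
                         FROM #Purchases AS p
                         WHERE p.Customer = c.Customer
                         FOR XML PATH(''), TYPE
                        ).value('.', 'NVARCHAR(MAX)'), 1, 2, N'')
INTO #Report
FROM (SELECT DISTINCT Customer FROM #Purchases) AS c;

SELECT Customer, Products, Methods FROM #Report ORDER BY Customer;
```

Note that `DISTINCT` gives up control over the order of the methods; if the order matters, deduplicate in a derived table first and keep the `ORDER BY`.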

Answers

Just an idea:

DECLARE @t TABLE (
    Customer VARCHAR(50), 
    Product VARCHAR(50), 
    Method VARCHAR(50), 
    INDEX ix CLUSTERED (Customer) 
) 

INSERT INTO @t (Customer, Product, Method) 
VALUES 
    ('John', 'Computer', 'Credit'), 
    ('John', 'Mouse', 'Cash'), 
    ('Will', 'Computer', 'Credit'), 
    ('Will', 'Mouse', 'Cash'), 
    ('Will', 'Speaker', 'Cash'), 
    ('Todd', 'Computer', 'Credit') 

SELECT t.Customer 
    , Product = STUFF(CAST(x.query('a/text()') AS NVARCHAR(MAX)), 1, 2, '') 
    , Method = STUFF(CAST(x.query('b/text()') AS NVARCHAR(MAX)), 1, 2, '') 
FROM (
    SELECT DISTINCT Customer 
    FROM @t 
) t 
OUTER APPLY (
    SELECT DISTINCT [a] = CASE WHEN id = 'a' THEN ', ' + val END 
        , [b] = CASE WHEN id = 'b' THEN ', ' + val END 
    FROM @t t2 
    CROSS APPLY (
     VALUES ('a', t2.Product) 
      , ('b', t2.Method) 
    ) t3 (id, val) 
    WHERE t2.Customer = t.Customer 
    FOR XML PATH(''), TYPE 
) t2 (x) 

Output:

Customer   Product                    Method
---------- -------------------------- ------------------
John       Computer, Mouse            Cash, Credit
Todd       Computer                   Credit
Will       Computer, Mouse, Speaker   Cash, Credit

Another idea, with more of a performance advantage:

IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL 
    DROP TABLE #EntityValues 

DECLARE @Values1 VARCHAR(MAX) 
     , @Values2 VARCHAR(MAX) 

SELECT Customer 
    , Product 
    , Method 
    , RowNum = ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY 1/0) 
    , Values1 = CAST(NULL AS VARCHAR(MAX)) 
    , Values2 = CAST(NULL AS VARCHAR(MAX)) 
INTO #EntityValues 
FROM @t 

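-- This "quirky update" relies on rows being processed in clustered-index
-- order, which SQL Server does not formally guarantee; the clustered index
-- and MAXDOP 1 are the usual precautions.
CREATE CLUSTERED INDEX ix ON #EntityValues (Customer, RowNum) 

UPDATE #EntityValues 
SET 
     @Values1 = Values1 = 
     CASE WHEN RowNum = 1 
      THEN Product 
      ELSE @Values1 + ', ' + Product 
     END 
    , @Values2 = Values2 = 
     CASE WHEN RowNum = 1 
      THEN Method 
      ELSE @Values2 + ', ' + Method 
     END 
OPTION (MAXDOP 1) 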

SELECT Customer 
     , Values1 = MAX(Values1) 
     , Values2 = MAX(Values2) 
FROM #EntityValues 
GROUP BY Customer 

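But there are some limitations: duplicate values are not removed (note `Credit, Cash, Cash` for Will), and the order of the concatenated values is not guaranteed: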

Customer      Values1                       Values2
------------- ----------------------------- ----------------------
John          Computer, Mouse               Credit, Cash
Todd          Computer                      Credit
Will          Computer, Mouse, Speaker      Credit, Cash, Cash

Also check out my old article on string aggregation:

http://www.codeproject.com/Articles/691102/String-Aggregation-in-the-World-of-SQL-Server

Useful alternative approach. – niksofteng

Hi Devart, I like that! – Shnugo

@Shnugo Thanks :) Much appreciated... – Devart

This is one of the use cases for a recursive CTE (common table expression). You can learn more here: https://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

; 
WITH CTE1 (PurchaseID, Customer, Product, Method, RowID) 
AS 
(
    SELECT 
     PurchaseID, Customer, Product, Method, 
     ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY PurchaseID) 
    FROM 
     @tbl 
     /* This table holds the source data. I omitted declaring it and 
     inserting data into it because that's not important here. */ 
) 
, CTE2 (PurchaseID, Customer, Product, Method, RowID) 
AS 
(
    SELECT 
     PurchaseID, Customer, 
     CONVERT(VARCHAR(MAX), Product), 
     CONVERT(VARCHAR(MAX), Method), 
     1 
    FROM 
     CTE1 
    WHERE 
     RowID = 1 
    UNION ALL 
    SELECT 
     CTE2.PurchaseID, CTE2.Customer, 
     CONVERT(VARCHAR(MAX), CTE2.Product + ',' + CTE1.Product), 
     CONVERT(VARCHAR(MAX), CTE2.Method + ',' + CTE1.Method), 
     CTE2.RowID + 1 
    FROM 
     CTE2 INNER JOIN CTE1 
      ON CTE2.Customer = CTE1.Customer 
      AND CTE2.RowID + 1 = CTE1.RowID 
) 

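SELECT Customer, MAX(Product) AS Products, MAX(Method) AS Methods 
FROM CTE2 
GROUP BY Customer 
-- The default limit of 100 recursion levels would fail for a customer
-- with more than about 100 purchases; 0 removes the limit.
OPTION (MAXRECURSION 0) 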

Output:

Customer Products               Methods
-------- ---------------------- -----------------
John     Computer,Mouse         Credit,Cash
Todd     Computer               Credit
Will     Computer,Mouse,Speaker Credit,Cash,Cash

Hi, @JamesZ posted a link to a [performance comparison](http://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation) above. You might take a look at it. Your code works, but **the performance is poor**... – Shnugo

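Another solution is the CLR approach to grouped concatenation; @Aaron Bertrand did a performance comparison of it here. If you can deploy CLR, download the free script from http://groupconcat.codeplex.com/; all the details are in the documentation. Your query then simply becomes: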

SELECT Customer,dbo.GROUP_CONCAT(product),dbo.GROUP_CONCAT(method) 
FROM Purchases 
GROUP BY Customer 

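This query is short and easy to remember and use. The XML method also does the job, but its code is a bit harder to remember (at least for me), and it has nasty gotchas such as XML entitization. Those can be worked around, and Aaron describes some of these pitfalls on his blog.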

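Also look at it from a performance angle. The query was slow for me too; I ran into the same performance problem. I hope you find the question I asked here: https://dba.stackexchange.com/questions/125771/multiple-column-concatenation. Check the "version 2" nested XML concatenation method given by Kenneth Fisher, or the unpivot/pivot method suggested by spaghettidba.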
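For readers on SQL Server 2017 or later (released after this question was asked), the built-in `STRING_AGG` aggregate covers the same ground as the CLR `GROUP_CONCAT` without deploying an assembly. A rough sketch, assuming a hypothetical `#Purchases` temp table with the question's data: each extra column is just one more aggregate under a `GROUP BY`, and methods are deduplicated in a derived table first because `STRING_AGG` has no `DISTINCT` option. This will not run on SQL Server 2016 or earlier.

```sql
-- Sketch only: needs SQL Server 2017+ (STRING_AGG does not exist in 2016).
-- #Purchases and #Report are hypothetical temp tables for illustration.
IF OBJECT_ID('tempdb..#Purchases') IS NOT NULL DROP TABLE #Purchases;
IF OBJECT_ID('tempdb..#Report') IS NOT NULL DROP TABLE #Report;

CREATE TABLE #Purchases (
    PurchaseID INT PRIMARY KEY,
    Customer   NVARCHAR(50) NOT NULL,
    Product    NVARCHAR(50) NOT NULL,
    Method     NVARCHAR(50) NOT NULL
);

INSERT INTO #Purchases (PurchaseID, Customer, Product, Method)
VALUES (1, N'John', N'Computer', N'Credit'),
       (2, N'John', N'Mouse',    N'Cash'),
       (3, N'Will', N'Computer', N'Credit'),
       (4, N'Will', N'Mouse',    N'Cash'),
       (5, N'Will', N'Speaker',  N'Cash'),
       (6, N'Todd', N'Computer', N'Credit');

SELECT p.Customer, p.Products, m.Methods
INTO #Report
FROM (
    -- Products in purchase order. CAST to NVARCHAR(MAX) avoids the
    -- 4000-character limit STRING_AGG applies to shorter input types.
    SELECT Customer,
           Products = STRING_AGG(CAST(Product AS NVARCHAR(MAX)), N', ')
                      WITHIN GROUP (ORDER BY PurchaseID)
    FROM #Purchases
    GROUP BY Customer
) AS p
JOIN (
    -- Methods: collapse duplicates first, ordering each method by the
    -- customer's first purchase that used it.
    SELECT Customer,
           Methods = STRING_AGG(CAST(Method AS NVARCHAR(MAX)), N', ')
                     WITHIN GROUP (ORDER BY FirstPurchaseID)
    FROM (
        SELECT Customer, Method, FirstPurchaseID = MIN(PurchaseID)
        FROM #Purchases
        GROUP BY Customer, Method
    ) AS d
    GROUP BY Customer
) AS m
    ON m.Customer = p.Customer;

SELECT Customer, Products, Methods FROM #Report ORDER BY Customer;
```

With the sample data this should yield the report from the question: `Computer, Mouse` / `Credit, Cash` for John, `Computer` / `Credit` for Todd, and `Computer, Mouse, Speaker` / `Credit, Cash` for Will.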