2012-06-06 55 views
1

Writing a complex SQL query

I have this table:

table session(
ID number, 
SessionID VarChar, 
Date, 
Filter 
) 

This table holds search information, for example:

ID SessionID     Date    filter 
4 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=5 
6 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon 
7 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon&meagPixel=12.1 
8 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon 
10 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Nikon 
12 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=12.1 
13 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=12.1&opticalZoom=True 
14 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 meagPixel=12.1&opticalZoom=True&brand=Panasonic 
16 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=500.00 
18 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=499.00 
19 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=499.00&brand=Olympus 
21 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000 
22 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica 
23 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica&price=1995.00 
24 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True 
25 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2 
26 peqq421gaspts3nuulq5mwcq 24/05/2012 13:50 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345 
27 peqq421gaspts3nuulq5mwcq 24/05/2012 13:58 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2 
41 poiq41111spts00000q5aaaa 27/05/2012 13:48 meagPixel=5 

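I want to get the unique searches. A unique search is defined by:

  • The user (session)
  • The longest search (filter); if the first filter changes, it must be treated as a new search (filter)

Because ASP.NET does not guarantee that SessionID is unique, a session is identified by (SessionID, Date).

I haven't got any further than this: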

SELECT  MAX(Filter) 
FROM   Session 
GROUP BY SessionID 

BTW, for the sample table data I gave, the result should be:

ID SessionID     Date    filter    
4 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=5  
7 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon&meagPixel=12.1  
10 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Nikon  
14 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 meagPixel=12.1&opticalZoom=True&brand=Panasonic  
16 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=500.00   
19 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=499.00&brand=Olympus  
26 peqq421gaspts3nuulq5mwcq 24/05/2012 13:50 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345  
41 poiq41111spts00000q5aaaa 27/05/2012 13:48 meagPixel=5  

Thanks for your help and guidance.

+0

Can you check your expected output again? *brand=Canon* and *brand=Canon&meagPixel=12.1* have the same first filter, but they are listed separately. Meanwhile *zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345* has only one entry, although the main table also contains a record *zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2* –

+0

Since it isn't very clear, I'll change it. – Nir

+0

I'm very, very sorry, I have only just edited my post: I am using SQL Server Compact 4, not SQL Server Standard Edition – Nir

Answers

1

@GarethD - thanks for the schema and insert queries. I tried a slightly different approach. I'm not sure whether it works for every case. It works in MySQL and MSSQL.

  select * 
      from tsession t1 
      where not exists (
          select * 
          from tsession t2 
          where t2.filter like concat(t1.filter,'%') 
          and t1.filter<>t2.filter 
          and t1.sessionid=t2.sessionid) 
      order by id; 

This gives the exact result required in the question.
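As a rough way to try the prefix idea without a database server, here is a minimal sketch that runs the same NOT EXISTS query against an in-memory SQLite table from Python. The function name `longest_prefix_searches` is hypothetical, only a subset of the question's rows is loaded, and `concat(...)` is swapped for SQLite's `||` operator, so treat it as an illustration rather than the answer's exact query.

```python
import sqlite3

SID = "peqq421gaspts3nuulq5mwcq"
# Subset of the question's sample rows: (ID, SessionID, Filter).
ROWS = [
    (4, SID, "meagPixel=5"),
    (6, SID, "brand=Canon"),
    (7, SID, "brand=Canon&meagPixel=12.1"),
    (10, SID, "brand=Nikon"),
    (18, SID, "price=499.00"),
    (19, SID, "price=499.00&brand=Olympus"),
    (41, "poiq41111spts00000q5aaaa", "meagPixel=5"),
]


def longest_prefix_searches(rows):
    """Return IDs of rows whose filter is not a strict prefix of another
    filter in the same session (hypothetical helper for this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tsession (id INTEGER, sessionid TEXT, filter TEXT)")
    conn.executemany("INSERT INTO tsession VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.id
        FROM tsession t1
        WHERE NOT EXISTS (
            SELECT 1
            FROM tsession t2
            WHERE t2.filter LIKE t1.filter || '%'   -- SQLite spelling of concat()
              AND t1.filter <> t2.filter
              AND t1.sessionid = t2.sessionid)
        ORDER BY t1.id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(longest_prefix_searches(ROWS))  # [4, 7, 10, 19, 41]
```

Note that LIKE treats `%` and `_` inside a filter value as wildcards, so filters containing those characters would need escaping; the sample data has none.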

+0

You are missing "and filter is not null" in the outer WHERE. I really don't know whether it answers all cases... what do you think? – Nir

+0

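This will definitely get the longest filter; *concat(t1.filter,'%')* makes sure of that. The scenarios that need further testing are those with additional conditions on any column (for any grouping requirement). As for filter is not null, it was not clear from the data that filter could be null. –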

+0

Yes, I have tested it on the real table, and there are nulls in it. It will need more testing, because it seems too simple to be right :) – Nir

0

To get the longest search filter, you need to do something like this:

select s.* 
from (select s.*, 
      row_number() over (partition by sessionid order by len desc) as rownum 
     from (select s.*, len(filter) as len 
      from session s 
      ) s 
    ) s 
where rownum = 1 

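I did this using window functions. You can do the same thing using aggregation and a join.

However, you say that the session is not really the identifier; session/filter is. The following query gets pretty much what you want (the only change is that the partition clause includes filter):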
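The aggregate-and-join variant mentioned above could look roughly like the following sketch, run here against an in-memory SQLite table from Python so it can be tried directly. The function name `longest_filter_per_session` is hypothetical, `LENGTH` is SQLite's spelling of `LEN`, and only a few of the question's rows are loaded.

```python
import sqlite3

SID = "peqq421gaspts3nuulq5mwcq"
LEICA = "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2"
# Subset of the question's sample rows: (ID, SessionID, Filter).
ROWS = [
    (4, SID, "meagPixel=5"),
    (7, SID, "brand=Canon&meagPixel=12.1"),
    (26, SID, LEICA + "&weight=345"),
    (27, SID, LEICA),
    (41, "poiq41111spts00000q5aaaa", "meagPixel=5"),
]


def longest_filter_per_session(rows):
    """Return IDs of the longest filter(s) per SessionID (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE session (id INTEGER, sessionid TEXT, filter TEXT)")
    conn.executemany("INSERT INTO session VALUES (?, ?, ?)", rows)
    query = """
        SELECT s.id
        FROM session s
        JOIN (SELECT sessionid, MAX(LENGTH(filter)) AS maxlen
              FROM session
              GROUP BY sessionid) m        -- one row per session
          ON m.sessionid = s.sessionid
         AND LENGTH(s.filter) = m.maxlen   -- keep only the longest filter(s)
        ORDER BY s.id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(longest_filter_per_session(ROWS))  # [26, 41]
```

One difference from the `row_number()` version: when two filters in a session tie for the maximum length, the join returns both rows, whereas `rownum = 1` keeps an arbitrary one.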

select s.* 
from (select s.*, 
      row_number() over (partition by sessionid, filter 
             order by len desc) as rownum 
     from (select s.*, len(filter) as len 
      from session s 
      ) s 
    ) s 
where rownum = 1 

You may have duplicates. If you want all of the duplicates, a slightly different query will work.

0

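First, your sample data looks wrong: I think rows 25, 26 and 27 should all appear in your final data. 27 definitely should, because it is the only entry for its combination of session ID and date.

Assuming the above is correct, I think I have worked out your logic correctly.

Step 1 is to define the first search term of each filter and the order in which it occurred within its session: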

;WITH CTE AS 
( SELECT *, 
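      SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm], 
      -- SessionOrder is referenced by the later CTEs; ordering by ID within (SessionID, Date) is assumed here
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date ORDER BY ID) [SessionOrder] 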
    FROM Session 
) 

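The next step is to work out whether each search is a new search or a continuation of the previous one. This is done by getting the previous search within the session (which is why SessionOrder is defined in the previous CTE) and checking whether its first search term is the same.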

, CTE2 AS 
( SELECT T1.*, 
      CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch] 
    FROM CTE T1 
      LEFT JOIN CTE T2 
       ON T1.SessionID = T2.SessionID 
       AND T1.Date = T2.Date 
       AND T1.FirstTerm != T2.FirstTerm 
       AND T1.SessionOrder = T2.SessionOrder + 1 
) 

Next, each new search needs its own rank within the session, for grouping purposes.

, CTE3 AS 
( SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder] 
    FROM CTE2 T1 
      OUTER APPLY 
      ( SELECT SUM(NewSearch) [SearchNumber] 
       FROM CTE2 T2 
       WHERE T1.SessionOrder >= T2.SessionOrder 
       AND  T1.SessionID = T2.SessionID 
       AND  T1.Date = T2.Date 
      ) c 
) 

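You then have your rules defined (a unique combination of session ID, date and first search term), and you can order each item within that unique combination by the length of its filter. Finally, all you need to do is limit the results to the longest search term for each combination of SessionID, Date and first filter term: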

SELECT ID, SessionID, Date, Filter 
FROM CTE3 
WHERE SearchOrder = 1 
ORDER BY ID 
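The steps above (first term, new-search flag, running search number, longest filter per search) can also be followed procedurally. This is a minimal Python sketch of that logic on a subset of the sample rows, not a translation of the exact SQL; `first_term` and `unique_searches` are hypothetical names, and rows are assumed to be ordered by ID inside each (SessionID, Date) group.

```python
from itertools import groupby

SID = "peqq421gaspts3nuulq5mwcq"
# Subset of the question's sample rows: (ID, SessionID, Date, Filter).
ROWS = [
    (6, SID, "24/05/2012 13:48", "brand=Canon"),
    (7, SID, "24/05/2012 13:48", "brand=Canon&meagPixel=12.1"),
    (8, SID, "24/05/2012 13:48", "brand=Canon"),
    (10, SID, "24/05/2012 13:48", "brand=Nikon"),
    (12, SID, "24/05/2012 13:48", "meagPixel=12.1"),
    (13, SID, "24/05/2012 13:48", "meagPixel=12.1&opticalZoom=True"),
    (14, SID, "24/05/2012 13:49", "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
]


def first_term(filter_text):
    """Everything before the first '&' (the whole filter if there is none)."""
    return filter_text.split("&", 1)[0]


def unique_searches(rows):
    """Return the ID of the longest filter of each run of consecutive rows
    that share a first term, within each (SessionID, Date) group."""
    kept = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    for _, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        best, prev_term = None, None
        for row in group:
            term = first_term(row[3])
            if term != prev_term:          # first term changed: a new search starts
                if best is not None:
                    kept.append(best[0])
                best = row
            elif len(row[3]) > len(best[3]):
                best = row                 # longer filter within the same search
            prev_term = term
        kept.append(best[0])               # close the last search of the group
    return sorted(kept)


print(unique_searches(ROWS))  # [7, 10, 13, 14]
```

Because the date is part of the grouping key, row 14 starts its own group even though it extends row 13's filter, which is the behaviour discussed in the comments below.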

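Normally I would put all of this together in an SQL Fiddle rather than posting a complete working example here, but it doesn't seem to be working today. So here is the full SQL I used to test against your data: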

SET DATEFORMAT dmy -- the sample dates below are day/month/year
CREATE TABLE #Session (ID INT, SessionID VARCHAR(50), Date DATETIME, Filter VARCHAR(200)) 
INSERT INTO #Session VALUES 
    (2, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'), 
    (4, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=5'), 
    (6, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'), 
    (7, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon&meagPixel=12.1'), 
    (8, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'), 
    (10, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Nikon'), 
    (12, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1'), 
    (13, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1&opticalZoom=True'), 
    (14, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'meagPixel=12.1&opticalZoom=True&brand=Panasonic'), 
    (16, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=500.00'), 
    (18, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00'), 
    (19, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00&brand=Olympus'), 
    (21, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000'), 
    (22, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica'), 
    (23, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00'), 
    (24, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True'), 
    (25, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'), 
    (26, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:50', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345'), 
    (27, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:58', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'), 
    (41, 'poiq41111spts00000q5aaaa', '27/05/2012 13:48', 'meagPixel=5') 

;WITH CTE AS 
( SELECT *, 
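      SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm], 
      -- SessionOrder is referenced by the later CTEs; ordering by ID within (SessionID, Date) is assumed here
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date ORDER BY ID) [SessionOrder] 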
    FROM #Session 
), CTE2 AS 
( SELECT T1.*, 
      CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch] 
    FROM CTE T1 
      LEFT JOIN CTE T2 
       ON T1.SessionID = T2.SessionID 
       AND T1.Date = T2.Date 
       AND T1.FirstTerm != T2.FirstTerm 
       AND T1.SessionOrder = T2.SessionOrder + 1 
), CTE3 AS 
( SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder] 
    FROM CTE2 T1 
      OUTER APPLY 
      ( SELECT SUM(NewSearch) [SearchNumber] 
       FROM CTE2 T2 
       WHERE T1.SessionOrder >= T2.SessionOrder 
       AND  T1.SessionID = T2.SessionID 
       AND  T1.Date = T2.Date 
      ) c 
) 
SELECT ID, SessionID, Date, Filter 
FROM CTE3 
WHERE SearchOrder = 1 
ORDER BY ID 

DROP TABLE #Session 

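Addendum

OK, based on your result set, you don't actually want to group by the date column; you just need the rows grouped by first search term and SessionID.

This query produces the same results as your sample data. I have tested it on 2008 R1, but can't see any reason why it wouldn't work in SQL-Server CE.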

;WITH CTE AS 
( SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY SessionID, SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) ORDER BY LEN(Filter) DESC) [RowNumber] 
    FROM Session 
) 
SELECT * 
FROM CTE 
WHERE RowNumber = 1 
ORDER BY ID 
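For comparison, the grouping rule of this last query (one row per SessionID and first search term, keeping the longest filter) can be sketched in a few lines of Python. The function name `longest_per_first_term` is hypothetical and only a subset of the sample rows is used.

```python
SID = "peqq421gaspts3nuulq5mwcq"
LEICA = "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2"
# Subset of the question's sample rows: (ID, SessionID, Filter).
ROWS = [
    (4, SID, "meagPixel=5"),
    (12, SID, "meagPixel=12.1"),
    (14, SID, "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
    (25, SID, LEICA),
    (26, SID, LEICA + "&weight=345"),
    (27, SID, LEICA),
    (41, "poiq41111spts00000q5aaaa", "meagPixel=5"),
]


def longest_per_first_term(rows):
    """Keep the ID of the longest filter for each (SessionID, first term)."""
    best = {}
    for row_id, session_id, filter_text in rows:
        key = (session_id, filter_text.split("&", 1)[0])
        # On equal length the earlier row wins; ROW_NUMBER() leaves ties unspecified.
        if key not in best or len(filter_text) > len(best[key][1]):
            best[key] = (row_id, filter_text)
    return sorted(row_id for row_id, _ in best.values())


print(longest_per_first_term(ROWS))  # [4, 14, 26, 41]
```

Since the date no longer takes part in the key, rows 25 and 27 fold into row 26, which matches the expected output in the question.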

SQL Fiddle of the final solution

+0

There is no mistake in my rows 25, 26 and 27. 26 is the longest search for that filter; 27 is a step the user took. – Nir

+0

Yes, but 27 has a different time from 26; so according to your criteria, isn't that a new session? – GarethD

+0

@GarethD - what wasn't working for you on sqlfiddle.com? –