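First matching keyword in a string (T-SQL)
2013-10-21

I have a bunch of product description data in an old legacy sales system, and we are trying to run some sales analysis by making a best guess at the model number contained in the free-text description field.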

So my sales lines look something like this:

LineitemID | Description 
---- 
1 | Sony Headphones for a Sony DHJ232 
2 | Sony DHJ232 in blue 
3 | SANYO KI8767 with carry case 

Then I have a separate table that contains all of the potential product ranges.

ProductRange 
---- 
Sony DHJ232 
SANYO KI8767 
Sony Headphones 

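I want to write a query that returns every LineItem together with the ProductRange that is the best guess for it. This is easy enough with a simple join and a LIKE predicate. However, a complication arises with LineItem #1, which mentions two different product ranges: that produces multiple matches, one of which is incorrect.

In a case like this, where multiple matches are found, I would like to assume that the first match in the string is the most correct one, i.e. Sony Headphones rather than Sony DHJ232.

Can anyone offer some advice on the best approach?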

Answers

1

Here it is. You should order your results by the position of the substring within the Description field (using CHARINDEX()) and then pick the first (lowest) one.

SELECT LineitemId, Description, ProductRange
FROM
(
    -- Rank each line item's matches by where the range name first appears.
    SELECT T.LineitemId, T.Description, PR.ProductRange AS ProductRange,
           ROW_NUMBER() OVER (PARTITION BY T.LineitemId
                              ORDER BY CHARINDEX(PR.ProductRange, T.Description)
                             ) AS RowN
    FROM T
    JOIN PR ON (T.Description LIKE '%' + PR.ProductRange + '%')
) AS T1
WHERE RowN = 1  -- keep only the earliest match per line item
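
To see why ordering by CHARINDEX() selects the intended range, it can help to look at the positions it returns for line item 1. The snippet below is only an illustrative check that uses the literal strings from the question; the column aliases are made up for this example.

```sql
-- CHARINDEX returns the 1-based position of the first occurrence, or 0 if absent.
SELECT
    CHARINDEX('Sony Headphones', 'Sony Headphones for a Sony DHJ232') AS HeadphonesPos, -- 1
    CHARINDEX('Sony DHJ232',     'Sony Headphones for a Sony DHJ232') AS Dhj232Pos;     -- 23
```

Because 1 sorts before 23, ROW_NUMBER() assigns row 1 to Sony Headphones for that line item. The LIKE join already excludes non-matching ranges, so a position of 0 never reaches the ordering.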

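This looks really good to me! That said, you may want to normalize the legacy data. In my experience with legacy data and free-text fields, double spaces can creep in and cause mismatches. Just a thought. – JoeFletch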
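
One way the normalization suggested in this comment might be approached is to collapse runs of spaces before matching. The following is a minimal sketch, not something taken from the thread: the sample value is hypothetical, and the trick assumes the characters '<' and '>' never occur in the descriptions.

```sql
-- Hypothetical sample value with repeated spaces.
DECLARE @Description varchar(50) = 'Sony   DHJ232  in blue';

-- Turn every space into '<>', delete the '><' pairs that adjacent
-- spaces create, then turn the single remaining '<>' back into a space.
SELECT LTRIM(RTRIM(
           REPLACE(REPLACE(REPLACE(@Description, ' ', '<>'), '><', ''), '<>', ' ')
       )) AS NormalizedDescription;  -- 'Sony DHJ232 in blue'
```

The same expression could be applied to both the description and the range name inside the join condition, at the cost of making the comparison slower.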

0
;WITH MATCH_START AS 
(
    SELECT LI.POS, LI.LINEITEMID, PRODUCT.PRODUCTRANGE, LI.DESCRIPTION 
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY LINEITEMID) POS, LINEITEMID, DESCRIPTION FROM LINEITEM) LI 
     JOIN PRODUCT ON LI.DESCRIPTION LIKE PRODUCT.PRODUCTRANGE+'%' 
), 
MATCH_CONTAINS AS 
(
    SELECT LI.POS, LI.LINEITEMID, PRODUCT.PRODUCTRANGE, LI.DESCRIPTION 
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY LINEITEMID) POS, LINEITEMID, DESCRIPTION FROM LINEITEM) LI 
     JOIN PRODUCT ON LI.DESCRIPTION LIKE '%'+PRODUCT.PRODUCTRANGE+'%' 
), 
MIN_START_POS AS (
    SELECT MIN(POS) AS MIN_POS, PRODUCTRANGE FROM MATCH_START 
    GROUP BY PRODUCTRANGE 
), 
MIN_CONTAIN_POS AS (
    SELECT MIN(POS) AS MIN_POS, PRODUCTRANGE FROM MATCH_CONTAINS 
    GROUP BY PRODUCTRANGE 
) 

SELECT MS.PRODUCTRANGE,MS.DESCRIPTION, MS.LINEITEMID FROM MATCH_START MS 
JOIN MIN_START_POS MSP ON MS.POS = MSP.MIN_POS AND MSP.PRODUCTRANGE = MS.PRODUCTRANGE 

UNION 

SELECT MC.PRODUCTRANGE, MC.DESCRIPTION, MC.LINEITEMID FROM MATCH_CONTAINS MC 
JOIN MIN_CONTAIN_POS MCP ON MC.POS = MCP.MIN_POS AND MCP.PRODUCTRANGE = MC.PRODUCTRANGE 
AND MC.PRODUCTRANGE NOT IN (SELECT PRODUCTRANGE FROM MATCH_START) 

-- First match the product ranges that the description starts with, then the ones it merely contains.

For example, with the following data: SELECT * FROM LINEITEM

LineItemId Description 
----------- -------------------------------------- 
1   Sony Headphones for a Sony DHJ232 
2   Sony DHJ232 in blue 
3   SANYO KI8767 with carry case 
4   SANYO KI8767 with carry case 2 
5   Sony Headphones for a Sony DHJ232 B 

SELECT * FROM PRODUCT

ProductRange 
---------------------- 
SANYO KI8767 
Sony DHJ232 
Sony Headphones 

The result is:

PRODUCTRANGE  DESCRIPTION       LINEITEMID 
--------------- ------------------------------------- ----------- 
SANYO KI8767  SANYO KI8767 with carry case   3 
Sony DHJ232  Sony DHJ232 in blue     2 
Sony Headphones Sony Headphones for a Sony DHJ232  1 
0

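Personally, I would want to be able to prioritize which "range" is chosen based on more than just its ordinal position, so I would implement something like this: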

create table dbo.Sales (
    LineitemID int identity (1,1) not null primary key, 
    [Description] varchar(50) 
) 
insert into dbo.Sales ([Description]) values ('Sony Headphones for a Sony DHJ232') 
insert into dbo.Sales ([Description]) values ('Sony DHJ232 in blue') 
insert into dbo.Sales ([Description]) values ('SANYO KI8767 with carry case') 
insert into dbo.Sales ([Description]) values ('Sony Headphones for a Sony PS3') 

create table dbo.ProductRange (
    ProductRangeId int identity (1,1) not null primary key, 
    RangeName varchar(50), 
    Significance int 
) 
insert into dbo.ProductRange (RangeName, Significance) values ('Sony DHJ232', 1) 
insert into dbo.ProductRange (RangeName, Significance) values ('SANYO KI8767', 1) 
insert into dbo.ProductRange (RangeName, Significance) values ('Sony Headphones', 2) 
go 
CREATE FUNCTION [dbo].GetRange 
(
    @description varchar(50) 
) 
RETURNS INT 
AS 
BEGIN 

    declare @ProductRangeId int 

    select top 1 @ProductRangeId=pr.ProductRangeId 
    from dbo.ProductRange pr 
    where @description like '%'+pr.RangeName+'%' 
    order by pr.Significance 

    RETURN @ProductRangeId 
END 
go 
select s.*, dbo.GetRange(s.Description) as RangeId 
from dbo.Sales s 

This lets the [Significance] column in dbo.[ProductRange] specify which value is returned when more than one value is "hit".

The output from this is:

LineitemID Description          RangeId 
----------- -------------------------------------------------- ----------- 
1   Sony Headphones for a Sony DHJ232     1 
2   Sony DHJ232 in blue        1 
3   SANYO KI8767 with carry case      2 
4   Sony Headphones for a Sony PS3      3 

It can easily be joined back to dbo.[ProductRange].
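
One way that join could look is sketched below. It assumes the dbo.Sales and dbo.ProductRange tables and the dbo.GetRange function defined above. The LEFT JOIN is a choice made for this example, so that sales lines with no matching range are still returned with a NULL RangeName.

```sql
-- Resolve each sales line to its range name through the scalar function.
select s.LineitemID, s.[Description], pr.RangeName
from dbo.Sales s
left join dbo.ProductRange pr
    on pr.ProductRangeId = dbo.GetRange(s.[Description])
```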
