2010-06-26 21 views
10

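SQL: Split a column into multiple words to search against user input

I want to compare the individual words in a user's input against the individual words in a column of my table.

For example, consider these rows in my table: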

ID Name 
1 Jack Nicholson 
2 Henry Jack Blueberry 
3 Pontiac Riddleson Jack 

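Suppose the user's input is "Pontiac Jack". I want to assign a weight/rank to each match, so I can't use a single LIKE (WHERE Name LIKE @SearchString).

If Pontiac appears in a row, I want to give that row 10 points. Each match for Jack earns another 10 points, and so on. So row 3 would get 20 points, and rows 1 and 2 would get 10 each.

I have already split the user input into individual words and stored them in a temporary table, @SearchWords (Word).

But I can't work out how to write a SELECT statement that combines the two. Maybe I'm going about this the wrong way?

Cheers, WT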

+1

Have you considered using SQL Server full-text search? – 2010-06-26 11:14:36

+0

Yes, I have. It didn't work well for us, and it is hard to customize to our requirements. – 2010-06-26 11:16:41

+1

+1 for full-text search. Not necessarily SQL Server's; lucene.net, for example. – 2010-06-26 11:16:43

Answers

1

For SQL Server, try this:

SELECT Word, COUNT(Word) * 10 AS WordCount 
FROM SourceTable 
INNER JOIN SearchWords ON CHARINDEX(SearchWords.Word, SourceTable.Name) > 0 
GROUP BY Word 
+0

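Nice, elegant solution. I imagine the OP's table would have to link the words back to the original search phrase, so getting the score for a whole phrase is as simple as joining to the phrases and summing the word counts grouped by whole phrase. Nice username btw... one of my favourite xkcds :) – 2010-06-28 21:47:02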

0

How about this? (This is MySQL syntax; I think you just need to replace CONCAT with + for SQL Server.)

SELECT names.id, count(searchwords.word) FROM names, searchwords WHERE names.name LIKE CONCAT('%', searchwords.word, '%') GROUP BY names.id 

You will then have a SQL result with the ID from the names table and the count of search words that match that ID.
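For what it's worth, here is a rough sketch of how the same idea might look in T-SQL, with the count multiplied by 10 to get the score from the question. The table variable @Names and the alias score are just names picked for this illustration, and the LEFT JOIN is my own addition so that rows with no matching word still show up with a score of 0.

```sql
-- Illustrative data taken from the question; @Names is a made-up stand-in
-- for the real table.
DECLARE @Names TABLE (id INT IDENTITY, name VARCHAR(50));
DECLARE @SearchWords TABLE (word VARCHAR(50));

INSERT INTO @Names (name)
VALUES ('Jack Nicholson'), ('Henry Jack Blueberry'), ('Pontiac Riddleson Jack');

INSERT INTO @SearchWords (word)
VALUES ('Pontiac'), ('Jack');

-- 10 points for every search word found somewhere in the name.
-- COUNT(sw.word) ignores the NULLs produced by the LEFT JOIN,
-- so a name with no matching word scores 0.
SELECT n.id
     , n.name
     , COUNT(sw.word) * 10 AS score
  FROM @Names AS n
  LEFT JOIN @SearchWords AS sw
    ON n.name LIKE '%' + sw.word + '%'
 GROUP BY n.id, n.name
 ORDER BY score DESC;
```

With the question's data this should rank row 3 first with 20 points, followed by rows 1 and 2 with 10 each.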

0

You can do this with a common table expression that works out the weighting. For example:

--** Set up the example tables and data 
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50)); 
DECLARE @SearchWords TABLE (word VARCHAR(50)); 

INSERT INTO @Name 
     (name) 
VALUES ('Jack Nicholson') 
     ,('Henry Jack Blueberry') 
     ,('Pontiac Riddleson Jack') 
     ,('Fred Bloggs'); 

INSERT INTO @SearchWords 
     (word) 
VALUES ('Jack') 
     ,('Pontiac'); 

--** Example SELECT with @Name selected and ordered by words in @SearchWords 
WITH Order_CTE (weighting, id) 
AS (
    SELECT COUNT(*) AS weighting 
     , id 
     FROM @Name AS n 
     JOIN @SearchWords AS sw 
     ON n.name LIKE '%' + sw.word + '%' 
    GROUP BY id 
) 
SELECT n.name 
    , cte.weighting 
    FROM @Name AS n 
    JOIN Order_CTE AS cte 
    ON n.id = cte.id 
ORDER BY cte.weighting DESC; 

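With this technique you can also assign a value to each search word if you wish, so you could make Jack worth more than Pontiac. That would look like this: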

--** Set up the example tables and data 
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50)); 
DECLARE @SearchWords TABLE (word VARCHAR(50), value INT); 

INSERT INTO @Name 
     (name) 
VALUES ('Jack Nicholson') 
     ,('Henry Jack Blueberry') 
     ,('Pontiac Riddleson Jack') 
     ,('Fred Bloggs'); 

--** Set up search words with associated value 
INSERT INTO @SearchWords 
     (word, value) 
VALUES ('Jack',10) 
     ,('Pontiac',20) 
     ,('Bloggs',40); 


--** Example SELECT with @Name selected and ordered by words and values in @SearchWords 
WITH Order_CTE (weighting, id) 
AS (
    SELECT SUM(sw.value) AS weighting 
     , id 
     FROM @Name AS n 
     JOIN @SearchWords AS sw 
     ON n.name LIKE '%' + sw.word + '%' 
    GROUP BY id 
) 
SELECT n.name 
    , cte.weighting 
    FROM @Name AS n 
    JOIN Order_CTE AS cte 
    ON n.id = cte.id 
ORDER BY cte.weighting DESC;  
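
One thing to be aware of: LIKE '%' + sw.word + '%' matches substrings, so 'Jack' would also match a name such as 'Jackson'. If only whole words should count, one possible variation is to pad both the name and the word with spaces before comparing. This is only a sketch; it assumes words are separated by single spaces and that the search words contain no LIKE wildcards (%, _ or [). The 'Jackson Browne' row is made up to show the difference.

```sql
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50));
DECLARE @SearchWords TABLE (word VARCHAR(50), value INT);

INSERT INTO @Name (name)
VALUES ('Jack Nicholson')
     , ('Jackson Browne')          -- hypothetical row: contains 'Jack' only as a substring
     , ('Pontiac Riddleson Jack');

INSERT INTO @SearchWords (word, value)
VALUES ('Jack', 10)
     , ('Pontiac', 20);

-- Surround the name and the word with spaces so a word only matches
-- when it is delimited by spaces (or by the start/end of the name).
SELECT n.name
     , SUM(sw.value) AS weighting
  FROM @Name AS n
  JOIN @SearchWords AS sw
    ON ' ' + n.name + ' ' LIKE '% ' + sw.word + ' %'
 GROUP BY n.id, n.name
 ORDER BY weighting DESC;
```

Here 'Jackson Browne' should drop out of the result entirely, because neither search word appears in it as a whole word.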
0

In my opinion, the best approach is to maintain a separate table containing all the individual words. For example:

ID  Word  FK_ID 
1  Jack  1 
2  Nicholson 1 
3  Henry  2 
(etc) 

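This table would be kept up to date with triggers, and you would have a non-clustered index on "Word", "FK_ID". The SQL to produce your weighting would then be simple and efficient.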
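
To make that a bit more concrete, a minimal sketch of the word table and the scoring query might look like the following. The table names Names and NameWords and the index name are made up for this example, and the word rows are inserted by hand here rather than by the triggers, which are left out.

```sql
-- Hypothetical schema: one row in NameWords per word of each name.
CREATE TABLE Names (ID INT PRIMARY KEY, Name VARCHAR(50));

CREATE TABLE NameWords (
    ID    INT IDENTITY PRIMARY KEY,
    Word  VARCHAR(50) NOT NULL,
    FK_ID INT NOT NULL REFERENCES Names (ID)
);

-- Lets the join below seek on Word instead of scanning every name.
CREATE NONCLUSTERED INDEX IX_NameWords_Word ON NameWords (Word, FK_ID);

INSERT INTO Names (ID, Name)
VALUES (1, 'Jack Nicholson'), (2, 'Henry Jack Blueberry'), (3, 'Pontiac Riddleson Jack');

-- In the real setup triggers would maintain these rows.
INSERT INTO NameWords (Word, FK_ID)
VALUES ('Jack', 1), ('Nicholson', 1)
     , ('Henry', 2), ('Jack', 2), ('Blueberry', 2)
     , ('Pontiac', 3), ('Riddleson', 3), ('Jack', 3);

DECLARE @SearchWords TABLE (word VARCHAR(50));
INSERT INTO @SearchWords (word) VALUES ('Pontiac'), ('Jack');

-- 10 points per distinct search word found among the words of a name.
SELECT nw.FK_ID
     , COUNT(DISTINCT nw.Word) * 10 AS score
  FROM NameWords AS nw
  JOIN @SearchWords AS sw
    ON nw.Word = sw.word
 GROUP BY nw.FK_ID
 ORDER BY score DESC;
```

Because the join is an equality on Word rather than a LIKE with a leading wildcard, it can use the index, and it only matches whole words.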

0

How about something like this....

Select id, MAX(names.name), count(id)*10 from names 
inner join @SearchWords as sw on 
    names.name like '%'+sw.word+'%' 
group by id 

This assumes the table is called "names".