2013-10-11 55 views
0

Parsing tokens in {curly braces}

I need to use T-SQL to parse the strings enclosed in {} out of the following paragraph and then display them.

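Here is a test sentence with a {Term1}. Sometime, a {Term2} could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms could be a plural form of a another Term like {Term2}s. Here is a real {Simple} Term.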

Desired result:

Term1 
Term2 
Phrase Term3 
Term2 
Term2 
Simple 
+1

Why do you need to do this in T-SQL? It seems like C# or pretty much anything else would be better, really.

Answers

3

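You can do this with a multi-statement table-valued function, but I really think this type of parsing is much better left to a more powerful language. This will handle tokens {up to 255 characters} and, depending on your version of SQL Server, input strings of up to about 8000 characters. If you need more, replace sys.all_columns with your own numbers table. Note that I didn't go to any effort to protect you from invalid token sequences...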

CREATE FUNCTION dbo.ParseTokens
(
    @string NVARCHAR(MAX),
    @token1 NVARCHAR(255),  -- opening delimiter, e.g. N'{'
    @token2 NVARCHAR(255)   -- closing delimiter, e.g. N'}'
)
RETURNS @t TABLE([Index] INT IDENTITY(1,1), Item NVARCHAR(255))
AS
BEGIN
    INSERT @t(Item)
    -- Keep only the text before the closing delimiter in each piece.
    SELECT SUBSTRING(x, 1, COALESCE(NULLIF(CHARINDEX(@token2, x)-1,-1),255))
    FROM
    (
        -- Split @string into pieces at each opening delimiter.
        SELECT Number, x = SUBSTRING(@string, Number,
            CHARINDEX(@token1, @string + @token1, Number) - Number)
        FROM
        (
            -- Ad hoc numbers table: 1, 2, 3, ...
            SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
            FROM sys.all_columns
        ) AS n(Number) WHERE Number <= CONVERT(INT, LEN(@string))
        AND SUBSTRING(@token1 + @string, Number, LEN(@token1)) = @token1
    ) AS y
    ORDER BY Number OPTION (MAXDOP 1);

    -- The first piece is the text before the first opening delimiter, not a token.
    DELETE @t WHERE [Index] = 1;

    RETURN;
END
GO
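
The answer suggests swapping sys.all_columns for your own numbers table when the input is longer than that view has rows. A minimal sketch of one way to build such a table follows; the name dbo.Numbers and the row count of 100000 are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical numbers table; size it to the longest string you expect to parse.
CREATE TABLE dbo.Numbers(Number INT NOT NULL PRIMARY KEY);

-- Cross-joining a catalog view with itself is a common way to generate
-- enough rows to number; TOP caps the result at the size you want.
INSERT dbo.Numbers(Number)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_columns AS s1
CROSS JOIN sys.all_columns AS s2;

-- Inside dbo.ParseTokens, the derived table
--     (SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_columns) AS n(Number)
-- could then be replaced with
--     dbo.Numbers AS n
```

The rest of the function would stay the same, since it only relies on a column called Number containing consecutive integers starting at 1.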

Sample usage - on a standalone string:

DECLARE @x NVARCHAR(MAX); 

SET @x = N'foo{bar} and think {splunge}'; 

SELECT Item FROM dbo.ParseTokens(@x, '{', '}') ORDER BY [Index]; 

Results:

Item 
------- 
bar 
splunge 
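
Because the function makes no attempt to guard against invalid token sequences, a caller may want to check the input first. The following is only a sketch of one possible pre-check, assuming single-character delimiters; it compares the number of opening and closing braces and does not detect nesting or wrong ordering.

```sql
DECLARE @x NVARCHAR(MAX);
DECLARE @opens INT;
DECLARE @closes INT;

SET @x = N'foo{bar} and think {splunge';

-- Count each delimiter by comparing the length before and after removing it.
-- DATALENGTH (bytes, so divide by 2 for NVARCHAR) is used because LEN
-- ignores trailing spaces.
SET @opens  = (DATALENGTH(@x) - DATALENGTH(REPLACE(@x, N'{', N''))) / 2;
SET @closes = (DATALENGTH(@x) - DATALENGTH(REPLACE(@x, N'}', N''))) / 2;

IF @opens <> @closes
    RAISERROR(N'Unbalanced delimiters: %d opening, %d closing.', 16, 1, @opens, @closes);
ELSE
    SELECT Item FROM dbo.ParseTokens(@x, N'{', N'}') ORDER BY [Index];
```

A matching count is a necessary condition, not a sufficient one: a string such as N'}foo{' would pass this check and still be malformed.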

Sample usage - against a table:

DECLARE @x TABLE(ID INT IDENTITY(1,1), n NVARCHAR(MAX)); 

INSERT @x SELECT N'Here is a test sentence with a {Term1}. Sometime, a {Term2} 
    could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms 
    could be a plural form of a another Term like {Term2}s. Here is a real 
    {Simple} Term.'; 

INSERT @x SELECT N'Hello {foo} there {bar} ...'; 

SELECT t.ID, p.Item 
FROM @x AS t 
CROSS APPLY dbo.ParseTokens(t.n, '{', '}') AS p; 

Results:

ID  Item 
---- ------------ 
1  Term1 
1  Term2 
1  Phrase Term3 
1  Term2 
1  Term2 
1  Simple 
2  foo 
2  bar 
3

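You can convert your string to XML by replacing every { with an opening element and every } with a closing element, and then query the XML for the tokens.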

declare @S nvarchar(max) 
set @S = N'Here is a test sentence with a {Term1}. Sometime, a {Term2} could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms could be a plural form of a another Term like {Term2}s. Here is a real {Simple} Term.' 

select T.N.value('text()[1]', 'nvarchar(max)') as Token 
from (select cast(replace(replace(@S, N'{', N'<token>'), N'}', N'</token>') as xml)) as S(X) 
    cross apply S.X.nodes('token') as T(N) 
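
One thing to watch with the XML approach is that the cast fails if the text itself contains characters that are special in XML, such as & or <. A hedged variation is sketched below; it escapes those characters before injecting the <token> elements. The sample string and the @E variable are made up for illustration.

```sql
declare @S nvarchar(max)
declare @E nvarchar(max)
set @S = N'R&D {Term1} costs < {Term2 & Co}'

-- Escape XML special characters first; & must be replaced before the others
-- so that the entities introduced here are not escaped a second time.
set @E = replace(replace(replace(@S, N'&', N'&amp;'), N'<', N'&lt;'), N'>', N'&gt;')

-- value() returns the unescaped text, so the tokens come back in their original form.
select T.N.value('text()[1]', 'nvarchar(max)') as Token
from (select cast(replace(replace(@E, N'{', N'<token>'), N'}', N'</token>') as xml)) as S(X)
    cross apply S.X.nodes('token') as T(N)
```

This still assumes the braces are balanced and not nested; an unmatched { or } produces malformed XML and the cast raises an error.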


+1

Oh, you and your fancy XML! (1)