2015-09-24
2

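Replacing multiple special characters in multiple fields in SQL

I have a C# application that generates a SQL query meant to strip special characters from user-selected columns in SQL Server. The query I currently have is: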

UPDATE [TableA] 
SET [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]), 1), ''), 
    [Name] = REPLACE([Name], SUBSTRING([Name], PATINDEX('%[^a-zA-Z0-9 ]%', [Name]), 1), ''), 
    [Acct] = REPLACE([Acct], SUBSTRING([Acct], PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]), 1), '') 
WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR 
     PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR 
     PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0; 
GO 

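This works for the first special character, but if a string contains several different special characters, only the first one is removed: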

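  • "Salaries & Wages" becomes "Salaries Wages". GOOD!
  • "Salaries & Wages - Other" becomes "Salaries Wages - Other". BAD!

    My question is:

    How can I modify the query above to remove multiple special characters while still being able to execute it from C#?

    Thanks for your time.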


    Edit: obviously, I could do something like

    declare @input varchar(500), @Action char(1) 
    set @Input = '80-82/5 O$%*#@)(J^#[email protected]!n & '' Bacon St' 
    set @Action = 'A' 
    
        DECLARE @i int 
        DECLARE @result varchar(500) 
        SET @result = @input 
    
        if @Action = 'A' 
        BEGIN 
         SET @i = patindex('%[^a-zA-Z0-9 ]%', @result) 
         WHILE @i > 0 
         BEGIN 
          SET @result = STUFF(@result, @i, 1, '') 
          SET @i = patindex('%[^a-zA-Z0-9 ]%', @result) 
         END 
        END 
    
    print @Input 
    print @Result 
    

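    but I can't see how to adapt this into a query that works across multiple fields and can be run from C#. Any help here would be greatly appreciated.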

    Answers

    2

    You can use a recursive CTE to apply the REPLACE function recursively:

    ;WITH StripSpecialChars AS (
        SELECT id, 0 AS lvl, 
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''), 
          [Name] = REPLACE([Name], SUBSTRING([Name], y.i, 1), ''), 
          [Acct] = REPLACE([Acct], SUBSTRING([Acct], z.i, 1), '') 
        FROM TableA 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i) 
        WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0 
    
        UNION ALL 
    
        SELECT id, lvl = lvl + 1,      
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''), 
          [Name] = REPLACE([Name], SUBSTRING([Name], y.i, 1), ''), 
          [Acct] = REPLACE([Acct], SUBSTRING([Acct], z.i, 1), '') 
        FROM StripSpecialChars 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i) 
        WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0 
    ) 
    

    The CTE terminates as soon as there are no more special characters left to replace.
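To make the recursion concrete, here is a minimal single-column sketch of the same idea. The table variable @T, its sample rows, and the explicit CAST (added only so the anchor and recursive members are certain to have the same type) are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical sample data; @T stands in for TableA.
DECLARE @T TABLE (id int PRIMARY KEY, [Name] varchar(100));
INSERT INTO @T (id, [Name])
VALUES (1, 'Salaries & Wages - Other'),
       (2, 'Clean value');

;WITH StripSpecialChars AS (
    -- Anchor: remove every occurrence of the first offending character.
    SELECT t.id, 0 AS lvl,
           CAST(REPLACE(t.[Name], SUBSTRING(t.[Name], x.i, 1), '') AS varchar(100)) AS [Name]
    FROM @T AS t
    CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', t.[Name])) AS x(i)
    WHERE x.i <> 0

    UNION ALL

    -- Recursive step: repeat on the previous level's output until nothing matches.
    SELECT s.id, s.lvl + 1,
           CAST(REPLACE(s.[Name], SUBSTRING(s.[Name], x.i, 1), '') AS varchar(100))
    FROM StripSpecialChars AS s
    CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', s.[Name])) AS x(i)
    WHERE x.i <> 0
)
SELECT id, lvl, [Name]
FROM StripSpecialChars
ORDER BY id, lvl;
```

With this sample data the query should return rows only for id 1: lvl 0 with the '&' removed and lvl 1 with the '-' removed as well. Each level strips one distinct character, so the recursion depth is bounded by the number of distinct special characters in a value (keep the default MAXRECURSION limit of 100 in mind for very dirty data).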

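    For each id, the row with the maximum lvl value is the one that holds the fully stripped values of the [EpiNum], [Name] and [Acct] fields. So you can use the following code to perform the UPDATE in a single SQL statement: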

    ;WITH StripSpecialChars AS (
    ... above query here ... 
    ) 
    UPDATE t1 
    SET t1.[EpiNum] = t2.[EpiNum], 
        t1.[Name] = t2.[Name], 
        t1.[Acct] = t2.[Acct] 
    FROM TableA AS t1 
    INNER JOIN (SELECT id, [EpiNum], [Name], [Acct], 
            ROW_NUMBER() OVER (PARTITION BY id 
                 ORDER BY lvl DESC) AS rn 
          From StripSpecialChars) AS t2 
    ON t1.id = t2.id AND t2.rn = 1 
    

    Edit:

    If TableA has no PK column, you can wrap your table in a CTE, use ROW_NUMBER to simulate a PK, and then perform the update against that CTE:

    ;WITH TableA_PK AS (
        SELECT [EpiNum], [Name], [Acct], 
         ROW_NUMBER() OVER (ORDER BY [EpiNum]) AS id 
        FROM TableA 
    ), StripSpecialChars AS (
        SELECT id, 0 AS lvl, 
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''), 
          [Name] = REPLACE([Name], SUBSTRING([Name], y.i, 1), ''), 
          [Acct] = REPLACE([Acct], SUBSTRING([Acct], z.i, 1), '')   
        FROM TableA_PK 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i) 
        WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0 
    
        UNION ALL 
    
        SELECT id, lvl = lvl + 1,      
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''), 
          [Name] = REPLACE([Name], SUBSTRING([Name], y.i, 1), ''), 
          [Acct] = REPLACE([Acct], SUBSTRING([Acct], z.i, 1), '')    
        FROM StripSpecialChars 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i) 
        CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i) 
        WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0 
    ) 
    UPDATE t1 
    SET t1.[EpiNum] = t2.[EpiNum], 
        t1.[Name] = t2.[Name], 
        t1.[Acct] = t2.[Acct] 
    FROM TableA_PK AS t1 
    INNER JOIN (SELECT id, [EpiNum], [Name], [Acct], 
            ROW_NUMBER() OVER (PARTITION BY id 
                 ORDER BY lvl DESC) AS rn 
          FROM StripSpecialChars) AS t2 
    ON t1.id = t2.id AND t2.rn = 1 
    

    +0

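    Thank you very much for your answer and your time. I think this is my winner, since it needs no function call and looks like it will be fast. One problem: I don't want to modify my initial table structure by adding an 'id' column. It isn't clear to me whether that can be done with the approach above. Any ideas are much appreciated... – MoonKnight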

    +1

    @Killercam Please check the edit I made.

    +0

    This is really great, thank you very much for your time... – MoonKnight

    0

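    If this is a one-off effort, I would suggest simply running the update multiple times until all the characters are gone. That is probably the quickest way to get there.

    After doing that, fix the table with a constraint that accepts only the desired values: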

    alter table table1 
        add constraint chk_EpiNum_Valie check (EpiNum NOT LIKE '%[^a-zA-Z0-9 ]%'); 
    

    (And again for each such column.)

    The database will then guarantee validity on insert and update.
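To see what such a constraint does without touching a real table, here is a small sketch on a temporary table. #Demo and its single column are hypothetical stand-ins, and the constraint is left unnamed only so that the name cannot clash with another session's temp table.

```sql
-- #Demo is a hypothetical scratch table, not part of the original schema.
CREATE TABLE #Demo (
    EpiNum varchar(50)
        CHECK (EpiNum NOT LIKE '%[^a-zA-Z0-9 ]%')
);

INSERT INTO #Demo (EpiNum) VALUES ('ABC 123');        -- passes the check

BEGIN TRY
    INSERT INTO #Demo (EpiNum) VALUES ('ABC & 123');  -- contains '&', so the check should reject it
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();                            -- reports the CHECK constraint conflict
END CATCH;

SELECT EpiNum FROM #Demo;                             -- only 'ABC 123' should remain
DROP TABLE #Demo;
```

Note that a CHECK constraint lets NULL through, because the condition evaluates to unknown rather than false; if NULLs are also unwanted, the column would need NOT NULL as well.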

    +0

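    Hi Gordon, thanks for your reply. But how could I run multiple updates and check the results from C#? That is too cumbersome. There must be a way to make my SQL query loop. The system imports raw data from a large number of different sources and feeder systems, so adding constraints is not allowed. The clean-up is needed when users prepare their data for our costing system. – MoonKnight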

    +0

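    @Killercam . . . Of course you can do a loop (as the other answers demonstrate). My point is: fix the data in the database. Add a constraint. Then you are done. No C# code at all, and the constraint will keep the columns clean in the future.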

    +0

    OK, thanks. But I can't use constraints. Thank you very much for your help. – MoonKnight

    1

    This may look a bit complicated, but it is how I solved a similar challenge:

    Just paste this into an empty query window and adapt it as needed...

    --This function returns a running set of numbers - very handy 
    CREATE FUNCTION [dbo].[RunningNumbers](@counter INT=1000000, @StartAt INT=0) 
    RETURNS TABLE 
    AS 
    RETURN 
        WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10^1 
        E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10^2 = 100 rows 
        E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10^4 = 10,000 rows 
        E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10^8 = 100,000,000 rows 
        CteTally AS 
        (
         SELECT TOP(ISNULL(@counter,1000000)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1 + ISNULL(@StartAt,0) As Nmbr 
         FROM E8 
        ) 
        SELECT * FROM CteTally; 
    GO 
    
    --This function breaks down a string into a one-char-table with one char in each row. 
    --You can decide for any ascii code what you want to do with this character. 
    --At the end the whole thing is concatenated again. 
    CREATE FUNCTION [dbo].[GetPrintableChars] 
    (
        @Txt VARCHAR(MAX) 
    ) 
    RETURNS VARCHAR(MAX) 
    AS 
    BEGIN 
        SET @Txt=LTRIM(RTRIM(ISNULL(@Txt,''))); 
    
        DECLARE @rslt VARCHAR(MAX); 
        SET @rslt = 
         (
          SELECT Repl.ASCII_Code 
          FROM dbo.RunningNumbers(LEN(@Txt),1) AS pos 
          --ASCII-Codes of all characters in your text 
          OUTER APPLY(SELECT ASCII(SUBSTRING(@Txt,pos.Nmbr,1)) AS ASCII_Code) AS OneChar 
          --re-code 
          CROSS APPLY 
          (
           SELECT CASE 
            WHEN OneChar.ASCII_Code IN(9,10,13) THEN CHAR(OneChar.ASCII_Code) --tab, line feed, carriage return 
            WHEN OneChar.ASCII_Code BETWEEN 32 AND 126 THEN CHAR(OneChar.ASCII_Code) --normal printable 
            WHEN OneChar.ASCII_Code IN(132,142,148,153,174,175) THEN CHAR(OneChar.ASCII_Code) --extended to keep 
            WHEN OneChar.ASCII_Code BETWEEN 128 AND 154 THEN CHAR(176) --extended to get rid of 
            ELSE '' 
           END AS ASCII_Code 
          ) AS Repl  
          FOR XML PATH(''),TYPE 
         ).value('.','varchar(max)'); 
        RETURN @rslt; 
    END 
    GO 
    
    --One example to get rid of some characters. 
    SELECT dbo.GetPrintableChars('This is a Test for special characters: ÐðÑñ') 
    GO 
    
    --And clean up for testing 
    DROP FUNCTION dbo.GetPrintableChars; 
    GO 
    DROP FUNCTION dbo.RunningNumbers; 
    
    1

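    Gordon Linoff makes a good point about the constraint, though. If you want to reuse the loop code for multiple fields, you can put it in a function: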

    CREATE FUNCTION dbo.RemoveSpecialCharacters (
        @String NVARCHAR(max) 
    ) 
    RETURNS NVARCHAR(max) 
    BEGIN 
        DECLARE @i int 
    
        SET @i = patindex('%[^a-zA-Z0-9 ]%', @String) 
        WHILE @i > 0 
        BEGIN 
         SET @String = STUFF(@String, @i, 1, '') 
         SET @i = patindex('%[^a-zA-Z0-9 ]%', @String) 
        END 
        RETURN @String 
    END 
    

    Then simply reuse the function:

    UPDATE [TableA] 
    SET [EpiNum] = dbo.RemoveSpecialCharacters([EpiNum]), 
        [Name] = dbo.RemoveSpecialCharacters([Name]), 
        [Acct] = dbo.RemoveSpecialCharacters([Acct]) 
    WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR 
         PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR 
         PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0; 
    

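    Do test the performance! If you want to check the result in C#, just use the function in a SELECT first, and then in the UPDATE once the output is correct.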
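The "check with a SELECT first" step could look roughly like this. It is only a sketch: it assumes dbo.RemoveSpecialCharacters has been created as shown above and that TableA has the three columns from the question; the *Clean aliases are made-up names.

```sql
-- Preview only: nothing is modified. The *Clean aliases are illustrative names.
SELECT [EpiNum], dbo.RemoveSpecialCharacters([EpiNum]) AS EpiNumClean,
       [Name],   dbo.RemoveSpecialCharacters([Name])   AS NameClean,
       [Acct],   dbo.RemoveSpecialCharacters([Acct])   AS AcctClean
FROM [TableA]
-- Same filter as the UPDATE, so the preview shows exactly the rows that would change.
WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR
      PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR
      PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0;
```

Because it is a plain SELECT, the before and after values come back side by side and can be inspected from C# before the UPDATE is issued.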

    +0

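    I really like this. Ideally, though, I don't want to create a function and drop it when I'm done. Is there a clean way to adapt the query so that it avoids the function call? I can't see one, or at least it isn't obvious to me. – MoonKnight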

    +0

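    @Killercam See Giorgos Betsos's answer. It is slightly more complicated, but it needs no function (and should perform better). That said, I don't see a problem with leaving the function in place, since you may need to reuse it some day.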

    +0

    Thanks for your time. – MoonKnight

    1

    Create this function:

    CREATE function f_removebadcharacters 
    (
        @string varchar(2000) 
    ) 
    RETURNS varchar(2000) 
    as 
    BEGIN 
        DECLARE @badcharacters varchar(100) = '%[^A-Z0-9 ]%' 
    
        WHILE @string like @badcharacters 
        SET @string = STUFF(@string, patindex(@badcharacters, @string), 1, '') 
    
        RETURN @string 
    END 
    

    Call the function like this:

    SELECT dbo.f_removebadcharacters('Salaries & Wages - Other') 
    

    In your update, use this syntax:

    UPDATE [TableA] 
    SET [EpiNum] = dbo.f_removebadcharacters([EpiNum]) 
    WHERE [EpiNum] LIKE '%[^A-Z0-9 ]%' 
    

    Here is a working example:

    DECLARE @TableA table([EpiNum] varchar(2000)) 
    INSERT @TableA 
        values('Salaries & Wages - Other'), 
         ('80-82/5 O$%*#@)(J^#[email protected]!n & '''' Bacon St') 
    
    
    UPDATE @TableA 
    SET [EpiNum] = dbo.f_removebadcharacters([EpiNum]) 
    WHERE [EpiNum] LIKE '%[^A-Z0-9 ]%' 
    
    SELECT * FROM @TableA 
    

    Result:

    EpiNum 
    Salaries Wages Other 
    80825 OJohn Bacon St 
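One thing worth checking before relying on the shorter '%[^A-Z0-9 ]%' pattern used in this function: it only keeps lowercase letters when the comparison runs under a case-insensitive collation. The sketch below forces two collations explicitly to show the difference; the collation names are just common examples, not something taken from the original answer.

```sql
-- Same pattern, same input, two explicitly chosen collations.
SELECT
    -- Case-insensitive: 'a'..'z' compare equal to 'A'..'Z', so nothing counts as "bad".
    PATINDEX('%[^A-Z0-9 ]%', 'abc' COLLATE Latin1_General_CI_AS) AS case_insensitive,
    -- Binary: the range A-Z covers uppercase code points only, so 'a' counts as "bad".
    PATINDEX('%[^A-Z0-9 ]%', 'abc' COLLATE Latin1_General_BIN)   AS binary_collation;
```

This should return 0 for the first column and 1 for the second. On a database with a binary or case-sensitive collation, the explicit '%[^a-zA-Z0-9 ]%' pattern from the question is the safer choice.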
    
    0

    A way to apply the update multiple times and control the result:

    declare @l int; 
    select @l= COUNT(*) from sys.views --just to set @@ROWCOUNT to 1 
    
    while @@ROWCOUNT >0 
    begin 
        UPDATE [TableA] 
        SET [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]), 1), ''), 
        [Name] = REPLACE([Name], SUBSTRING([Name], PATINDEX('%[^a-zA-Z0-9 ]%', [Name]), 1), ''), 
        [Acct] = REPLACE([Acct], SUBSTRING([Acct], PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]), 1), '') 
        WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR 
         PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR 
         PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0; 
    end
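The loop above relies on a dummy SELECT to prime @@ROWCOUNT. A possible variant keeps the count in its own variable instead, which makes the exit condition explicit. This is a sketch only: the @rows variable, the table variable @TableA and its sample rows are illustrative, and only two columns are shown to keep it short.

```sql
-- Hypothetical sample data standing in for the real TableA.
DECLARE @TableA TABLE ([EpiNum] varchar(100), [Name] varchar(100));
INSERT INTO @TableA ([EpiNum], [Name])
VALUES ('80-82/5', 'Salaries & Wages - Other'),
       ('ABC 123', 'Clean');

DECLARE @rows int = 1;   -- start above zero so the loop body runs at least once

WHILE @rows > 0
BEGIN
    -- Each pass removes one distinct special character per column.
    -- For a column with no match, PATINDEX returns 0, SUBSTRING yields '',
    -- and REPLACE with an empty search string leaves the value unchanged.
    UPDATE @TableA
    SET [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]), 1), ''),
        [Name]   = REPLACE([Name],   SUBSTRING([Name],   PATINDEX('%[^a-zA-Z0-9 ]%', [Name]),   1), '')
    WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR
          PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0;

    SET @rows = @@ROWCOUNT;  -- capture immediately, before any other statement resets it
END;

SELECT [EpiNum], [Name] FROM @TableA;
```

Since this is a single batch with no GO separator (GO is a client-tool directive, not T-SQL), it can be sent from C# as one command text, and no function has to be created or dropped.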