2010-09-15 167 views
12

SQL Server 2008 and HashBytes

I have a large nvarchar that I want to pass to the HashBytes function. I get the error:

"String or binary data would be truncated. Cannot insert the value NULL into column 'colname', table 'table'; column does not allow nulls. UPDATE fails. The statement has been terminated."

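Being ever resourceful, I discovered this was due to the HashBytes function having a maximum input limit of 8000 bytes. Further searching turned up a "solution" in which my large varchar is split into chunks that are hashed separately and then combined, using this user-defined function: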

CREATE FUNCTION [dbo].[udfLargeHashTable] (@algorithm nvarchar(4), @InputDataString varchar(MAX)) 
RETURNS varbinary(MAX) 
AS 
BEGIN 
DECLARE 
    @Index int, 
    @InputDataLength int, 
    @ReturnSum varbinary(max), 
    @InputData varbinary(max) 

SET @ReturnSum = 0 
SET @Index = 1 
SET @InputData = convert(binary,@InputDataString) 
SET @InputDataLength = DATALENGTH(@InputData) 

WHILE @Index <= @InputDataLength 
BEGIN 
    SET @ReturnSum = @ReturnSum + HASHBYTES(@algorithm, SUBSTRING(@InputData, @Index, 8000)) 
    SET @Index = @Index + 8000 
END 
RETURN @ReturnSum 
END 

which I call with:

set @ReportDefinitionHash=convert(int,dbo.[udfLargeHashTable]('SHA1',@ReportDefinitionForLookup)) 

where @ReportDefinitionHash is an int and @ReportDefinitionForLookup is a varchar.

Passing a simple string such as 'test' produces a different int from my UDF than a normal call to HashBytes does.

Any suggestions on this issue?

+0

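Basically, you don't want to aggregate your hashes, so the return type should be varbinary(20). Then try running the following: 'select hashbytes('sha1','test'),hashbytes('sha1',N'test')' (you will be quite surprised) :) – 2010-09-15 14:52:04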
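The two calls in the comment above hash different bytes, which is why their results differ. A minimal sketch that makes the byte difference visible (the column aliases are illustrative only):

```sql
-- varchar stores one byte per character here; nvarchar stores UTF-16 little endian,
-- two bytes per character. HASHBYTES hashes the raw bytes, not the "text".
SELECT CAST('test'  AS varbinary(8)) AS varchar_bytes    -- 0x74657374
     , CAST(N'test' AS varbinary(8)) AS nvarchar_bytes;  -- 0x7400650073007400
```

Any wrapper that converts its input between varchar and nvarchar before hashing will therefore not reproduce the hash of the original value.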

Answers

9

Just use this function (taken from Hashing large data strings with a User Defined Function):

create function dbo.fn_hashbytesMAX 
    (@string nvarchar(max) 
    , @Algo varchar(10) 
    ) 
    returns varbinary(20) 
as 
/************************************************************ 
* 
* Author:  Brandon Galderisi 
* Last modified: 15-SEP-2009 (by Denis) 
* Purpose:  uses the system function hashbytes as well 
*     as sys.fn_varbintohexstr to split an 
*     nvarchar(max) string and hash in 8000 byte 
*     chunks, hashing each 8000 byte chunk, 
*     getting the 40 byte output, streaming each 
*     40 byte output into a string then hashing 
*     that string. 
* 
*************************************************************/ 
begin 
    declare @concat  nvarchar(max) 
       ,@NumHash  int 
       ,@HASH   varbinary(20) 
    set @NumHash = ceiling((datalength(@string)/2)/(4000.0)) 
    /* HashBytes only supports 8000 bytes so split the string if it is larger */ 
    if @NumHash>1 
    begin 
                 -- # * 4000 character strings 
      ;with a as (select 1 as n union all select 1) -- 2 
       ,b as (select 1 as n from a ,a a1)  -- 4 
       ,c as (select 1 as n from b ,b b1)  -- 16 
       ,d as (select 1 as n from c ,c c1)  -- 256 
       ,e as (select 1 as n from d ,d d1)  -- 65,536 
       ,f as (select 1 as n from e ,e e1)  -- 4,294,967,296 = 17+ TRILLION characters 
       ,factored as (select row_number() over (order by n) rn from f) 
       ,factors as (select rn,(rn*4000)+1 factor from factored) 

      select @concat = cast((
      select right(sys.fn_varbintohexstr 
         (
         hashbytes(@Algo, substring(@string, factor - 4000, 4000)) 
         ) 
         , 40) + '' 
      from Factors 
      where rn <= @NumHash 
      for xml path('') 
     ) as nvarchar(max)) 


      set @HASH = dbo.fn_hashbytesMAX(@concat ,@Algo) 
    end 
    else 
    begin 
      set @HASH = convert(varbinary(20), hashbytes(@Algo, @string)) 
    end 

return @HASH 
end 

And the results are as follows:

select 
hashbytes('sha1', N'test') --native function with nvarchar input 
,hashbytes('sha1', 'test') --native function with varchar input 
,dbo.fn_hashbytesMAX('test', 'sha1') --Galderisi's function which casts to nvarchar input 
,dbo.fnGetHash('sha1', 'test') --your function 

Output:

0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1 
0xA94A8FE5CCB19BA61C4C0873D391E987982FBBD3 
0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1 
0x00000000AE6DBA4E0F767D06A97038B0C24ED720662ED9F1 
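The fourth value is 24 bytes long and starts with 0x00000000, and its hash part matches neither native call. A short sketch of two behaviours in the question's function that would explain this, assuming the function shown in the question is the one being called here:

```sql
-- 1) Assigning the integer literal 0 to a varbinary variable stores the
--    four bytes of an int, so every result gets a 0x00000000 prefix.
DECLARE @ReturnSum varbinary(max);
SET @ReturnSum = 0;
SELECT @ReturnSum AS seed;                              -- 0x00000000

-- 2) CONVERT to binary without a length uses the default length of 30,
--    so 'test' is zero-padded to 30 bytes (and longer input is cut to 30 bytes)
--    before it is hashed.
SELECT DATALENGTH(CONVERT(binary, 'test')) AS bytes;    -- 30
```

Initialising the accumulator with the empty literal 0x and converting with varbinary(max) instead of binary would avoid both effects.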
+0

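I think there is a bug here. Calling 'dbo.fn_hashbytesMAX()' with different large values produces the same hash. It seems to me that the '@string' parameter type needs to be 'nvarchar(max)' rather than 'varchar(max)'; otherwise halving the 'datalength()' result makes no sense. In effect, 'datalength(@string)/2' means it only hashes half of the substrings. – Rory 2013-09-11 12:01:29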

+0

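I see that the function as originally provided took 'nvarchar(max)' input and was later changed. Anyone using it should either change the '@string' data type to 'nvarchar(max)' or change the code so it works correctly (which probably means changing the other nvarchar to varchar and removing the '/2', but you would want to test that). – Rory 2013-09-11 12:58:57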

+0

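I edited the answer in line with the previous comments; it now uses nvarchar for the calculation. If you pass a varchar value, the output will not match hashbytes(), because the parameter is first converted to nvarchar. Also changed it to return varbinary, so that calls using the md5 algorithm return the correct length. – Rory 2013-09-21 13:51:40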

1

You could write a SQL CLR function:

[Microsoft.SqlServer.Server.SqlFunction] 
public static SqlBinary BigHashBytes(SqlString algorithm, SqlString data) 
{ 
    var algo = HashAlgorithm.Create(algorithm.Value); 

    var bytes = Encoding.UTF8.GetBytes(data.Value); 

    return new SqlBinary(algo.ComputeHash(bytes)); 
} 

It can then be called from SQL like this:

--these return the same value 
select HASHBYTES('md5', 'test stuff') 
select dbo.BigHashBytes('md5', 'test stuff') 

BigHashBytes is only necessary if the length will exceed 8K.

+0

1) String data in SQL Server is stored as UTF-16 Little Endian, which corresponds to "Unicode" in .NET. 2) Since SqlString can give you the Unicode byte[] via [SqlString.GetUnicodeBytes](https://msdn.microsoft.com/en-us/library/system.data.sqltypes.sqlstring.getunicodebytes.aspx), you don't need to bother with 'Encoding.'. – 2015-05-29 17:48:19

14

If you can't create a function and have to use something that already exists in the database:

sys.fn_repl_hash_binary(cast('some really long string' as varbinary(max))) 

sys.fn_repl_hash_binary can be made to work using the syntax above.

Source: http://www.sqlnotes.info/2012/01/16/generate-md5-value-from-big-data/
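A small sketch of how this might be tried out. The title of the linked article suggests the function produces an MD5 value; that is an assumption here, so the second query simply puts the two results side by side for you to compare rather than asserting they match:

```sql
-- 10,000 bytes of input, above the 8000-byte HASHBYTES limit discussed in the question.
DECLARE @s varchar(max) = REPLICATE(CAST('x' AS varchar(max)), 10000);
SELECT sys.fn_repl_hash_binary(CAST(@s AS varbinary(max))) AS repl_hash;

-- Short input, hashed both ways for comparison.
SELECT sys.fn_repl_hash_binary(CAST('test' AS varbinary(max))) AS repl_hash
     , HASHBYTES('MD5', 'test') AS md5_hash;
```

Since this is a system function intended for replication, its behaviour may not be guaranteed across versions.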

+0

Note: only works on SQL Server 2008 and later – Rory 2013-09-11 12:07:21

+0

This does not work if you have utf-8 data, i.e. "NVARCHAR" strings – gotqn 2014-04-23 08:37:45

+1

SQL Server does not use utf-8 strings. I have had no problems with NVARCHAR strings. – 2014-10-06 18:48:58

0

This can also be used as a function body:

DECLARE @A NVARCHAR(MAX) = N'test' 

DECLARE @res VARBINARY(MAX) = 0x 
DECLARE @position INT = 1 
     ,@len INT = DATALENGTH(@A) / 2 -- length in characters: SUBSTRING on nvarchar counts characters, DATALENGTH counts bytes 

WHILE 1 = 1 
BEGIN 
    SET @res = @res + HASHBYTES('SHA2_256', SUBSTRING(@A, @position, 4000)) 
    SET @position = @position+4000 
    IF @Position > @len 
     BREAK 
END 

SELECT HASHBYTES('SHA2_256',@res) 

The idea is to hash each 4000-character part of the NVARCHAR(MAX) string and concatenate the results, then hash that concatenated result.
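One way the snippet above might be wrapped into a scalar function is sketched below. The function name dbo.fnChunkedSha256 is hypothetical, and the NULL check is an addition: without it a NULL input would never satisfy the loop's exit test. Note that 'SHA2_256' requires SQL Server 2012 or later.

```sql
-- Hypothetical wrapper; the name and the NULL handling are assumptions,
-- not part of the original answer.
CREATE FUNCTION dbo.fnChunkedSha256 (@A nvarchar(max))
RETURNS varbinary(32)                        -- SHA2_256 produces 32 bytes
AS
BEGIN
    DECLARE @res varbinary(max) = 0x;        -- empty accumulator
    DECLARE @position int = 1;
    DECLARE @len int = DATALENGTH(@A) / 2;   -- 2 bytes per UTF-16 code unit

    IF @A IS NULL
        RETURN NULL;

    WHILE 1 = 1
    BEGIN
        -- Hash one 4000-character (8000-byte) chunk and append its 32-byte digest.
        SET @res = @res + HASHBYTES('SHA2_256', SUBSTRING(@A, @position, 4000));
        SET @position = @position + 4000;
        IF @position > @len
            BREAK;
    END;

    -- Hash the concatenated digests to get a fixed-size result.
    RETURN HASHBYTES('SHA2_256', @res);
END;
GO

-- Usage: 10,000 characters = 20,000 bytes, i.e. three chunks.
SELECT dbo.fnChunkedSha256(REPLICATE(CAST(N'x' AS nvarchar(max)), 10000)) AS chunked_hash;
```

The result is a hash of hashes, so it will not equal HASHBYTES applied to the whole string. On versions with the 8000-byte input limit, the final call also hits that limit once more than 250 digests (about one million characters) have been concatenated.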

1

Tested and working: select master.sys.fn_repl_hash_binary(someVarbinaryMaxValue), and it's not complicated :)

0

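It seems the simplest approach is to write a recursive hashing algorithm that breaks the input text value into segments that fit in varchar(8000). I arbitrarily chose to cut the input string into 7500-character segments. The hashing algorithm returns varbinary(20), which can easily be converted to varchar (40 hexadecimal characters when converted with style 2).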

ALTER FUNCTION [dbo].[BigHash] 
( 
    @TextValue nvarchar(max) 
) 

RETURNS varbinary(20) 

AS 
BEGIN 

    if @TextValue is null  -- '= null' never evaluates to true under ANSI_NULLS ON 
     return hashbytes('SHA1', 'null') 


    Declare @FirstPart as varchar(7500) 
    Declare @Remainder as varchar(max) 

    Declare @RemainderHash as varbinary(20) 
    Declare @BinaryValue as varbinary(20) 

    Declare @TextLength as integer 


    Set @TextLength = len(@TextValue) 

    if @TextLength > 7500 
     Begin 
      Set @FirstPart = substring(@TextValue, 1, 7500)   

      Set @Remainder = substring(@TextValue, 7501, @TextLength - 7500)   

      Set @RemainderHash = dbo.BigHash(@Remainder) 

      -- style 2 renders the 20-byte hash as 40 hex characters; varchar(20) would keep only half of it 
      Set @BinaryValue = hashbytes('SHA1', @FirstPart + convert(varchar(40), @RemainderHash, 2)) 

      return @BinaryValue 

     End 
    else 
     Begin 
      Set @FirstPart = substring(@TextValue, 1, @TextLength)      
      Set @BinaryValue = hashbytes('SHA1', @FirstPart) 

      return @BinaryValue 
     End 


    return null 

END 
6

I took the accepted answer and modified it a bit, with the following improvements:

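  1. No longer a recursive function
  2. Now schema-bound
  3. No longer relies on undocumented stored procedures
  4. Two versions: one for nvarchar, one for varchar
  5. Returns the same data size as HASHBYTES, leaving it to the end user to convert the result to a smaller size based on the algorithm used. This lets the functions support future algorithms with larger return values.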

With these changes, the functions can now be used in persisted computed columns, because they are marked as deterministic when created.

CREATE FUNCTION dbo.fnHashBytesNVARCHARMAX 
(
    @Algorithm VARCHAR(10), 
    @Text NVARCHAR(MAX) 
) 
RETURNS VARBINARY(8000) 
WITH SCHEMABINDING 
AS 
BEGIN 
    DECLARE @NumHash INT; 
    DECLARE @HASH VARBINARY(8000); 
    SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    /* HashBytes only supports 8000 bytes so split the string if it is larger */ 
    WHILE @NumHash > 1 
    BEGIN 
     -- # * 4000 character strings 
     WITH a AS 
     (SELECT 1 AS n UNION ALL SELECT 1), -- 2 
     b AS 
     (SELECT 1 AS n FROM a, a a1),  -- 4 
     c AS 
     (SELECT 1 AS n FROM b, b b1),  -- 16 
     d AS 
     (SELECT 1 AS n FROM c, c c1),  -- 256 
     e AS 
     (SELECT 1 AS n FROM d, d d1),  -- 65,536 
     f AS 
     (SELECT 1 AS n FROM e, e e1),  -- 4,294,967,296 = 17+ TRILLION characters 
     factored AS 
     (SELECT ROW_NUMBER() OVER (ORDER BY n) rn FROM f), 
     factors AS 
     (SELECT rn, (rn * 4000) + 1 factor FROM factored) 
     SELECT @Text = CAST 
      (
       (
        SELECT CONVERT(VARCHAR(MAX), HASHBYTES(@Algorithm, SUBSTRING(@Text, factor - 4000, 4000)), 1) 
        FROM factors 
        WHERE rn <= @NumHash 
        FOR XML PATH('') 
       ) AS NVARCHAR(MAX) 
      ); 

     SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    END; 
    SET @HASH = CONVERT(VARBINARY(8000), HASHBYTES(@Algorithm, @Text)); 
    RETURN @HASH; 
END; 
GO

CREATE FUNCTION dbo.fnHashBytesVARCHARMAX 
(
    @Algorithm VARCHAR(10), 
    @Text VARCHAR(MAX) 
) 
RETURNS VARBINARY(8000) 
WITH SCHEMABINDING 
AS 
BEGIN 
    DECLARE @NumHash INT; 
    DECLARE @HASH VARBINARY(8000); 
    SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    /* HashBytes only supports 8000 bytes so split the string if it is larger */ 
    WHILE @NumHash > 1 
    BEGIN 
     -- # * 8000 character strings 
     WITH a AS 
     (SELECT 1 AS n UNION ALL SELECT 1), -- 2 
     b AS 
     (SELECT 1 AS n FROM a, a a1),  -- 4 
     c AS 
     (SELECT 1 AS n FROM b, b b1),  -- 16 
     d AS 
     (SELECT 1 AS n FROM c, c c1),  -- 256 
     e AS 
     (SELECT 1 AS n FROM d, d d1),  -- 65,536 
     f AS 
     (SELECT 1 AS n FROM e, e e1),  -- 4,294,967,296 = 17+ TRILLION characters 
     factored AS 
     (SELECT ROW_NUMBER() OVER (ORDER BY n) rn FROM f), 
     factors AS 
     (SELECT rn, (rn * 8000) + 1 factor FROM factored) 
     SELECT @Text = CAST 
     (
      (
       SELECT CONVERT(VARCHAR(MAX), HASHBYTES(@Algorithm, SUBSTRING(@Text, factor - 8000, 8000)), 1) 
       FROM factors 
       WHERE rn <= @NumHash 
       FOR XML PATH('') 
      ) AS NVARCHAR(MAX) 
     ); 

     SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    END; 
    SET @HASH = CONVERT(VARBINARY(8000), HASHBYTES(@Algorithm, @Text)); 
    RETURN @HASH; 
END;
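
A sketch of how the persisted computed column mentioned above might look. The table and column names are hypothetical, and the first query is there so you can check how your server actually classified the function rather than taking determinism for granted:

```sql
-- 1 means SQL Server considers the function deterministic.
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.fnHashBytesNVARCHARMAX'), 'IsDeterministic') AS is_deterministic;

-- Hypothetical table; the hash is narrowed to 20 bytes because SHA1 is used.
CREATE TABLE dbo.ReportDefinitions
(
    ReportId       int IDENTITY(1,1) PRIMARY KEY,
    Definition     nvarchar(max) NOT NULL,
    DefinitionHash AS CONVERT(varbinary(20),
                       dbo.fnHashBytesNVARCHARMAX('SHA1', Definition)) PERSISTED
);
```

Narrowing the result in the column definition keeps the stored value at the size the chosen algorithm actually produces, which matters if the column is later indexed.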