Why does my upperRange2 column occasionally return NULL? Am I misusing CTEs?


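I have a set on which I want to perform two separate random raffle draws. I wanted to do the draws without a loop statement and thought CTEs would work well, since they always work for the first random selection. However, when I try to chain a second repetition of the same logic, I start getting random behavior from the database. Specifically, my upperRange2 field is sometimes NULL. Run the code below a few times and you should see that sometimes upperRange2 has a proper value and sometimes it is just NULL. If you select only from the candidatesSelected CTE, you will see that the upperRange field is always valid. The problem only occurs when I try to follow this pattern twice, for two different random selections. Why does my upperRange2 column occasionally return NULL? Am I misusing CTEs?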

CREATE VIEW [dbo].[RandomNumberView] 
AS 
    SELECT RAND() AS randomNumber 
GO 

CREATE FUNCTION [dbo].[GetRandoms] (
    @lowerRange int, 
    @upperRange int, 
    @count int 
) 
RETURNS @randoms TABLE (
    [randomNumber] int primary key 
) 
AS 
BEGIN 
    IF (@lowerRange IS NOT NULL 
     AND @upperRange IS NOT NULL 
     AND @count IS NOT NULL 
    ) 
    BEGIN 
     DECLARE @candidateCount int, 
       @random float, 
       @selected int 

     /* These are the numbers that can possibly be selected 
     ** from the user-specified range. */ 
     SELECT @candidateCount = (@upperRange - @lowerRange) + 1 

     /* If the user specified a count that is greater than the 
     ** candidate count, then return every possibility between 
     ** the lower and upper range even though it's fewer than 
     ** the count requested. */ 
     IF (@count > @candidateCount) 
      INSERT @randoms 
       SELECT i 
       FROM Seq(@lowerRange, @upperRange) -- Seq(): numbers function returning column i; not shown in the question

     /* So that we don't select duplicate numbers keep grabbing 
     ** a unique random number until the user specified count 
     ** has been reached from the range specified. */ 
     WHILE (@count <= @candidateCount AND (SELECT COUNT(*) FROM @randoms) < @count) 
     BEGIN 
      /* Note the use of the RandomNumberView. It is forbidden 
      ** to use non-deterministic functions in functions, which 
      ** is why there isn't a call to RAND() here instead. The 
      ** RandomNumberView is just a boxing mechanism around the 
      ** RAND() function so that it turns it into a table type 
      ** source instead of a function and is therefore allowed. */ 
      SELECT @random = randomNumber FROM RandomNumberView 

      /* To understand how the percentile random number is reduced 
      ** to the range specified by the user consider this statement 
      ** that produces a range of 0 to 6: ROUND(RAND() * 6, 0) */ 
      SELECT @selected = ROUND(@random * (@candidateCount - 1), 0) + @lowerRange 
      IF (NOT EXISTS (SELECT * FROM @randoms WHERE randomNumber = @selected)) 
       INSERT @randoms VALUES (@selected) 
     END 
    END 

    RETURN 
END 
GO 

declare @candidates table (name varchar(10) primary key, [weight] int not null, secondaryWeight int not null); 
insert @candidates values ('Carl', 2, 1); 
insert @candidates values ('James', 1, 2); 
insert @candidates values ('Randy', 3, 1); 
insert @candidates values ('David', 2, 2); 
insert @candidates values ('Michael', 1, 1); 

declare @pickCount int = 2; 

with 
    candidateRows as (
     select 
      name, 
      [weight], 
      secondaryWeight, 
      row_number() over (order by [weight]) as [row] 
     from @candidates 
    ), 
    candidateLowerRanges as (
     select 
      name, 
      [weight], 
      secondaryWeight, 
      [row], 
      (
       select sum([weight]) 
       from candidateRows b 
       where b.[row] <= a.[row] 
      ) as upperRange 
     from candidateRows a 
    ), 
    candidateFullRanges as (
     select 
      name, 
      [weight], 
      secondaryWeight, 
      [row], 
      upperRange, 
      lag(upperRange, 1, 0) over (order by upperRange) as previousUpperRange 
     from candidateLowerRanges 
    ), 
    candidatesSelected as (
     select 
      name, 
      [weight], 
      secondaryWeight, 
      [row], 
      upperRange, 
      previousUpperRange, 
      randomNumber 
     from candidateFullRanges s 
      inner join GetRandoms(1, (select max(upperRange) from candidateLowerRanges), @pickCount) r 
       on s.upperRange >= r.randomNumber 
        and s.previousUpperRange < r.randomNumber    
    ), 
    secondRows as (
     select 
      name, 
      [weight], 
      secondaryWeight, 
      [row], 
      upperRange, 
      previousUpperRange, 
      randomNumber, 
      row_number() over (order by secondaryWeight desc) as [row2] 
     from candidatesSelected 
    ), 
    secondUpperRanges as (
     select 
      name, 
      [weight], 
      secondaryWeight, 
      [row], 
      upperRange, 
      previousUpperRange, 
      randomNumber, 
      row2, 
      (
       select sum(secondaryWeight) 
       from secondRows b 
       where b.[row] <= a.[row] 
      ) as upperRange2 
     from secondRows a 
    ) 
select * 
from secondUpperRanges
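
For readers less familiar with the windowed CTE chain, here is a rough Python sketch of what the first draw appears to compute. It assumes the intent is a weighted pick: each candidate owns the slice (previousUpperRange, upperRange] of the running total of [weight], and a number returned by GetRandoms selects the candidate whose slice contains it. The helper names (`get_randoms`, `cumulative_ranges`, `draw`) are hypothetical, and Python's `random` stands in for RAND(). It illustrates the logic only and does not replace the query.

```python
import math
import random


def get_randoms(lower, upper, count, rng=random):
    """Rough mirror of dbo.GetRandoms: up to `count` distinct ints in [lower, upper]."""
    candidate_count = upper - lower + 1
    if count > candidate_count:
        # Same fallback as the T-SQL: return every value in the range.
        return set(range(lower, upper + 1))
    picked = set()
    while len(picked) < count:
        r = rng.random()  # stands in for RAND() read through RandomNumberView
        # ROUND(@random * (@candidateCount - 1), 0) + @lowerRange.
        # T-SQL ROUND rounds half away from zero, hence floor(x + 0.5) for x >= 0.
        picked.add(math.floor(r * (candidate_count - 1) + 0.5) + lower)
    return picked


def cumulative_ranges(candidates):
    """Return (name, previousUpperRange, upperRange) for each (name, weight)."""
    # row_number() over (order by [weight]). Python's sort keeps ties in input
    # order; SQL Server makes no such promise for tied weights.
    ordered = sorted(candidates, key=lambda c: c[1])
    ranges, running = [], 0
    for name, weight in ordered:
        ranges.append((name, running, running + weight))
        running += weight
    return ranges


def draw(candidates, pick_count, rng=random):
    ranges = cumulative_ranges(candidates)
    total = ranges[-1][2]  # max(upperRange)
    numbers = get_randoms(1, total, pick_count, rng)
    # Join condition: previousUpperRange < randomNumber <= upperRange.
    return sorted(name for name, lo, hi in ranges for n in numbers if lo < n <= hi)


candidates = [("Carl", 2), ("James", 1), ("Randy", 3), ("David", 2), ("Michael", 1)]
print(cumulative_ranges(candidates))
# [('James', 0, 1), ('Michael', 1, 2), ('Carl', 2, 4), ('David', 4, 6), ('Randy', 6, 9)]
print(draw(candidates, 2, random.Random(1)))
```

As in the SQL join, two distinct random numbers can land in the same candidate's slice, so the same name may be picked twice.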


2015-11-20 Rob

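"It is forbidden to use non-deterministic functions in functions"? Do you know why that is? SQL Server does this to enforce the determinism of functions, and **may not call the user-defined function a second time with the same parameters**, effectively reusing the previous value from 'rand()'. –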

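@ShannonSeverance, I didn't realize it always treated all functions as deterministic. Since I didn't see any way to mark a function as such, I assumed the engine analyzed it to determine whether any non-deterministic functions were used inside. Thanks for pointing me back to what may be the real problem; I had mistakenly assumed it was using the function twice in sequence. – Rob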

Also, I would mark your answer as correct, but it's a comment. I just went in and removed the use of the GetRandoms function, and the behavior went away. – Rob
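
A minimal Python sketch of why removing GetRandoms makes the NULLs go away. It rests on one assumption, stated plainly: SQL Server does not materialize CTEs, so each reference to secondRows (the outer `a` and the correlated `b`) may run GetRandoms again and get a different random pick. If the `b` pick happens to contain no row with `[row] <= a.[row]`, the subquery sums zero rows, and SUM over zero rows is NULL. The function names are hypothetical. Weighting is left out because it doesn't affect the mechanism.

```python
import random

# (name, row, secondaryWeight) for the sample data; `row` follows
# row_number() over (order by [weight]), assuming tied weights keep insertion order.
CANDIDATES = [("James", 1, 2), ("Michael", 2, 1), ("Carl", 3, 1),
              ("David", 4, 2), ("Randy", 5, 1)]


def candidates_selected(rng, pick_count=2):
    """One evaluation of the candidatesSelected CTE: a fresh random pick."""
    return rng.sample(CANDIDATES, pick_count)


def upper_range2(outer_row, inner_rows):
    """Correlated subquery: sum(secondaryWeight) where b.[row] <= a.[row].
    SQL's SUM over zero rows is NULL, modelled here as None."""
    matches = [w for _, row, w in inner_rows if row <= outer_row[1]]
    return sum(matches) if matches else None


def run_query(rng, reevaluate_cte):
    outer = candidates_selected(rng)  # secondRows a
    # Materialized: b sees the same rows as a. Inlined: b gets its own draw.
    inner = candidates_selected(rng) if reevaluate_cte else outer
    return [(row[0], upper_range2(row, inner)) for row in outer]


def count_nulls(reevaluate_cte, runs=1000, seed=42):
    rng = random.Random(seed)
    return sum(value is None
               for _run in range(runs)
               for _name, value in run_query(rng, reevaluate_cte))


print(count_nulls(reevaluate_cte=False))  # 0: every row always "sees" itself
print(count_nulls(reevaluate_cte=True))   # > 0: NULLs appear
```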

The issue is "side effects", not determinism or the lack of it. (My comment this morning was not entirely correct in its details.)

"User-defined functions cannot be used to perform actions that modify the database state."

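Why? I'm not sure. I think the core of it is that calling such a function during a read could change the results that will be returned. Depending on the order in which SQL Server reads the rows, you could get different results, and people shouldn't need to know the order in which work happens inside a query. Something like that.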
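
A small Python sketch of the read-order point, for illustration only. The row names and seed are made up. If a side-effecting generator is called once per row as rows are read, which value lands on which row depends on the scan order, even though the generator itself is seeded and fully deterministic.

```python
import random


def tag_rows(rows, scan_order, seed=7):
    """Call a seeded generator once per row, visiting rows in `scan_order`."""
    rng = random.Random(seed)  # same seed every time: the sequence is fixed
    return {rows[i]: rng.random() for i in scan_order}


rows = ["Carl", "James", "Randy"]
forward = tag_rows(rows, [0, 1, 2])
backward = tag_rows(rows, [2, 1, 0])

# Same set of numbers, assigned to different rows.
print(sorted(forward.values()) == sorted(backward.values()))  # True
print(forward["Carl"] == backward["Carl"])                     # False
```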

RAND(.5) is deterministic, but it still cannot be used in a function.

"RAND is deterministic only when a seed parameter is specified."

CREATE FUNCTION dbo.f() RETURNS FLOAT AS BEGIN 
    RETURN RAND(.5) 
END

Msg 443, Level 16, State 1, Procedure f, Line 3
Invalid use of a side-effecting operator 'rand' within a function.

So what side effect does RAND() have?

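A pseudo-random number generator usually has internal state. When a number is obtained from the generator, two things happen: the internal state is updated, and a "random" number is returned. Both the number and the new state depend strictly on the seed (if one is provided) or on the state as it was when the function started. In SQL Server, that internal state is global at some level. So RAND() always changes the state, but it is deterministic when a seed is provided.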

Nondeterministic functions can be used in user-defined functions.

GETDATE() is nondeterministic, but it is not "side-effecting".

CREATE FUNCTION dbo.f() RETURNS datetime AS BEGIN 
    RETURN GETDATE() 
END 
GO 
PRINT dbo.f()

Nov 20, 2015 at 1:27 PM

Boxing the RAND() call inside a view hides the side-effecting nature of RAND(). Does it also hide RAND()'s nondeterminism?

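No. I couldn't find a system view to query to determine whether SQL Server considers a function deterministic, but we can try to use the function and observe the error.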

CREATE VIEW [dbo].[RandomNumberView] 
WITH SCHEMABINDING 
AS 
    SELECT RAND() AS randomNumber 
GO 

CREATE FUNCTION dbo.MyRand() RETURNS FLOAT 
WITH SCHEMABINDING AS BEGIN 
    DECLARE @Result FLOAT 
    SELECT @Result = randomNumber 
    FROM dbo.RandomNumberView 
    RETURN @Result 
END 
GO 

CREATE TABLE dbo.T (Col int 
    , ColC AS dbo.MyRand() PERSISTED)

Msg 4936, Level 16, State 1, Line 2
Computed column 'ColC' in table 'T' cannot be persisted because the column is non-deterministic.

So dbo.MyRand() is nondeterministic. Why doesn't it work, then?

"One caveat of almost all nondeterministic functions is that they are executed once per statement, not once per row. ... The only exception to this rule is NEWID, which will generate a new GUID for every row in the statement."
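
A Python sketch of the difference the quote describes. The helper names are hypothetical. A function evaluated once per statement (how RAND() behaves) hands every row the same value. A function evaluated per row (how NEWID behaves) gives each row its own value. `uuid.uuid4` stands in for NEWID and `random.random` for RAND(), and the "statement" is just a list comprehension.

```python
import random
import uuid

rows = ["Carl", "James", "Randy", "David", "Michael"]


def once_per_statement(rows, fn):
    value = fn()  # evaluated a single time, then reused for every row
    return [(row, value) for row in rows]


def once_per_row(rows, fn):
    return [(row, fn()) for row in rows]  # fresh call for each row


rand_like = once_per_statement(rows, random.random)
newid_like = once_per_row(rows, uuid.uuid4)

print(len({v for _, v in rand_like}))   # 1: every row got the same number
print(len({v for _, v in newid_like}))  # 5: one GUID per row
```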

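Determinism and nondeterminism are used to decide whether a value can safely be persisted. GETDATE() cannot be persisted when it is used in a computed column or view, because tomorrow's value will differ from today's. The values of dbo.MyRand() and RAND() will differ on the next SELECT, so they must be recomputed rather than pulled from a persisted source.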

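However, the writers of SQL Server decided that, within a single statement, when one or more values are needed they only guarantee that the function is called at least once. With GETDATE() this is nice: a single statement appears to happen at one instant. When you try to use randomness within a single statement, this "feature" is a pain.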

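Note: It sounds like the quality of the PRNG's randomness matters a great deal to your application. Evaluating randomness is beyond my expertise. There are other ways to get row-level randomness in a SQL Server query, some of which use NEWID, which is evaluated per row. However, if randomness matters, I would not use them. See Random Sampling in T-SQL.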


2015-11-20 22:36:38


