SSIS: add only rows that have changed

I have a project that imports all users (including all of their attributes) from an Active Directory domain into a SQL Server table. This table will be used by a Reporting Services application.

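The table model has the following columns:
- ID: an automatically generated unique identifier.
- distinguishedName: the user's LDAP distinguished name attribute.
- attribute_name: the name of the user attribute.
- attribute_value: the attribute value.
- timestamp: an automatically generated datetime value.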

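I have created an SSIS package with a Script Task. The task contains C# code that exports all the data to a .CSV file, which a Data Flow Task later imports into the table. The project works without problems, but it produces more than 2 million rows (the AD domain has about 30,000 users, each with 100 to 200 attributes).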

The SSIS package should run daily. It should import data only when a user has a new attribute or an attribute value has changed.

To do this, I created a Data Flow that copies the whole table into a Recordset.

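This recordset is converted into a DataTable. A Script Component step then uses the DataTable to check whether the current row already exists in it. If the row exists, the stored attribute value is compared with the incoming one. A row is sent to the output only if the values differ or if no matching row is found in the DataTable. Here is the code: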


public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    bool processRow = compareValues(Row);

    if (processRow)
    {
        // Direct to output 0
        Row.OutdistinguishedName = Row.distinguishedName.ToString();
        Row.Outattributename = Row.AttributeName.ToString();
        Row.Outattributevalue.AddBlobData(System.Text.Encoding.UTF8.GetBytes(Row.AttributeValue.ToString()));
    }
}

public bool compareValues(Input0Buffer Row)
{
    // Existing table rows, loaded from the Recordset into a DataTable
    DataTable dtHostsTbl = (DataTable)Variables.dataTableTbl;
    string distinguishedName = Row.distinguishedName.ToString();
    string attribute_name = Row.AttributeName.ToString();
    string attribute_value = Row.AttributeValue.ToString();

    // Query the DataTable. Single quotes are doubled so that values such as
    // "CN=O'Brien,..." do not break the filter expression, and "=" is used
    // instead of LIKE so that '*' or '%' in a value is not treated as a wildcard.
    string expression = "distinguishedName = '" + distinguishedName.Replace("'", "''")
        + "' AND attribute_name = '" + attribute_name.Replace("'", "''") + "'";
    DataRow[] foundRowsHost = dtHostsTbl.Select(expression);

    // No matching row: this is a new attribute, so send the row to the output
    if (foundRowsHost.Length == 0)
    {
        return true;
    }

    // Matching row: send the row to the output only if the stored value differs.
    // The value is read by column name rather than by position (index 2),
    // so the check does not depend on the column order of the recordset.
    return !foundRowsHost[0]["attribute_value"].ToString().Equals(attribute_value);
}

The code works, but it is extremely slow. Is there a better way to do this?
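Part of the slowness likely comes from compareValues building a new filter string and calling DataTable.Select for every one of the roughly two million input rows. Below is a minimal sketch of the same comparison rule, written in Python for brevity. It assumes the existing rows can be indexed in an in-memory hash map once, before any rows are processed. In the Script Component that would correspond to, for example, a Dictionary filled in PreExecute. build_index and compare_values are hypothetical names.

```python
from typing import Dict, Iterable, Tuple

# (distinguishedName, attribute_name): the same key the DataTable.Select filter matches on
Key = Tuple[str, str]


def build_index(existing_rows: Iterable[Tuple[str, str, str]]) -> Dict[Key, str]:
    """Index the existing table rows once, before any input rows are processed."""
    index: Dict[Key, str] = {}
    for dn, name, value in existing_rows:
        # Keep the first match, as compareValues does with foundRowsHost[0]
        index.setdefault((dn, name), value)
    return index


def compare_values(index: Dict[Key, str], dn: str, name: str, value: str) -> bool:
    """True if the row is new or its value changed, i.e. the row should be output."""
    stored = index.get((dn, name))
    # One hash lookup per row instead of parsing and evaluating a filter expression
    return stored is None or stored != value
```

DataTable string comparisons are case-insensitive by default, but this dictionary is case-sensitive. To reproduce that behavior, the key would need to be normalized (for example, lower-cased).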


2015-11-20 Sergio

Here are some ideas:

Option A (actually a combination of options)

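- Filter out unnecessary data when querying Active Directory by using the whenChanged attribute. This alone should reduce the number of records significantly. If filtering on whenChanged is not possible, or in addition to it, consider the following steps.
- Instead of importing all existing records into a Recordset Destination, import them into a Cache Transform. Then use this cache in the Cache Connection Manager of two Lookup components. One Lookup checks whether the {distinguishedName, attribute_name} combination exists; rows it does not find are inserts. The other Lookup checks whether the {distinguishedName, attribute_name, attribute_value} combination exists; rows it does not find are updates (or delete/insert). This pair of Lookups should replace your "Skip rows which are in the table" Script Component.
- Evaluate whether you can reduce the size of the attribute_name and attribute_value columns. nvarchar(max) in particular often ruins performance.
- If you cannot reduce the size of attribute_name and attribute_value, consider storing hash values of them. Then check whether the hash has changed instead of comparing the values themselves.
- Remove the CSV step. Send the data from the source that currently populates the CSV straight to the Lookups, in a single Data Flow, and send the rows the Lookups do not find to your OLE DB Destination component.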
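The Lookup pair and the hashing idea can be illustrated outside SSIS. This is a minimal Python sketch, not SSIS code. It assumes at most one existing row per {distinguishedName, attribute_name} and uses SHA-256 as an example hash. build_cache, value_hash and classify are hypothetical names. With a unique key, matching the full {distinguishedName, attribute_name, attribute_value} combination is equivalent to comparing the stored value's hash once the key has matched.

```python
import hashlib
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (distinguishedName, attribute_name)


def value_hash(value: str) -> bytes:
    """Fixed-size SHA-256 digest, so long values never have to be cached or compared."""
    return hashlib.sha256(value.encode("utf-8")).digest()


def build_cache(existing_rows: Iterable[Tuple[str, str, str]]) -> Dict[Key, bytes]:
    """Stand-in for the Cache Transform: key -> hash of the stored attribute_value."""
    return {(dn, name): value_hash(value) for dn, name, value in existing_rows}


def classify(cache: Dict[Key, bytes], dn: str, name: str, value: str) -> str:
    """Route one incoming row the way the two Lookup components would."""
    stored = cache.get((dn, name))
    if stored is None:
        return "insert"     # first Lookup: {distinguishedName, attribute_name} not found
    if stored != value_hash(value):
        return "update"     # second Lookup: full combination (via value hash) not found
    return "unchanged"      # both Lookups matched: the row can be skipped
```

Every cached value takes 32 bytes, however long the original nvarchar(max) text was.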

Option B

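Check whether the source that reads from Active Directory is fast on its own. To measure this, run the Data Flow by itself, without any destination. If you are satisfied with its performance, and you have no objection to deleting everything in the ad_User table, simply delete and repopulate this two-million-row table every day. Reading everything from AD and writing it into SQL Server in the same Data Flow, without any change detection, may actually be the simplest and fastest option.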
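Below is a minimal sketch of Option B's "delete and repopulate" step. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, where you would more likely use TRUNCATE TABLE followed by the OLE DB Destination. The ID and timestamp columns are omitted, and reload_table is a hypothetical name. Running the delete and the inserts in one transaction is an assumption added here, so that a failed load leaves the previous contents in place.

```python
import sqlite3
from typing import Iterable, Tuple


def reload_table(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, str]]) -> None:
    """Replace the whole ad_User contents with a fresh export, without change detection."""
    with conn:  # one transaction: commits on success, rolls back if anything raises
        conn.execute("DELETE FROM ad_User")
        conn.executemany(
            "INSERT INTO ad_User (distinguishedName, attribute_name, attribute_value) "
            "VALUES (?, ?, ?)",
            rows,
        )
```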


2015-11-21 03:19:49 helix

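Thanks for your suggestions, helix. I found a simpler way to do this: I now import the new AD export into a separate table and use the EXCEPT operator:
SELECT distinguishedName, attribute_name, attribute_value FROM dbo.ad_User
EXCEPT
SELECT distinguishedName, attribute_name, attribute_value FROM dbo.ad_User_Old
This command takes only 10 seconds. – Sergio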
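The EXCEPT query in the comment can be tried out without SQL Server. Below is a minimal sketch using Python's built-in sqlite3 module, which also supports EXCEPT. The tables are created without the dbo. schema prefix and without the ID and timestamp columns, and changed_rows is a hypothetical helper name.

```python
import sqlite3


def changed_rows(conn: sqlite3.Connection) -> list:
    """Rows in ad_User that have no identical row in ad_User_Old (new or changed values)."""
    return conn.execute(
        "SELECT distinguishedName, attribute_name, attribute_value FROM ad_User "
        "EXCEPT "
        "SELECT distinguishedName, attribute_name, attribute_value FROM ad_User_Old"
    ).fetchall()
```

EXCEPT returns distinct rows in no guaranteed order. It reports only rows that are new or changed in ad_User. Attributes that were removed in AD would show up only with the operands swapped.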
