2009-07-20 44 views
2

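Import from text file to SQL Server database: is ADO.NET too slow?

My program is still running right now, importing data from a log file into a remote SQL Server database. The log file is about 80 MB and contains about 470000 lines, with about 25000 lines of data. My program can only import 300 rows/second, which is really bad. :(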

public static int ImportData(string strPath) 
{ 
    //NameValueCollection collection = ConfigurationManager.AppSettings; 

    using (TextReader sr = new StreamReader(strPath)) 
    { 
     sr.ReadLine(); //ignore three first lines of log file 
     sr.ReadLine(); 
     sr.ReadLine(); 
     string strLine; 
     var cn = new SqlConnection(ConnectionString); 
     cn.Open(); 

     while ((strLine = sr.ReadLine()) != null) 
     { 
      { 
       if (strLine.Trim() != "") //if not a blank line, then import into database 
       { 
        InsertData(strLine, cn); 
        _count++; 
       } 
      } 
     } 
     cn.Close(); 
     sr.Close(); 

     return _count; 
    } 
} 

InsertData is just a normal insert method using ADO.NET. It relies on this parsing constructor:

public Data(string strLine) 
{ 
    string[] list = strLine.Split(new[] {'\t'}); 
    try 
    { 
     Senttime = DateTime.Parse(list[0] + " " + list[1]); 
    } 
    catch (Exception) 
    { 
    } 

    Clientip = list[2]; 
    Clienthostname = list[3]; 

    Partnername = list[4]; 
    Serverhostname = list[5]; 
    Serverip = list[6]; 

    Recipientaddress = list[7]; 
    Eventid = Convert.ToInt16(list[8]); 
    Msgid = list[9]; 
    Priority = Convert.ToInt16(list[10]); 
    Recipientreportstatus = Convert.ToByte(list[11]); 
    Totalbytes = Convert.ToInt32(list[12]); 
    Numberrecipient = Convert.ToInt16(list[13]); 
    DateTime temp; 
    if (DateTime.TryParse(list[14], out temp)) 
    { 
     OriginationTime = temp; 
    } 
    else 
    { 
     OriginationTime = null; 
    } 
    Encryption = list[15]; 
    ServiceVersion = list[16]; 
    LinkedMsgid = list[17]; 
    MessageSubject = list[18]; 
    SenderAddress = list[19]; 
} 

The InsertData method:

private static void InsertData(string strLine, SqlConnection cn) 
{ 
    var dt = new Data(strLine); //parse the log line into proper fields 
    const string cnnStr = 
     "INSERT INTO LOGDATA ([SentTime]," + "[client-ip]," + 
     "[Client-hostname]," + "[Partner-Name]," + "[Server-hostname]," + 
     "[server-IP]," + "[Recipient-Address]," + "[Event-ID]," + "[MSGID]," + 
     "[Priority]," + "[Recipient-Report-Status]," + "[total-bytes]," + 
     "[Number-Recipients]," + "[Origination-Time]," + "[Encryption]," + 
     "[service-Version]," + "[Linked-MSGID]," + "[Message-Subject]," + 
     "[Sender-Address]) " + " VALUES ( " + "@Senttime," + "@Clientip," + 
     "@Clienthostname," + "@Partnername," + "@Serverhostname," + "@Serverip," + 
     "@Recipientaddress," + "@Eventid," + "@Msgid," + "@Priority," + 
     "@Recipientreportstatus," + "@Totalbytes," + "@Numberrecipient," + 
     "@OriginationTime," + "@Encryption," + "@ServiceVersion," + 
     "@LinkedMsgid," + "@MessageSubject," + "@SenderAddress)"; 


    var cmd = new SqlCommand(cnnStr, cn) {CommandType = CommandType.Text}; 

    cmd.Parameters.AddWithValue("@Senttime", dt.Senttime); 
    cmd.Parameters.AddWithValue("@Clientip", dt.Clientip); 
    cmd.Parameters.AddWithValue("@Clienthostname", dt.Clienthostname); 
    cmd.Parameters.AddWithValue("@Partnername", dt.Partnername); 
    cmd.Parameters.AddWithValue("@Serverhostname", dt.Serverhostname); 
    cmd.Parameters.AddWithValue("@Serverip", dt.Serverip); 
    cmd.Parameters.AddWithValue("@Recipientaddress", dt.Recipientaddress); 
    cmd.Parameters.AddWithValue("@Eventid", dt.Eventid); 
    cmd.Parameters.AddWithValue("@Msgid", dt.Msgid); 
    cmd.Parameters.AddWithValue("@Priority", dt.Priority); 
    cmd.Parameters.AddWithValue("@Recipientreportstatus", dt.Recipientreportstatus); 
    cmd.Parameters.AddWithValue("@Totalbytes", dt.Totalbytes); 
    cmd.Parameters.AddWithValue("@Numberrecipient", dt.Numberrecipient); 
    if (dt.OriginationTime != null) 
     cmd.Parameters.AddWithValue("@OriginationTime", dt.OriginationTime); 
    else 
     cmd.Parameters.AddWithValue("@OriginationTime", DBNull.Value); 
      //if OriginationTime was null, then insert with null value to this column 
    cmd.Parameters.AddWithValue("@Encryption", dt.Encryption); 
    cmd.Parameters.AddWithValue("@ServiceVersion", dt.ServiceVersion); 
    cmd.Parameters.AddWithValue("@LinkedMsgid", dt.LinkedMsgid); 
    cmd.Parameters.AddWithValue("@MessageSubject", dt.MessageSubject); 
    cmd.Parameters.AddWithValue("@SenderAddress", dt.SenderAddress); 
    cmd.ExecuteNonQuery(); 
} 

How can I make my program run faster? Thank you very much!

Answers

13

Use SqlBulkCopy.

Edit: I created a minimal implementation of IDataReader and a Batch type so that I could insert arbitrary in-memory data using SqlBulkCopy. Here is the important bit:

IDataReader dr = batch.GetDataReader(); 
using (SqlTransaction tx = _connection.BeginTransaction()) 
{ 
    try 
    { 
     using (SqlBulkCopy sqlBulkCopy = 
      new SqlBulkCopy(_connection, SqlBulkCopyOptions.Default, tx)) 
     { 
      sqlBulkCopy.DestinationTableName = TableName; 
      SetColumnMappings(sqlBulkCopy.ColumnMappings); 
      sqlBulkCopy.WriteToServer(dr); 
      tx.Commit(); 
     } 
    } 
    catch 
    { 
     tx.Rollback(); 
     throw; 
    } 
} 

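The rest of the implementation is left as an exercise for the reader :)

Hint: the only bits of IDataReader you need to implement are Read, GetValue and FieldCount.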
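As a rough illustration of that hint, here is a minimal sketch of such a reader over in-memory rows. The class name RowReader and its constructor are hypothetical (this is not the Batch type mentioned above). It assumes C# 7 or later and that the bulk copy uses ordinal column mappings or none at all; name-based mappings would probably also need GetOrdinal and GetName.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: exposes in-memory rows as a forward-only IDataReader.
public sealed class RowReader : IDataReader
{
    private readonly IEnumerator<object[]> _rows;
    private readonly int _fieldCount;

    public RowReader(IEnumerable<object[]> rows, int fieldCount)
    {
        _rows = rows.GetEnumerator();
        _fieldCount = fieldCount;
    }

    // The three members named in the hint.
    public bool Read() => _rows.MoveNext();
    public object GetValue(int i) => _rows.Current[i] ?? DBNull.Value;
    public int FieldCount => _fieldCount;

    // Cheap members that are easy to support as well.
    public bool IsDBNull(int i) => _rows.Current[i] == null || _rows.Current[i] is DBNull;
    public object this[int i] => GetValue(i);
    public int Depth => 0;
    public bool IsClosed { get; private set; }
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public void Close() { IsClosed = true; }
    public void Dispose() { Close(); _rows.Dispose(); }

    // Everything else is deliberately unsupported in this sketch.
    private static Exception No() => new NotSupportedException();
    public object this[string name] => throw No();
    public DataTable GetSchemaTable() => throw No();
    public string GetName(int i) => throw No();
    public int GetOrdinal(string name) => throw No();
    public Type GetFieldType(int i) => throw No();
    public string GetDataTypeName(int i) => throw No();
    public int GetValues(object[] values) => throw No();
    public bool GetBoolean(int i) => throw No();
    public byte GetByte(int i) => throw No();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length) => throw No();
    public char GetChar(int i) => throw No();
    public long GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length) => throw No();
    public IDataReader GetData(int i) => throw No();
    public DateTime GetDateTime(int i) => throw No();
    public decimal GetDecimal(int i) => throw No();
    public double GetDouble(int i) => throw No();
    public float GetFloat(int i) => throw No();
    public Guid GetGuid(int i) => throw No();
    public short GetInt16(int i) => throw No();
    public int GetInt32(int i) => throw No();
    public long GetInt64(int i) => throw No();
    public string GetString(int i) => throw No();
}
```

An instance built from the parsed log rows could then be passed to sqlBulkCopy.WriteToServer in place of batch.GetDataReader(). Because the rows are pulled lazily from the enumerator, the whole file does not have to sit in memory at once.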

+2

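SqlBulkCopy is the way to go. I used to use bcp back in the SQL 6.5/7.0 days to import data from CSV, and found it very fast. SqlBulkCopy is essentially the same functionality exposed to managed code. – davewasthere 2009-07-20 09:03:08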

+0

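My log file has three header lines, and it takes 2 fields to represent the date and time. I have to merge them to convert them into a DateTime value. How can I do that? – Vimvq1987 2009-07-20 09:46:25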

4

Hmm, let's break this down a bit.

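In pseudocode, what you did is the following:

  1. Open the file
  2. Open a connection
  3. For every line that has data:
    • Parse the string
    • Save the data in SQL Server
  4. Close the connection
  5. Close the file

Now, the fundamental problems in doing it this way are: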

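  • You keep a SQL connection open while waiting for your lines to be parsed (which makes it prone to timeouts and the like)
  • You might be saving the data line by line, each row in its own transaction. We won't know until you show us what the InsertData method is doing
  • Consequently, you keep the file open while waiting for SQL Server to finish inserting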

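The best way to do this is to parse the file as a whole and then insert the rows in bulk. You can do that with SqlBulkCopy (as suggested by Matt Howells) or with SQL Server Integration Services.

If you want to stick with ADO.NET, you can pool your INSERT statements together and pass them off in one large SqlCommand, instead of doing it the current way, i.e. setting up one SqlCommand object per INSERT statement.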
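A minimal sketch of the "parse first, then bulk insert" idea, which also covers skipping the three header lines and merging the date and time fields. The class and method names are hypothetical, only three of the question's 19 columns are shown, and parsing with the invariant culture is an assumption about the log format.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class LogTableLoader
{
    // Only three of the 19 columns; the others would follow the same pattern.
    public static DataTable CreateTable()
    {
        var table = new DataTable("LOGDATA");
        table.Columns.Add("SentTime", typeof(DateTime));
        table.Columns.Add("client-ip", typeof(string));
        table.Columns.Add("total-bytes", typeof(int));
        return table;
    }

    public static DataTable Parse(IEnumerable<string> lines, int headerLines = 3)
    {
        DataTable table = CreateTable();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (lineNumber <= headerLines || line.Trim().Length == 0)
                continue; // skip the header lines and any blank lines

            string[] f = line.Split('\t');
            DataRow row = table.NewRow();

            // Fields 0 and 1 hold the date and the time: join them, then parse once.
            DateTime sent;
            bool ok = DateTime.TryParse(f[0] + " " + f[1],
                CultureInfo.InvariantCulture, DateTimeStyles.None, out sent);
            row["SentTime"] = ok ? (object)sent : DBNull.Value;

            row["client-ip"] = f[2];
            row["total-bytes"] = int.Parse(f[12], CultureInfo.InvariantCulture);
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The resulting table could be built with LogTableLoader.Parse(File.ReadLines(strPath)) and handed to SqlBulkCopy.WriteToServer(DataTable), adding ColumnMappings if the column order differs from the destination table. The connection then only needs to be open for the bulk copy itself.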
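One way to pool the INSERT statements, sketched under a few assumptions: the helper name BatchedInsert is hypothetical, table and column names come from trusted code (they are concatenated into the SQL text), and the batch is kept under a conservative 2000 parameters because SQL Server rejects commands with more than about 2100. It is written against the generic System.Data interfaces, so a SqlConnection can be passed in as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text;

public static class BatchedInsert
{
    // Conservative cap; SQL Server refuses commands with more than about 2100 parameters.
    public const int MaxParameters = 2000;

    // Builds one command text holding rowCount parameterized INSERT statements.
    public static string BuildCommandText(string table, IList<string> columns, int rowCount)
    {
        if (rowCount * columns.Count > MaxParameters)
            throw new ArgumentException("Too many parameters for one command; use smaller batches.");

        var quoted = new List<string>();
        foreach (string c in columns) quoted.Add(Quote(c));
        string head = "INSERT INTO " + Quote(table) + " (" + string.Join(", ", quoted) + ") VALUES (";

        var sql = new StringBuilder();
        for (int r = 0; r < rowCount; r++)
        {
            sql.Append(head);
            for (int c = 0; c < columns.Count; c++)
            {
                if (c > 0) sql.Append(", ");
                sql.Append("@p").Append(r).Append('_').Append(c);
            }
            sql.Append(");\n");
        }
        return sql.ToString();
    }

    // Sends the whole batch in a single round trip inside the given transaction.
    public static int Execute(IDbConnection cn, IDbTransaction tx, string table,
                              IList<string> columns, IList<object[]> rows)
    {
        using (IDbCommand cmd = cn.CreateCommand())
        {
            cmd.Transaction = tx;
            cmd.CommandText = BuildCommandText(table, columns, rows.Count);
            for (int r = 0; r < rows.Count; r++)
            {
                for (int c = 0; c < columns.Count; c++)
                {
                    IDbDataParameter p = cmd.CreateParameter();
                    p.ParameterName = "@p" + r + "_" + c;
                    p.Value = rows[r][c] ?? DBNull.Value;
                    cmd.Parameters.Add(p);
                }
            }
            return cmd.ExecuteNonQuery();
        }
    }

    // Bracket-quotes an identifier such as client-ip.
    private static string Quote(string name) => "[" + name.Replace("]", "]]") + "]";
}
```

With the question's 19 columns, this cap allows about 105 rows per command, so a caller would loop over the parsed rows in chunks of that size and commit the transaction at the end.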

2

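You create a SqlCommand object for every row of data. So the simplest improvement would be to create a

private static SqlCommand cmdInsert;

and declare its parameters with the Parameters.Add() method. Then, for each data row, set the parameter values using

cmdInsert.Parameters["@paramXXX"].Value = valueXXX; 

A second performance improvement might be to skip creating a Data object for each row, and assign the parameter values directly from the list[] array.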
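Put together, both suggestions might look like the sketch below. It shows only three of the 19 parameters; the SqlDbType choices and the VarChar size are guesses about the LOGDATA schema, and the class and method names are hypothetical. On modern .NET, System.Data.SqlClient comes from a NuGet package rather than the base framework.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class ReusableInsert
{
    // Create the command and declare its parameters once.
    public static SqlCommand Create(SqlConnection cn)
    {
        var cmd = new SqlCommand(
            "INSERT INTO LOGDATA ([SentTime], [client-ip], [total-bytes]) " +
            "VALUES (@Senttime, @Clientip, @Totalbytes)", cn);
        cmd.Parameters.Add("@Senttime", SqlDbType.DateTime);
        cmd.Parameters.Add("@Clientip", SqlDbType.VarChar, 50); // size is an assumption
        cmd.Parameters.Add("@Totalbytes", SqlDbType.Int);
        return cmd;
    }

    // Fill the existing parameters straight from the split line, with no Data object.
    public static void Bind(SqlCommand cmd, string[] list)
    {
        DateTime sent;
        cmd.Parameters["@Senttime"].Value =
            DateTime.TryParse(list[0] + " " + list[1], out sent) ? (object)sent : DBNull.Value;
        cmd.Parameters["@Clientip"].Value = list[2];
        cmd.Parameters["@Totalbytes"].Value = Convert.ToInt32(list[12]);
    }

    // Reuse the same command for every non-blank line.
    public static int ImportLines(SqlConnection cn, IEnumerable<string> lines)
    {
        int count = 0;
        using (SqlCommand cmd = Create(cn))
        {
            foreach (string line in lines)
            {
                if (line.Trim().Length == 0) continue;
                Bind(cmd, line.Split('\t'));
                cmd.ExecuteNonQuery();
                count++;
            }
        }
        return count;
    }
}
```

This still makes one round trip per row, so it is likely to help less than SqlBulkCopy; it mainly removes the per-row cost of building the command and its parameter list.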
