2013-07-10 91 views
21

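Quickly import Excel into a DataTable

I want to read an Excel file into a list of Data.DataTable, but my current approach can take a very long time. I walk through the workbook worksheet by worksheet, cell by cell, and it tends to take a long time. Is there a faster way to do this? Here is my code: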

List<DataTable> List = new List<DataTable>(); 

    // Counting sheets 
    // Worksheets is 1-based, so use <= to include the last sheet
    for (int count = 1; count <= WB.Worksheets.Count; ++count) 
    { 
     // Create a new DataTable for every Worksheet 
     DATA.DataTable DT = new DataTable(); 

     WS = (EXCEL.Worksheet)WB.Worksheets.get_Item(count); 

     textBox1.Text = count.ToString(); 

     // Get range of the worksheet 
     Range = WS.UsedRange; 


     // Create new Column in DataTable 
     for (cCnt = 1; cCnt <= Range.Columns.Count; cCnt++) 
     { 
      textBox3.Text = cCnt.ToString(); 


       Column = new DataColumn(); 
       Column.DataType = System.Type.GetType("System.String"); 
       Column.ColumnName = cCnt.ToString(); 
       DT.Columns.Add(Column); 

      // Create row for Data Table (Excel cell indices start at 1) 
      for (rCnt = 1; rCnt <= Range.Rows.Count; rCnt++) 
      { 
       textBox2.Text = rCnt.ToString(); 

       try 
       { 
        cellVal = (string)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2; 
       } 
       catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException) 
       { 
        ConvertVal = (double)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2; 
        cellVal = ConvertVal.ToString(); 
       } 

       // Add to the DataTable 
       if (cCnt == 1) 
       { 

        Row = DT.NewRow(); 
        Row[cCnt.ToString()] = cellVal; 
        DT.Rows.Add(Row); 
       } 
       else 
       { 

        // DataTable rows are 0-based, Excel rows are 1-based 
        Row = DT.Rows[rCnt - 1]; 
        Row[cCnt.ToString()] = cellVal; 

       } 
      } 
     } 
     // Add DT to the list. Then go to the next sheet in the Excel Workbook 
     List.Add(DT); 
    } 
+0

Unfortunately not. – gustavodidomenico

+0

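"Is there a faster way to do this? Unfortunately not." Absolute rubbish. This code creates (and incorrectly fails to release) a COM object for every Excel cell value it reads. That is the slowest possible way to do it! Reading the whole worksheet into an array in one go and then iterating over the items in that array is much faster. –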

Answers

12

Calling .Value2 is an expensive operation because it is a COM interop call. I would instead read the entire range into an array and then loop over the array:

// One COM call for the whole sheet. For a multi-cell range, Value2 is a
// two-dimensional object array whose indices start at 1.
object[,] data = (object[,])Range.Value2; 

// Create new Column in DataTable 
for (int cCnt = 1; cCnt <= data.GetLength(1); cCnt++) 
{ 
    textBox3.Text = cCnt.ToString(); 

    var Column = new DataColumn(); 
    Column.DataType = typeof(string); 
    Column.ColumnName = cCnt.ToString(); 
    DT.Columns.Add(Column); 

    // Create row for Data Table 
    for (int rCnt = 1; rCnt <= data.GetLength(0); rCnt++) 
    { 
     textBox2.Text = rCnt.ToString(); 

     object raw = data[rCnt, cCnt]; 
     string cellVal; 
     if (raw is double) 
     { 
      // Numeric cells arrive as boxed doubles 
      double ConvertVal = (double)raw; 
      cellVal = ConvertVal.ToString(); 
     } 
     else 
     { 
      // Strings pass through; empty cells stay null 
      cellVal = raw == null ? null : raw.ToString(); 
     } 

     DataRow Row; 

     // Add to the DataTable 
     if (cCnt == 1) 
     { 
      Row = DT.NewRow(); 
      Row[cCnt.ToString()] = cellVal; 
      DT.Rows.Add(Row); 
     } 
     else 
     { 
      // DataTable rows are 0-based, the array is 1-based 
      Row = DT.Rows[rCnt - 1]; 
      Row[cCnt.ToString()] = cellVal; 
     } 
    } 
} 
+0

This still works perfectly. I had 40,000 records, and the processing time dropped from about 2 minutes to about 2 seconds. –

+1

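I am very confused by the variable usage in the answer. It does not seem very friendly. 1. I can't use 'Range.Value2' in this place; it shows the error "Cannot implicitly convert object[] to object[*,*]". 2. I am not sure about the ConvertVal variable. – parkourkarthik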

+0

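@parkourkarthik I can't verify this right now, but if your range is a single row or a single column you might get a 1-D 'object[]', although I thought it was always a two-dimensional array. If you haven't already, feel free to ask this as a separate question. –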
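
On the shape of Value2: as far as I know, a range with more than one cell comes back as a two-dimensional, 1-based object[,] even when it is a single row or column, while a one-cell range returns the bare value (or null if the cell is empty). Below is a minimal sketch of a helper that normalises all three cases into a DataTable of strings. The names RangeValueConverter and ToDataTable are hypothetical and not part of the answer above; the helper takes a plain object so it can be tried without Excel installed.

```csharp
using System;
using System.Data;

public static class RangeValueConverter
{
    // Hypothetical helper: accepts whatever Range.Value2 returned
    // (null, a single value, or an object[,]) and builds a DataTable of strings.
    public static DataTable ToDataTable(object value2)
    {
        var table = new DataTable();
        if (value2 == null)
        {
            return table; // empty cell or range: no columns, no rows
        }

        object[,] cells = value2 as object[,];
        if (cells == null)
        {
            // Single cell: wrap the bare value in a 1x1 array.
            cells = new object[1, 1];
            cells[0, 0] = value2;
        }

        // Interop arrays start at index 1, so read the bounds instead of assuming 0.
        int firstRow = cells.GetLowerBound(0), lastRow = cells.GetUpperBound(0);
        int firstCol = cells.GetLowerBound(1), lastCol = cells.GetUpperBound(1);

        // Columns are named "1", "2", ... as in the answer's code.
        for (int c = firstCol; c <= lastCol; c++)
        {
            table.Columns.Add((c - firstCol + 1).ToString(), typeof(string));
        }

        for (int r = firstRow; r <= lastRow; r++)
        {
            DataRow row = table.NewRow();
            for (int c = firstCol; c <= lastCol; c++)
            {
                object cell = cells[r, c];
                if (cell != null)
                {
                    row[c - firstCol] = Convert.ToString(cell);
                }
                // Empty cells are left as DBNull.
            }
            table.Rows.Add(row);
        }

        return table;
    }
}
```

With interop in place, the call would look like RangeValueConverter.ToDataTable(WS.UsedRange.Value2). It fills the table row by row, so there is no need for the "first column creates the rows" special case.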

3

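MS Office interop is slow, and even Microsoft does not recommend using interop on the server side, so it is not suitable for importing large Excel files. For more details, see Microsoft's point of view on why not to use OLE Automation.

Instead, you can use any Excel library, such as EasyXLS. Here is a code sample that shows how to read an Excel file: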

ExcelDocument workbook = new ExcelDocument(); 
DataSet ds = workbook.easy_ReadXLSActiveSheet_AsDataSet("excel.xls"); 
DataTable dataTable = ds.Tables[0]; 

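If your Excel file has multiple sheets, or you want to import only a range of cells (for better performance), have a look at more code samples on how to import Excel to DataTable in C# using EasyXLS.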

+5

Ouch. A $195 library just to read in an Excel worksheet? –

2

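In case anyone else is using EPPlus. This implementation is pretty naive, but the comments call out where it makes assumptions. If you were to layer one more method, GetWorkbookAsDataSet(), on top of it, it would do what the OP is asking for.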

/// <summary> 
    /// Assumption: Worksheet is in table format with no weird padding or blank column headers. 
    /// 
    /// Assertion: Duplicate column names will be aliased by appending a sequence number (eg. Column, Column1, Column2) 
    /// </summary> 
    /// <param name="worksheet"></param> 
    /// <returns></returns> 
    public static DataTable GetWorksheetAsDataTable(ExcelWorksheet worksheet) 
    { 
     var dt = new DataTable(worksheet.Name); 
     dt.Columns.AddRange(GetDataColumns(worksheet).ToArray()); 
     var headerOffset = 1; //have to skip header row 
     var width = dt.Columns.Count; 
     var depth = GetTableDepth(worksheet, headerOffset); 
     for (var i = 1; i <= depth; i++) 
     { 
      var row = dt.NewRow(); 
      for (var j = 1; j <= width; j++) 
      { 
       var currentValue = worksheet.Cells[i + headerOffset, j].Value; 

       //have to decrement b/c excel is 1 based and datatable is 0 based. 
       row[j - 1] = currentValue == null ? null : currentValue.ToString(); 
      } 

      dt.Rows.Add(row); 
     } 

     return dt; 
    } 

    /// <summary> 
    /// Assumption: There are no null or empty cells in the first column 
    /// </summary> 
    /// <param name="worksheet"></param> 
    /// <returns></returns> 
    private static int GetTableDepth(ExcelWorksheet worksheet, int headerOffset) 
    { 
     var i = 1; 
     var j = 1; 
     var cellValue = worksheet.Cells[i + headerOffset, j].Value; 
     while (cellValue != null) 
     { 
      i++; 
      cellValue = worksheet.Cells[i + headerOffset, j].Value; 
     } 

     return i - 1; //subtract one because we're going from rownumber (1 based) to depth (0 based) 
    } 

    private static IEnumerable<DataColumn> GetDataColumns(ExcelWorksheet worksheet) 
    { 
     return GatherColumnNames(worksheet).Select(x => new DataColumn(x)); 
    } 

    private static IEnumerable<string> GatherColumnNames(ExcelWorksheet worksheet) 
    { 
     var columns = new List<string>(); 

     var i = 1; 
     var j = 1; 
     var columnName = worksheet.Cells[i, j].Value; 
     while (columnName != null) 
     { 
      columns.Add(GetUniqueColumnName(columns, columnName.ToString())); 
      j++; 
      columnName = worksheet.Cells[i, j].Value; 
     } 

     return columns; 
    } 

    private static string GetUniqueColumnName(IEnumerable<string> columnNames, string columnName) 
    { 
     var colName = columnName; 
     var i = 1; 
     while (columnNames.Contains(colName)) 
     { 
      colName = columnName + i.ToString(); 
      i++; 
     } 

     return colName; 
    } 
+0

This code helped. It solved my problem. Thank you very much. – Aditi

1
Dim sSheetName As String 
Dim sConnection As String 
Dim dtTablesList As DataTable 
Dim oleExcelCommand As OleDbCommand 
Dim oleExcelReader As OleDbDataReader 
Dim oleExcelConnection As OleDbConnection 

sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Test.xls;Extended Properties=""Excel 12.0;HDR=No;IMEX=1""" 

oleExcelConnection = New OleDbConnection(sConnection) 
oleExcelConnection.Open() 

dtTablesList = oleExcelConnection.GetSchema("Tables") 

If dtTablesList.Rows.Count > 0 Then 
    sSheetName = dtTablesList.Rows(0)("TABLE_NAME").ToString 
End If 

dtTablesList.Clear() 
dtTablesList.Dispose() 

If sSheetName <> "" Then 

    oleExcelCommand = oleExcelConnection.CreateCommand() 
    oleExcelCommand.CommandText = "Select * From [" & sSheetName & "]" 
    oleExcelCommand.CommandType = CommandType.Text 

    oleExcelReader = oleExcelCommand.ExecuteReader 

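    ' Count the rows as they are read
    Dim nOutputRow As Integer = 0 

    While oleExcelReader.Read 
        ' Read the current row here, e.g. oleExcelReader(0) for the first column
        nOutputRow += 1 
    End While 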

    oleExcelReader.Close() 

End If 

oleExcelConnection.Close()