2016-08-12 50 views
0

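Stripping data from an EPPlus output by date range

Quick overview: the main goal is to read the rows from a set date onward and, for each of those rows, get the reference number that goes with its Start date.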

For example, I might only want the rows dated from a set date (say, the first of last month) onward.

I currently extract some data with EPPlus from the Excel spreadsheet sample below:

Start date Ref number 
29/07/2015 2342326 
01/07/2016 5697455 
02/08/2016 3453787 
02/08/2016 5345355 
02/08/2015 8364456 
03/08/2016 1479789 
04/07/2015 9334578 

Output:

29/07/2015 
2342326 
29/07/2016 
5697455 
02/08/2016 
3453787 
02/08/2016 
5345355 
02/08/2015 
8364456 
03/08/2016 
1479789 
04/07/2015 
9334578 

This part works, but when I try to filter the output by a date range I get errors. For example, using LINQ I get the following error.

An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll 

Additional information: Specified cast is not valid. 

LINQ code:

var rowsOfInterest = tbl.AsEnumerable() 
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1)) 
.ToList(); 

I also tried filtering by date range with DataTable.Select:

DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#"); 

But I got the following error:

An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll 

Additional information: Cannot perform '>=' operation on System.String and System.Double. 

My last attempt was to see whether I could strip out the dates inside the loop.

Code used:

DateTime dDate; 
row[cell.Start.Column - 1] = cell.Text; 
string dt = cell.Text.ToString(); 

if (DateTime.TryParse(dt, out dDate)) 
{ 
    DateTime dts = Convert.ToDateTime(dt); 
} 

DateTime date1 = new DateTime(2016, 7, 1); 

if (dDate >= date1) 
{ 
    Console.WriteLine(row[cell.Start.Column - 1] = cell.Text); 
} 

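This sort of works, but it only lists the matching dates without their values, which is understandable. If I go down this route, how can I get the dates together with their values?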

Output:

29/07/2016 
02/08/2016 
02/08/2016 
03/08/2016 

Full code sample in use (modified from another source):

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Threading.Tasks; 
using System.Data.OleDb; 
using System.Text.RegularExpressions; 
using Microsoft.Office.Interop.Excel; 
using System.Data; 
using System.IO; 

namespace Number_Cleaner 
{ 
    public class NumbersReport 
    { 

     //ToDo: Look in to fixing the code so it filters the date correctly with the right output data. 
     public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true) 
     { 
      using (var pck = new OfficeOpenXml.ExcelPackage()) 
      { 
       using (var stream = File.OpenRead(path)) 
       { 
        pck.Load(stream); 
       } 
       var ws = pck.Workbook.Worksheets.First(); 
       System.Data.DataTable tbl = new System.Data.DataTable(); 
       foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column]) 
       { 
        tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column)); 
       } 
       var startRow = hasHeader ? 2 : 1; 
       for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++) 
       { 
        var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column]; 
        DataRow row = tbl.Rows.Add(); 
        foreach (var cell in wsRow) 
        { 

         DateTime dDate; 
         row[cell.Start.Column - 1] = cell.Text; 
         string dt = cell.Text.ToString(); 
         //Console.WriteLine(dt); 

         if (DateTime.TryParse(dt, out dDate)) 
         { 
          DateTime dts = Convert.ToDateTime(dt); 
         } 

         DateTime date1 = new DateTime(2016, 7, 1); 

         if (dDate >= date1) 
         { 
          Console.WriteLine(row[cell.Start.Column - 1] = cell.Text); 
         } 

         //Console.WriteLine(row[cell.Start.Column - 1] = cell.Text); 
        } 
       } 
       //var rowsOfInterest = tbl.AsEnumerable() 
       // .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1)) 
       //.ToList(); 
       //Console.WriteLine(tbl); 
       //DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#"); 

       return tbl; 
      } 
     } 
    } 
}

How to match date to row then get the final column value using EPPlus?

Answers

1

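Based on your code, you are storing everything in your DataTable as strings by calling cell.Text. With that approach you lose valuable information: the cell's data type. You are much better off using cell.Value, which will be either a string or a double. In Excel, dates, integers, and decimal values are all stored as doubles.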
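
To make the point about dates being stored as doubles concrete, here is a minimal sketch that uses only the .NET base library (no EPPlus). The serial value 42552 is an illustrative assumption standing in for a raw cell value, and SerialDateDemo and FromSerial are hypothetical names; DateTime.FromOADate is the standard .NET conversion for such serial numbers.

```csharp
using System;
using System.Globalization;

public static class SerialDateDemo
{
    // Converts an Excel/OLE Automation serial number to a DateTime.
    public static DateTime FromSerial(double serial)
    {
        return DateTime.FromOADate(serial);
    }

    public static void Main()
    {
        // 42552 is a made-up sample value: the serial number for 1 July 2016.
        DateTime date = FromSerial(42552d);
        Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // prints 2016-07-01
    }
}
```

Once the value is a real DateTime, comparing it against new DateTime(2016, 7, 1) no longer depends on how the cell happened to be formatted as text.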

The errors you are seeing occur because you store the values as strings but then query them as DateTime values, here:

.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1)) 

and here:

"'Start date' >= #1/7/2016#" 

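If you look at my post here: How to parse excel rows back to types using EPPlus, you will see the helper function ConvertSheetToObjects, which handles pretty much what you are trying to do. With a small modification, we can make it take a WorkSheet and convert it into a DataTable. As with the object conversion method, you should also supply the expected structure by passing in a DataTable, rather than having it try to guess the structure by casting cell values, like this: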

public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable) 
{ 
    //DateTime Conversion 
    var convertDateTime = new Func<double, DateTime>(excelDate => 
    { 
     if (excelDate < 1) 
      throw new ArgumentException("Excel dates cannot be smaller than 1."); 

     var dateOfReference = new DateTime(1900, 1, 1); 

     if (excelDate > 60d) 
      excelDate = excelDate - 2; 
     else 
      excelDate = excelDate - 1; 
     return dateOfReference.AddDays(excelDate); 
    }); 

    //Get the names in the destination TABLE 
    var tblcolnames = dataTable 
     .Columns 
     .Cast<DataColumn>() 
     .Select(dcol => new {Name = dcol.ColumnName, Type = dcol.DataType}) 
     .ToList(); 

    //Cells only contains references to cells with actual data 
    var cellGroups = worksheet.Cells 
     .GroupBy(cell => cell.Start.Row) 
     .ToList(); 

    //Assume first row has the column names and get the names of the columns in the sheet that have a match in the table 
    var colnames = cellGroups 
     .First() 
     .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx }) 
     .Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name)) 
     .ToList(); 


    //Add the rows - skip the first cell row 
    for (var i = 1; i < cellGroups.Count(); i++) 
    { 
     var cellrow = cellGroups[i].ToList(); 
     var tblrow = dataTable.NewRow(); 
     dataTable.Rows.Add(tblrow); 

     colnames.ForEach(colname => 
     { 
      //Excel stores either strings or doubles 
      var cell = cellrow[colname.index]; 
      var val = cell.Value; 
      var celltype = val.GetType(); 
      var coltype = tblcolnames.First(tcol => tcol.Name == colname.Name).Type; 

      //If it is numeric it is a double since that is how excel stores all numbers 
      if (celltype == typeof(double)) 
      { 
       //Unbox it 
       var unboxedVal = (double)val; 

       //FAR FROM A COMPLETE LIST!!! 
       if (coltype == typeof (int)) 
        tblrow[colname.Name] = (int) unboxedVal; 
       else if (coltype == typeof (double)) 
        tblrow[colname.Name] = unboxedVal; 
       else 
        throw new NotImplementedException($"Type '{coltype}' not implemented yet!"); 
      } 
      else if (coltype == typeof (DateTime)) 
      { 
       //Its a date time 
       tblrow[colname.Name] = val; 
      } 
      else if (coltype == typeof (string)) 
      { 
       //Its a string 
       tblrow[colname.Name] = val; 
      } 
      else 
      { 
       throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!"); 
      } 
     }); 

    } 

} 
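
One practical detail: the this modifier on the first parameter makes ConvertSheetToDataTable an extension method, and C# only accepts extension methods declared inside a non-nested, non-generic static class. The method above is shown on its own, so it has to be placed in such a class before worksheet.ConvertSheetToDataTable(...) will compile. A minimal sketch of that rule, using a hypothetical extension on string (StringExtensions and Shout are illustrative names, not part of EPPlus or the original post):

```csharp
using System;

// Extension methods must live in a top-level static class.
public static class StringExtensions
{
    // 'this string' is what makes Shout callable as text.Shout().
    public static string Shout(this string text)
    {
        return text.ToUpperInvariant() + "!";
    }
}

public static class Program
{
    public static void Main()
    {
        // Extension-call syntax and plain static-call syntax are equivalent.
        Console.WriteLine("epplus".Shout());                 // prints EPPLUS!
        Console.WriteLine(StringExtensions.Shout("epplus")); // prints EPPLUS!
    }
}
```

Calling the method as an ordinary static method, passing the worksheet as the first argument, is the other option.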

To use it on a worksheet whose header row is Col1, Col2, Col3, Col4, you would run this:

[TestMethod] 
public void Sheet_To_Table_Test() 
{ 
    //https://stackoverflow.com/questions/38915006/stripping-data-from-a-epplus-output-from-a-date-range 

    //Create a test file 
    var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx"); 

    using (var package = new ExcelPackage(fi)) 
    { 
     var workbook = package.Workbook; 
     var worksheet = workbook.Worksheets.First(); 

     var datatable = new DataTable(); 
     datatable.Columns.Add("Col1", typeof(int)); 
     datatable.Columns.Add("Col2", typeof(string)); 
     datatable.Columns.Add("Col3", typeof(double)); 
     datatable.Columns.Add("Col4", typeof(DateTime)); 

     worksheet.ConvertSheetToDataTable(ref datatable); 

     foreach (DataRow row in datatable.Rows) 
      Console.WriteLine(
       $"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" + 
       $", Col2({row["Col2"].GetType()}): {row["Col2"]}" + 
       $", Col3({row["Col3"].GetType()}): {row["Col3"]}" + 
       $", Col4({row["Col4"].GetType()}):{row["Col4"]}}}"); 

     //To Answer OP's questions 
     datatable 
      .Select("Col4 >= #01/03/2016#") 
      .Select(row => row["Col1"]) 
      .ToList() 
      .ForEach(num => Console.WriteLine($"{{{num}}}")); 
    } 
} 

Which gives this in the output:

row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM} 
row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM} 
row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM} 
row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM} 
{8} 
{978} 
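
Applied to the column names from the question, the same idea can be sketched as follows. This is a sketch under assumptions, not the definitive fix: it assumes the dates arrive as day/month/year text (as in the sample sheet) and parses them into a typed DateTime column, and StartDateFilterDemo and its methods are hypothetical names. Two details matter for the Select version: a column name containing a space is written in square brackets (single quotes would make it a string literal), and the #...# date literal is read as month/day/year, so 1 July 2016 is written #7/1/2016#.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

public static class StartDateFilterDemo
{
    // Builds a table with typed columns from day/month/year text, as in the question's sheet.
    public static DataTable BuildTable(IEnumerable<string[]> rawRows)
    {
        var tbl = new DataTable();
        tbl.Columns.Add("Start date", typeof(DateTime));
        tbl.Columns.Add("Ref number", typeof(int));
        foreach (var raw in rawRows)
        {
            var date = DateTime.ParseExact(raw[0], "dd/MM/yyyy", CultureInfo.InvariantCulture);
            tbl.Rows.Add(date, int.Parse(raw[1], CultureInfo.InvariantCulture));
        }
        return tbl;
    }

    // LINQ version: Field<DateTime> works because the column really holds DateTime values.
    public static List<int> RefsViaLinq(DataTable tbl, DateTime from)
    {
        return tbl.AsEnumerable()
            .Where(row => row.Field<DateTime>("Start date") >= from)
            .Select(row => row.Field<int>("Ref number"))
            .ToList();
    }

    // Select version: [brackets] name the column; #7/1/2016# means 1 July 2016.
    public static List<int> RefsViaSelect(DataTable tbl)
    {
        return tbl.Select("[Start date] >= #7/1/2016#")
            .Select(row => (int)row["Ref number"])
            .ToList();
    }

    public static void Main()
    {
        var tbl = BuildTable(new[]
        {
            new[] { "29/07/2015", "2342326" },
            new[] { "02/08/2016", "3453787" },
            new[] { "04/07/2015", "9334578" },
        });
        Console.WriteLine(string.Join(",", RefsViaLinq(tbl, new DateTime(2016, 7, 1)))); // prints 3453787
        Console.WriteLine(string.Join(",", RefsViaSelect(tbl)));                          // prints 3453787
    }
}
```

Because each row keeps its date and its reference number together, filtering on the date returns the matching values rather than a bare list of dates.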
+0

Sorry for the late reply. The information given is very helpful, but I do have one problem: on the line "worksheet.ConvertSheetToDataTable(ref datatable);" I get the following error: – Mattlinux1

+0

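Error: 'OfficeOpenXml.ExcelWorksheet' does not contain a definition for 'ConvertSheetToDataTable' and no extension method 'ConvertSheetToDataTable' accepting a first argument of type 'OfficeOpenXml.ExcelWorksheet' could be found (are you missing a using directive or an assembly reference?) – Mattlinux1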

+0

Changed the line to ConvertSheetToDataTable(worksheet, ref datatable); – Mattlinux1