2017-02-22 46 views
1

Reading a huge Excel file (500K rows) in Java

I am trying to read a big XLSX file. The Excel file has about 500K rows, and I need to read col 2.

OPCPackage pkg;
pkg = OPCPackage.open("File path");
XSSFWorkbook myWorkBook = new XSSFWorkbook(pkg);
Sheet sheet = myWorkBook.getSheetAt(2);
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext())
{
    Row row = rowIterator.next();
    if (row_num > ROW_ESCAPE)
    {
        Cell cell = row.getCell(2);
        if (!cell.getStringCellValue().toString().trim().isEmpty())
        {
            System.out.println(cell.getStringCellValue().toString());
        }
        System.out.println("hi" + row_num);
    }
    row_num++;
}

It prints up to row 39723, then throws the following exception:

Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space 
at java.util.regex.Matcher.<init>(Matcher.java:225) 
at java.util.regex.Pattern.matcher(Pattern.java:1093) 
at org.apache.poi.xssf.usermodel.XSSFRichTextString.utfDecode(XSSFRichTextString.java:482) 
at org.apache.poi.xssf.usermodel.XSSFRichTextString.getString(XSSFRichTextString.java:297) 
at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:262) 
at Main.get_titles(Main.java:484) 
at Main.analyze_Importsheet(Main.java:461) 
at Main.but_sel_imp_sheetActionPerformed(Main.java:220) 
at Main.access$000(Main.java:40) 
at Main$1.actionPerformed(Main.java:85) 
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022) 
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348) 
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402) 
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259) 
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252) 
at java.awt.Component.processMouseEvent(Component.java:6533) 
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324) 
at java.awt.Component.processEvent(Component.java:6298) 
at java.awt.Container.processEvent(Container.java:2236) 
at java.awt.Component.dispatchEventImpl(Component.java:4889) 
at java.awt.Container.dispatchEventImpl(Container.java:2294) 
at java.awt.Component.dispatchEvent(Component.java:4711) 
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888) 
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525) 
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466) 
at java.awt.Container.dispatchEventImpl(Container.java:2280) 
at java.awt.Window.dispatchEventImpl(Window.java:2746) 
at java.awt.Component.dispatchEvent(Component.java:4711) 
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758) 
at java.awt.EventQueue.access$500(EventQueue.java:97) 
at java.awt.EventQueue$3.run(EventQueue.java:709) 
at java.awt.EventQueue$3.run(EventQueue.java:703) 

Main.java:484 is the line: if (!cell.getStringCellValue().toString().trim().isEmpty())

If I remove that line and only print the row number, it works fine. I need help getting the string value of col 2.

Answers

0

Increasing the JVM heap size may fix your OutOfMemoryError. For how to increase the JVM heap size, see this Stack Overflow post.
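Before raising the limit further, it can help to confirm which limit the running JVM actually sees. The following is a minimal sketch; the class name HeapCheck is made up for illustration, and it only reports the value, it does not change it.

```java
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (reflects the -Xmx setting). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run with e.g.: java -Xmx4g HeapCheck
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value does not match the -Xmx you passed, the option is probably not reaching the JVM that runs the import (for example, when the program is started from an IDE or a launcher script).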

+0

I should have mentioned: I am already running it with java -Xmx1G -jar Importsheet_Breaker.jar

0

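The simplest way (without changing your reading logic) is to increase the heap size.

If that does not work for you, use streaming. In fact, there is a handy library already available: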

https://github.com/monitorjbl/excel-streaming-reader
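To illustrate what streaming means here, the sketch below reads one column of one sheet using only JDK classes (ZipFile and StAX), so the workbook is never loaded as an object tree. It is not the linked library's API. The class name ColumnStreamer, the method streamColumn, and the assumption that you already know the sheet's part name inside the package (for example xl/worksheets/sheet3.xml) are all made up for this example. It still holds the shared strings table in memory and ignores dates, styles, and formulas' formatting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ColumnStreamer {

    /** Streams one sheet part and passes each non-blank value of the given column letter to sink. */
    public static void streamColumn(String xlsxPath, String sheetEntry, String column,
                                    Consumer<String> sink) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            List<String> shared = readSharedStrings(zip);
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                throw new IOException("No such part: " + sheetEntry);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
                String type = null;      // value of the cell's t attribute, e.g. "s" for shared string
                boolean wanted = false;  // true while inside a cell of the requested column
                while (xml.hasNext()) {
                    if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = xml.getLocalName();
                    if (name.equals("c")) {
                        String ref = xml.getAttributeValue(null, "r"); // cell reference such as "C17"
                        type = xml.getAttributeValue(null, "t");
                        wanted = ref != null && ref.replaceAll("\\d", "").equals(column);
                    } else if (wanted && (name.equals("v") || name.equals("t"))) {
                        String text = xml.getElementText();
                        // Shared-string cells store an index into the shared strings table.
                        String value = "s".equals(type) ? shared.get(Integer.parseInt(text.trim())) : text;
                        if (!value.trim().isEmpty()) {
                            sink.accept(value);
                        }
                    }
                }
            }
        }
    }

    /** Reads xl/sharedStrings.xml, concatenating all <t> text found inside each <si> item. */
    private static List<String> readSharedStrings(ZipFile zip) throws IOException, XMLStreamException {
        List<String> out = new ArrayList<>();
        ZipEntry entry = zip.getEntry("xl/sharedStrings.xml");
        if (entry == null) {
            return out; // workbook without a shared strings part
        }
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            StringBuilder item = null;
            while (xml.hasNext()) {
                int ev = xml.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    if (xml.getLocalName().equals("si")) {
                        item = new StringBuilder();
                    } else if (xml.getLocalName().equals("t") && item != null) {
                        item.append(xml.getElementText());
                    }
                } else if (ev == XMLStreamConstants.END_ELEMENT && xml.getLocalName().equals("si")) {
                    out.add(item.toString());
                    item = null;
                }
            }
        }
        return out;
    }
}
```

A call might look like ColumnStreamer.streamColumn("File path", "xl/worksheets/sheet3.xml", "C", System.out::println), where column C corresponds to getCell(2). Because the sheet is addressed by its part name, hidden sheets are read the same way as visible ones; they are ordinary parts of the package. For production use, a maintained streaming reader is likely the safer choice.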

+0

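My Excel workbook has some hidden sheets, and with streaming I cannot read those sheets. XSSFWorkbook oldWorkbook; OPCPackage pkg; pkg = OPCPackage.open(myImport.get_path()); oldWorkbook = (XSSFWorkbook) WorkbookFactory.create(pkg); Yesterday the above code was working, but surprisingly it stopped working today and throws a heap size error.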