2010-05-26 180 views
1

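Reading a large CSV file in PHP

I have a very large CSV file, 51,427 rows to be exact.

Isn't there a way to read only the required rows into an array? That would speed things up significantly.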

+1

Have you tried raising the maximum execution time, for example with `ini_set("max_execution_time", 0)`? – robjmills 2010-05-26 15:46:06

+0

A few questions: How are you importing the file into the database? Are you uploading the file before the import, or reading it live? – allnightgrocery 2010-05-26 15:48:49

Answers

2

You may want to look at streaming the CSV file. Send the file location, the start position, and the byte count as GET parameters to ProgressiveReader.php.

class NoFileFoundException extends Exception { 
    function __toString() { 
     return '<h1><b>ERROR:</b> could not find (' 
        .$this->getMessage(). 
        ') please check your settings.</h1>'; 
    } 
} 

class NoFileOpenException extends Exception { 
    function __toString() { 
     return '<h1><b>ERROR:</b> could not open (' 
        .$this->getMessage(). 
        ') please check your settings.</h1>'; 
    } 
} 

interface Reader { 
    function setFileName($fName); 
    function open(); 
    function setBufferOffset($offset); 
    function bufferSize(); 
    function isOffset(); 
    function setPacketSize($size); 
    function read(); 
    function isEOF(); 
    function close(); 
    function readAll(); 
} 

class ProgressiveReader implements Reader { 
    private $fName; 
    private $fileHandler; 
    private $offset = 0; 
    private $packetSize = 0; 

    public function setFileName($fName) { 
     $this->fName = $fName; 
     if(!file_exists($this->fName)) { 
      throw new NoFileFoundException($this->fName); 
     } 
    } 

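    public function open() { 
     // fopen() returns false (and emits a warning) on failure; it does not throw, 
     // so check the return value instead of wrapping the call in try/catch. 
     $this->fileHandler = @fopen($this->fName, 'rb'); 
     if ($this->fileHandler === false) { 
      throw new NoFileOpenException($this->fName); 
     } 
     fseek($this->fileHandler, $this->offset); 
    } 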

    public function setBufferOffset($offset) { 
     $this->offset = $offset; 
    } 

    public function bufferSize() { 
     return filesize($this->fName) - (($this->offset > 0) ? ($this->offset + 1) : 0); 
    } 

    public function isOffset() { 
     if($this->offset === 0) { 
      return false; 
     } 
     return true; 
    } 

    public function setPacketSize($size) { 
     $this->packetSize = $size; 
    } 

    public function read() { 
     return fread($this->fileHandler, $this->packetSize); 
    } 

    public function isEOF() { 
     return feof($this->fileHandler); 
    } 

    public function close() { 
     if($this->fileHandler) { 
      fclose($this->fileHandler); 
     } 
    } 

    public function readAll() { 
     return fread($this->fileHandler, filesize($this->fName)); 
    } 
} 
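
The core of the class above is a seek followed by a bounded read. A minimal sketch of that idea as a single function, assuming a hypothetical helper name `readChunk()` (it is not part of the answer's code) and a length of at least 1 byte:

```php
<?php
// Sketch: read $length bytes of $path starting at byte $offset.
// readChunk() is a hypothetical helper used only for illustration.
function readChunk($path, $offset, $length)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }

    // Move to the requested byte, then read at most $length bytes.
    fseek($handle, $offset);
    $data = fread($handle, $length);
    fclose($handle);

    return $data;
}
```

Note that a byte offset can land in the middle of a CSV row, so a caller that parses the chunk has to deal with a partial first and last line.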

Here are the unit tests:

require_once 'PHPUnit/Framework.php'; 

require_once dirname(__FILE__).'/../ProgressiveReader.php'; 

class ProgressiveReaderTest extends PHPUnit_Framework_TestCase { 

    protected $reader; 
    private $fp; 
    private $fname = "Test.txt"; 

    protected function setUp() { 
     $this->createTestFile(); 
     $this->reader = new ProgressiveReader(); 
    } 

    protected function tearDown() { 
     $this->reader->close(); 
    } 

    public function test_isValidFile() { 
     $this->reader->setFileName($this->fname); 
    } 

    public function test_isNotValidFile() { 
     try { 
      $this->reader->setFileName("nothing.tada"); 
     } 
     catch (Exception $e) { 
      return; 
     } 

     $this->fail(); 
    } 

    public function test_isFileOpen() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->open(); 
    } 

    public function test_couldNotOpenFile() { 
     $this->reader->setFileName($this->fname); 
     try { 
      $this->deleteTestFile(); 
      $this->reader->open(); 
     } 
     catch (Exception $e) { 
      return; 
     } 

     $this->fail(); 
    } 

    public function test_bufferSizeZeroOffset() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->open(); 
     $this->assertEquals($this->reader->bufferSize(), 12); 
    } 

    public function test_bufferSizeTwoOffset() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(2); 
     $this->reader->open(); 
     $this->assertEquals($this->reader->bufferSize(), 9); 
    } 

    public function test_readBuffer() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(0); 
     $this->reader->setPacketSize(1); 
     $this->reader->open(); 
     $this->assertEquals($this->reader->read(), "T"); 
    } 

    public function test_readBufferWithOffset() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(2); 
     $this->reader->setPacketSize(1); 
     $this->reader->open(); 
     $this->assertEquals($this->reader->read(), "S"); 
    } 

    public function test_readSuccesive() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(0); 
     $this->reader->setPacketSize(6); 
     $this->reader->open(); 
     $this->assertEquals($this->reader->read(), "TEST1\n"); 
     $this->assertEquals($this->reader->read(), "TEST2\n"); 
    } 

    public function test_readEntireBuffer() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->open(); 
     $this->assertEquals($this->reader->readAll(), "TEST1\nTEST2\n"); 
    } 

    public function test_isNotEOF() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(2); 
     $this->reader->setPacketSize(1); 
     $this->reader->open(); 
     $this->assertFalse($this->reader->isEOF()); 
    } 

    public function test_isEOF() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(0); 
     $this->reader->setPacketSize(15); 
     $this->reader->open(); 
     $this->reader->read(); 
     $this->assertTrue($this->reader->isEOF()); 
    } 

    public function test_isOffset() { 
     $this->reader->setFileName($this->fname); 
     $this->reader->setBufferOffset(2); 
     $this->assertTrue($this->reader->isOffset()); 
    } 

    public function test_isNotOffset() { 
     $this->reader->setFileName($this->fname); 
     $this->assertFalse($this->reader->isOffset()); 
    } 

    private function createTestFile() { 
     $this->fp = fopen($this->fname, "wb"); 
     fwrite($this->fp, "TEST1\n"); 
     fwrite($this->fp, "TEST2\n"); 
     fflush($this->fp); 
     fclose($this->fp); 
    } 

    private function deleteTestFile() { 
     if(file_exists($this->fname)) { 
      unlink($this->fname); 
     } 

    } 
} 
+1

Lotsa code for what is probably a one-time (or rare) import process. Upload the CSV and use the mysql console to load the data. – racerror 2010-05-26 20:02:44

+0

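Yes, you're right, except that he doesn't want to wait for the whole file to upload and then be stored: *'I'm not happy about having the entire 51427 rows of the CSV file read into an array before processing'*. Also, some of that code is tests. Oh, and one man's **one time** is another's repeated effort, until you get angry and automate it. – Gutzofter 2010-05-26 20:16:38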

2

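Can you connect directly to the database server?

If so, I would consider using a third-party program such as SQLyog to import your CSV.

You could also upload the file and import the data directly from the mysql shell: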

LOAD DATA INFILE '/path/to/your_file.csv' INTO TABLE table_name FIELDS TERMINATED BY ','; 
1

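Your script is probably taking too long and being terminated.

You should look for the max_execution_time directive in php.ini and set it to a value that suits you.

The default max_execution_time is 30 seconds, so your script is probably being killed.

If you have other scripts that still need a time limit, you can change the limit for this script alone by calling set_time_limit().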
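
A minimal sketch of raising the limit for a single script with PHP's `set_time_limit()`. The value of 300 seconds is an arbitrary assumption; passing 0 removes the limit entirely.

```php
<?php
// Allow this script up to 300 seconds from this point on.
// 300 is an assumed value for illustration; 0 would mean "no limit".
$ok = set_time_limit(300);

if (!$ok) {
    // set_time_limit() returns false when the limit could not be changed.
    error_log('Could not raise the execution time limit');
}
```

The call restarts the timeout counter from zero, so it only affects the script that makes it and leaves the php.ini default in place for everything else.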

1

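Have you tried importing your CSV into MySQL with bash/shell (if you're on Linux)? You could also use Ruby or Perl or whatnot, since I think you should use one of those rather than PHP (or any web application) to import the file.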

2

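This reads the entire CSV file into an array

All 50000+ rows?

Read only the required chunk of the file from PHP: read it line by line (fgets()) and add each needed line to the array. With fgetcsv() you get each line already parsed into an array.

Edit: I don't know the exact details, but I suspect that reading everything into a data structure costs more than reading only what we need.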
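
A minimal sketch of the line-by-line approach with `fgetcsv()`, keeping only a window of rows. The helper name `readCsvRows()` and its `$start`/`$count` parameters are assumptions made for illustration, as is the comma-separated, double-quote-enclosed format.

```php
<?php
// Sketch: return rows [$start, $start + $count) of a CSV file.
// readCsvRows() is a hypothetical helper, not a PHP built-in.
function readCsvRows($path, $start, $count)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }

    $rows = array();
    $index = 0;
    // fgetcsv() parses one line per call, so only the kept rows stay in memory.
    while (count($rows) < $count
        && ($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        if ($index >= $start) {
            $rows[] = $row;
        }
        $index++;
    }
    fclose($handle);

    return $rows;
}
```

The loop stops as soon as the window is full, so rows after it are never read. Rows before the window are still parsed and discarded, because CSV rows have no fixed byte length to seek to.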

0

Bah! Ignore this answer. It's a duplicate. See the fgetcsv() mentioned by Scorchio above.