Split a large file into smaller files based on a line

I have a very large file (over 20 GB) that I want to split into smaller files, for example several files of about 2 GB each.

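One constraint: each split has to happen right before a specific line.

I am using Python, but if there is another solution, in shell for example, I am open to it.

This is what the big file looks like: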

bigfile.txt (20 GB)

Recno:: 0 
some data... 

Recno:: 1 
some data... 

Recno:: 2 
some data... 

Recno:: 3 
some data... 

Recno:: 4 
some data... 

Recno:: 5 
some data... 

Recno:: x 
some more data...

This is what I want:

file1.txt (2 GB +/-)

Recno:: 0 
some data... 

Recno:: 1 
some data...

file2.txt (2 GB +/-)

Recno:: 2 
some data... 

Recno:: 4 
some data... 

Recno:: 5 
some data...

And so on...

Thanks!

Source

2016-07-26 Difender

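Possible duplicate of http://stackoverflow.com/questions/2016894/how-to-split-a-large-text-file-into-smaller-files-with-equal-number-of-lines

It would be useful if you showed us a small example with a few lines, indicating where the file should be split (or not split).

@Chris_Rands No, because I don't want to split after a given number of lines, but at a specific line: only once the current file exceeds 2 GB and a 'Recno:: *int*' line appears. – Difender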
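The condition described in this comment can be written as a small predicate. This is only a sketch: it assumes header lines look like "Recno::" followed by an integer, as in the sample, and the names RECNO_HEADER, LIMIT and should_start_new_file are made up for illustration.

```python
import re

# Assumption: a record header is "Recno::", optional spaces, then an integer.
RECNO_HEADER = re.compile(rb'^Recno::\s*\d+\s*$')

# Hypothetical threshold: 2 GB expressed in bytes.
LIMIT = 2 * 1000 ** 3


def should_start_new_file(line, bytes_written, limit=LIMIT):
    """Return True when `line` is a record header and the current
    output file has already grown past `limit` bytes."""
    return bytes_written > limit and RECNO_HEADER.match(line) is not None
```

Lines are matched as bytes so that the size bookkeeping can stay in bytes as well.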

You could do something like this:

import sys

try:
    _, size, file = sys.argv
    size = int(size)
except ValueError:
    sys.exit('Usage: splitter.py <size in bytes> <filename to split>')

# Binary mode, so that len(line) counts bytes rather than decoded characters.
with open(file, 'rb') as infile:
    count = 0
    current_size = 0
    # you could do something more fancy with the name,
    # like use os.path.splitext
    outfile = open(file + '_0', 'wb')
    try:
        for line in infile:
            # Start a new file only at a record boundary, and only once
            # the current file has grown past the requested size.
            if current_size > size and line.startswith(b'Recno'):
                outfile.close()
                count += 1
                current_size = 0
                outfile = open(file + '_{}'.format(count), 'wb')
            current_size += len(line)
            outfile.write(line)
    finally:
        outfile.close()
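Before running this on 20 GB, it may be worth trying the same logic on a tiny input. The sketch below wraps the loop in a function and checks that every piece starts with a Recno line. The names split_on_recno and demo, the 40-byte limit and the generated sample data are all assumptions made for the test, not part of the answer.

```python
import os
import tempfile


def split_on_recno(path, max_bytes, marker=b'Recno'):
    """Split `path` into path_0, path_1, ... and return the output paths.

    A new file is started only at a line beginning with `marker`,
    once more than `max_bytes` have been written to the current file.
    """
    outputs = [path + '_0']
    written = 0
    out = open(outputs[-1], 'wb')
    try:
        with open(path, 'rb') as infile:
            for line in infile:
                if written > max_bytes and line.startswith(marker):
                    out.close()
                    outputs.append('{}_{}'.format(path, len(outputs)))
                    out = open(outputs[-1], 'wb')
                    written = 0
                written += len(line)
                out.write(line)
    finally:
        out.close()
    return outputs


def demo():
    """Split a six-record sample; return the first line of each piece and
    whether the pieces add up to the original size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'bigfile.txt')
        with open(path, 'wb') as f:
            for i in range(6):
                f.write('Recno:: {}\nsome data...\n\n'.format(i).encode())
        parts = split_on_recno(path, max_bytes=40)
        first_lines = []
        for part in parts:
            with open(part, 'rb') as f:
                first_lines.append(f.readline())
        same_size = sum(os.path.getsize(p) for p in parts) == os.path.getsize(path)
        return first_lines, same_size
```

Because the size check only fires at a record boundary, each piece overshoots the limit by up to one record, which matches the "2 GB +/-" in the question.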

Source

2016-07-26 13:38:03

This is exactly what I was looking for, thank you very much! – Difender

-1

As mentioned in the comments above, you can use split in a bash shell:

split -b 2000m <path-to-your-file>

Source

2016-07-26 13:24:33 JoshuaBox

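As I said, I don't want to split by size alone. I have to split by size, but also at a given line. For example, every file must start with 'Recno:: x' – Difender

You can monitor the file size in Python with 'os.stat('/path/to/file/').st_size' in a while loop – JoshuaBox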
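The os.stat suggestion could look roughly like the sketch below. It assumes the output file is written through a normal buffered handle, so the handle is flushed before its size is read. The helper names size_on_disk and write_until_limit are hypothetical.

```python
import os
import tempfile


def size_on_disk(handle):
    """Flush `handle`, then return its file size as reported by os.stat."""
    handle.flush()  # bytes still sitting in Python's buffer are not counted yet
    return os.stat(handle.name).st_size


def write_until_limit(path, lines, limit):
    """Write `lines` to `path` until the file reaches `limit` bytes.

    Returns the number of lines written.
    """
    count = 0
    lines = iter(lines)
    with open(path, 'wb') as out:
        while size_on_disk(out) < limit:
            line = next(lines, None)
            if line is None:  # input exhausted
                break
            out.write(line)
            count += 1
    return count
```

This costs a flush and a system call per line, so keeping a running byte count, as in the answer above, is likely the cheaper choice for a 20 GB input.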
