2014-07-23

Getting a certain percentage of the data from a txt file

For example, I want to use Python to take the first 30% of the data from a text file.

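Here is some code in which I tried to produce two new files, but I don't know how to take a given percentage of the data and write it to them.

Here it is: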

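import sys

def NewFile(FileName):
    # Only checks that the source file can be opened; nothing is read yet.
    open(FileName, 'r').close()
    print('Creating new text file')
    # A for 70% training data
    AFileName = 'A' + FileName
    # B for 30% testing data
    BFileName = 'B' + FileName

    try:
        # Create (or touch) both output files; no data is written yet.
        afile = open(AFileName, 'a')
        bfile = open(BFileName, 'a')
        afile.close()
        bfile.close()
    except OSError:
        print('~~~~~~')
        # Exit with a non-zero status on failure.
        sys.exit(1)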

By percentage, do you mean that if there are 100 lines you want 30 of them? –

Answers

0

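It sounds like you want something along these lines, where filename is the file you read from and proportion is the fraction of that file you want in the first file: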

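def split_file(filename, tofile, othertofile, proportion):
    # Read all lines; each one keeps its trailing newline.
    with open(filename) as source:
        content = source.readlines()
    # A slice index must be an int, so truncate the product.
    split_index = int(len(content) * proportion)

    # Split content.
    first_portion = content[:split_index]
    second_portion = content[split_index:]

    # Write to files; "with" closes each file as soon as the block ends.
    with open(tofile, "w") as first_file:
        first_file.writelines(first_portion)
    with open(othertofile, "w") as second_file:
        second_file.writelines(second_portion)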

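The lines already end with a newline character. The split index has to be an int to be usable in a slice. The files should be closed once you are done with them; otherwise there is no guarantee when that will happen. – BlackJack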


Thank you very much, it works great! I also found that I needed to add int before the calculation :) – user3843433
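The two points raised in the comments (integer slice index, lines that already carry a newline) can be checked with a small sketch. The sample list and the variable names below are made up for illustration and are not part of the answer.

```python
lines = ["a\n", "b\n", "c\n", "d\n"]  # made-up sample data
proportion = 0.3

# A float slice index is rejected by Python with a TypeError.
try:
    lines[:len(lines) * proportion]
    float_index_failed = False
except TypeError:
    float_index_failed = True

# Converting to int first gives a usable split point.
split_index = int(len(lines) * proportion)
first_portion = lines[:split_index]
second_portion = lines[split_index:]

# Each line already ends in "\n", so join with "" (or use writelines).
text = "".join(first_portion)
# Joining with "\n" would insert an extra blank line between entries.
doubled = "\n".join(lines[:2])
```

Truncating with int rounds the split point down, so short files may put slightly less than the requested share into the first portion.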

0

This will work:

import sys

def NewFile(FileName):
    # Read every line up front and close the source file right away.
    with open(FileName, 'r') as infile:
        lines = infile.readlines()
    print('Creating new text file')
    num_lines = len(lines)
    num_lines_a = 0.7 * num_lines
    # A for 70% training data
    AFileName = 'A' + FileName
    # B for 30% testing data
    BFileName = 'B' + FileName

    try:
        # 'a' appends, so rerunning adds to any existing output files.
        with open(AFileName, 'a') as afile, open(BFileName, 'a') as bfile:
            a = 1
            for line in lines:
                if a <= num_lines_a:
                    afile.write(line)
                    a += 1
                else:
                    bfile.write(line)
    except OSError:
        print('~~~~~~')
        # Exit with a non-zero status on failure.
        sys.exit(1)

Could enumerate be used here? –
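On the enumerate question: it can replace the manual counter. The following is only a sketch under the same 70/30 assumption; the function name split_with_enumerate and its parameters are hypothetical, and it opens the outputs in 'w' mode rather than 'a', which is a deliberate difference from the answer above.

```python
def split_with_enumerate(in_name, a_name, b_name, fraction=0.7):
    """Write the first `fraction` of the lines to a_name, the rest to b_name."""
    with open(in_name) as infile:
        lines = infile.readlines()
    # Number of lines that go to the first file (rounded down).
    cutoff = int(len(lines) * fraction)
    with open(a_name, 'w') as afile, open(b_name, 'w') as bfile:
        # enumerate supplies the zero-based line index, so no counter is needed.
        for index, line in enumerate(lines):
            target = afile if index < cutoff else bfile
            target.write(line)
    return cutoff
```

A call such as split_with_enumerate('data.txt', 'Adata.txt', 'Bdata.txt') would mirror the naming used in the question; the file names here are placeholders.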

0

Another way to do this:

from itertools import islice


def new_file(old_filename):
    with open(old_filename, 'r') as old_file:
        lines = list(old_file)
    training_file_line_count = int(len(lines) * 0.7)
    lines_iter = iter(lines)
    with open('A' + old_filename, 'w') as training_file:
        training_file.writelines(islice(lines_iter, training_file_line_count))
    with open('B' + old_filename, 'w') as testing_data_file:
        testing_data_file.writelines(lines_iter)
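The reason the second writelines call receives only the remaining lines is that islice consumes items from the shared iterator. A minimal illustration, using a made-up list instead of a file:

```python
from itertools import islice

lines = ["l1\n", "l2\n", "l3\n", "l4\n", "l5\n"]  # made-up sample data
lines_iter = iter(lines)

# islice pulls the first three items off the shared iterator.
head = list(islice(lines_iter, 3))
# Whatever was not consumed is still waiting in the same iterator.
tail = list(lines_iter)
```

Because both parts come from one iterator, no line is written twice and none is skipped.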