How can I split a data file into multiple parts and keep the comments in each split file?
I have a data file like this:
# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.531000 0.0618000 0.938200
14.2000 0.532000 0.0790500 0.920950
14.2000 0.533000 0.0998900 0.900110
# it has lots of other lines
# datafile can be obtained from pastebin
The link to the input data file is: http://pastebin.com/NaNbEm3E
I want to create 20 files from this input, each with the same comments.
That is:
#out1.txt
#comments
first part of one-twentieth data
# out2.txt
# given comments
second part of one-twentieth data
# and so on upto out20.txt
How can we do this in Python?
My initial attempt is this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author : Bhishan Poudel
# Date : May 23, 2016
# Imports
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# read in comments from the file
infile = 'filecopy_multiple.txt'
outfile = 'comments.txt'
comments = []
with open(infile, 'r') as fi, open(outfile, 'a') as fo:
    for line in fi.readlines():
        if line.startswith('#'):
            comments.append(line)
            print(line)
            fo.write(line)
#==============================================================================
# read in a file
#
infile = infile
colnames = ['angle', 'wave','trans','refl']
print('{} {} {} {}'.format('\nreading file : ', infile, '',''))
df = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
                 comment='#', names=colnames, usecols=(0, 1, 2, 3))
print('{} {} {} {}'.format('length of df : ', len(df),'',''))
# write 20 files
df = df
nfiles = 20
nrows = int(len(df)/nfiles)
# floor division keeps the group labels as integers 0, 1, 2, ... on Python 3 as well
groups = df.groupby(np.arange(len(df.index)) // nrows)
for (frameno, frame) in groups:
    frame.to_csv("output_%s.csv" % frameno, index=None, header=None, sep='\t')
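So far I have twenty split files. I just want to copy the comment lines into each of them. But the question is: how to do so?
There should be an easier way than creating another 20 output files that contain only the comments and then appending the twenty split files to them.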
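One possible approach, given here only as a minimal sketch, is to write the comment lines at the top of each output file before writing its share of the data, so no separate comments-only files are needed. The sketch uses plain Python rather than pandas and assumes the whole file fits in memory. The function name split_with_comments and its parameters nfiles and prefix are hypothetical; every line starting with '#' is treated as a comment to be repeated in each file.

```python
from __future__ import print_function


def _with_newline(line):
    """Make sure a line ends with a newline (the last line of a file may not)."""
    return line if line.endswith('\n') else line + '\n'


def split_with_comments(infile, nfiles=20, prefix='out'):
    """Split the data rows of infile into nfiles files.

    Each output file starts with all comment lines of the input,
    followed by one contiguous share of the data rows.
    Returns the list of output file names.
    """
    with open(infile) as fi:
        lines = [_with_newline(ln) for ln in fi]

    # Separate comment lines from data rows; blank lines are dropped.
    comments = [ln for ln in lines if ln.startswith('#')]
    data = [ln for ln in lines if ln.strip() and not ln.startswith('#')]

    # The first `extra` files get one more row so that sizes differ by at most 1.
    base, extra = divmod(len(data), nfiles)

    outnames = []
    start = 0
    for i in range(nfiles):
        stop = start + base + (1 if i < extra else 0)
        outname = '{}{}.txt'.format(prefix, i + 1)  # out1.txt, out2.txt, ...
        with open(outname, 'w') as fo:
            fo.writelines(comments)          # comments first
            fo.writelines(data[start:stop])  # then this file's rows
        outnames.append(outname)
        start = stop
    return outnames
```

Calling split_with_comments('filecopy_multiple.txt') would write out1.txt through out20.txt into the current directory. Because the rows are copied as text, the original column formatting is preserved exactly.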
Some useful links are as follows:
How to split a dataframe column into multiple columns
How to split a DataFrame column in python
Split a large pandas dataframe
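It's not quite clear why you need pandas/DataFrames in this case... Do you want to keep the existing file format, or do you want to save the split files as normal CSV or HDF5 files? – MaxU
@MaxU I want to save the split files as normal CSV files, so that each of the twenty output files has the same header comments as the input file. –
Does your original CSV file fit in memory, or do you have to read it line by line? – MaxU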
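If the file did not fit in memory, one option would be to read it line by line in two passes: the first pass collects the comment lines and counts the data rows, the second streams the rows into the output files. The following is only a sketch under that assumption; the name split_streaming and the parameters nfiles and prefix are hypothetical, and only the comment lines are held in memory.

```python
from itertools import islice


def split_streaming(infile, nfiles=20, prefix='out'):
    """Two-pass, line-by-line split that repeats the comments in every file."""
    # Pass 1: collect comment lines and count the data rows.
    comments = []
    nrows = 0
    with open(infile) as fi:
        for line in fi:
            if line.startswith('#'):
                comments.append(line if line.endswith('\n') else line + '\n')
            elif line.strip():
                nrows += 1

    # The first `extra` files get one more row than the rest.
    base, extra = divmod(nrows, nfiles)

    # Pass 2: stream the data rows into the output files.
    outnames = []
    with open(infile) as fi:
        data_lines = (ln for ln in fi
                      if ln.strip() and not ln.startswith('#'))
        for i in range(nfiles):
            size = base + (1 if i < extra else 0)
            outname = '{}{}.txt'.format(prefix, i + 1)
            with open(outname, 'w') as fo:
                fo.writelines(comments)
                # islice consumes exactly `size` rows from the shared generator
                for ln in islice(data_lines, size):
                    fo.write(ln if ln.endswith('\n') else ln + '\n')
            outnames.append(outname)
    return outnames
```

The trade-off is that the input is read twice, in exchange for never holding more than one data row in memory at a time.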