2016-06-18

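Merging two files with the same "column name" and "different rows" using pandas in Python

I have two data files, a.csv and b.csv, which are available from Pastebin. a.csv has 4 columns and some comments: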

# coating file for detector A/R 
# column 1 is the angle of incidence (degrees) 
# column 2 is the wavelength (microns) 
# column 3 is the transmission probability 
# column 4 is the reflection probability 
14.2 531.0 0.0618 0.9382 
14.2 532.0 0.07905 0.92095 
14.2 533.0 0.09989 0.90011 
14.2 534.0 0.12324 0.87676 
14.2 535.0 0.14674 0.85326 
14.2 536.0 0.16745 0.83255 
14.2 537.0 0.1837 0.8163 
# 
# 171 lines, 5 comments, 166 data 

The second file, b.csv, has two columns, one of which (the wavelength) it shares with a.csv, and a different number of rows:

# Version 2.0 - nm, [email protected] to 1, burrows+2006c91.21_T1350_g4.7_f100_solar 
# Wavelength(nm) Flambda(ergs/cm^s/s/nm) 
300.0 1.53345164121e-32 
300.1 1.53345164121e-32 
300.2 1.53345164121e-32 

# total lines = 20003, comment lines = 2, data lines = 20001 

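Now I want to merge these two files on the common column, which is the second column of a.csv (the wavelength must be the same in both files).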

The output should look like this:

# coating file for detector A/R 
# column 1 is the angle of incidence (degrees) 
# column 2 is the wavelength (microns) 
# column 3 is the transmission probability 
# column 4 is the reflection probability 
# Version 2.0 - nm, [email protected] to 1, burrows+2006c91.21_T1350_g4.7_f100_solar 
# Wavelength(nm) Flambda(ergs/cm^s/s/nm) 
14.2 531.0 0.0618 0.9382 1.14325276212 
14.2 532.0 0.07905 0.92095 1.14557732058 

Note: the comments are merged as well.
In b.csv, the first wavelength from a.csv (531.0) is at line number 2313.

How can we do this in Python?

My initial attempt was this:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 
# Author : Bhishan Poudel 
# Date  : Jun 17, 2016 


# Imports 
from __future__ import print_function 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 


# read in dataframes 
#====================================================================== 
# read in a file 
# 
infile = 'a.csv' 
colnames = ['angle', 'wave','trans','refl'] 
print('{} {} {} {}'.format('\nreading file : ', infile, '','')) 
df1 = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
     comment='#', names=colnames, usecols=(0,1,2,3))

print('{} {} {} {}'.format('df.head \n', df1.head(),'','')) 
#------------------------------------------------------------------ 


#====================================================================== 
# read in a file 
# 
infile = 'b.csv' 
colnames = ['wave', 'flux'] 
print('{} {} {} {}'.format('\nreading file : ', infile, '','')) 
df2 = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
     comment='#', names=colnames, usecols=(0,1))
print('{} {} {} {}'.format('df.head \n', df2.head(),'','\n')) 
#---------------------------------------------------------------------- 


result = df1.append(df2, ignore_index=True) 
print(result.head()) 
print("\n") 

Some useful links:
How to merge data frame with same column names
http://pandas.pydata.org/pandas-docs/stable/merging.html

Answers

2

If you want to merge the two datasets, you should use the .merge() method rather than .append():

result = pd.merge(df1,df2,on='wave') 

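.merge() joins the two DataFrames on a key column (similar to an SQL join), whereas .append() stacks one DataFrame on top of the other. Note that DataFrame.append() was removed in pandas 2.0; pd.concat() is its replacement for stacking rows.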
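To make the difference concrete, and to cover the part of the question about keeping the comments, here is a minimal sketch. It assumes small in-memory stand-ins for a.csv and b.csv (the strings A_TEXT and B_TEXT and their flux values are made up for illustration), and the helper read_table is a hypothetical name, not part of pandas. The comment lines are collected by hand because read_csv discards them when comment='#' is set.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for a.csv and b.csv (flux values are made up).
A_TEXT = """\
# coating file for detector A/R
# column 2 is the wavelength
14.2 531.0 0.0618 0.9382
14.2 532.0 0.07905 0.92095
"""
B_TEXT = """\
# Wavelength(nm) Flambda
530.9 1.0
531.0 2.0
531.1 3.0
532.0 4.0
"""


def read_table(text, names):
    """Return (comment_lines, DataFrame) for whitespace-separated text."""
    # Keep the '#' lines ourselves; read_csv drops them with comment='#'.
    comments = [ln.rstrip() for ln in text.splitlines() if ln.startswith("#")]
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", comment="#",
                     header=None, names=names)
    return comments, df


comments_a, df1 = read_table(A_TEXT, ["angle", "wave", "trans", "refl"])
comments_b, df2 = read_table(B_TEXT, ["wave", "flux"])

# Inner join: keep only the wavelengths present in both tables.
merged = pd.merge(df1, df2, on="wave", how="inner")

# Row-wise stacking, which is what the .append() attempt amounts to.
stacked = pd.concat([df1, df2], ignore_index=True)

# Write both comment headers first, then the merged rows.
out = io.StringIO()
out.write("\n".join(comments_a + comments_b) + "\n")
merged.to_csv(out, sep=" ", header=False, index=False)
result_text = out.getvalue()
print(result_text)
```

With real files, the same helper could read each file's text from disk and the final string could be written to an output file. One caveat: the join matches wavelengths by exact floating-point equality, which works when both files spell the values the same way (531.0 in both); if they might differ slightly, rounding the key column first or using pd.merge_asof with a tolerance may be a safer choice.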
