2015-06-26 168 views

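Column operations on multiple files with pandas

I want to perform some arithmetic operations in Python pandas and merge the result into one of the files.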

Path_1: File_1.csv, File_2.csv, .... 

This path contains several files, and more are expected to arrive as time goes on. Each file has the following columns:

File_1.csv     | File_2.csv
Nos,12:00:00   | Nos,12:30:00
123,1451       | 485,5464
656,4544       | 456,4865
853,5484       | 658,4584

Path_2: Master_1.csv 

Nos,00:00:00 
123,2000 
485,1500 
656,1000 
853,2500 
456,4500 
658,5000 

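I am trying to read the n .csv files from Path_1 and compare the col[1] header (the time series) of each file with the time series of the last column in Master_1.csv.

If Master_1.csv does not have that time, it should create a new column named after the time series of the path_1 .csv file, and fill it for each col['Nos'] with the result of the subtraction against col[1] of Master_1.csv.

If the column with the time from the path_1 file already exists, it should look up col['Nos'] and replace the NaN with the subtracted value for that col['Nos'].

Expected output in Master_1.csv: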

Nos,00:00:00,12:00:00,12:30:00
123,2000,549,NAN
485,1500,NAN,3964
656,1000,3544,NAN
853,2500,2984,NAN
456,4500,NAN,365
658,5000,NAN,-416

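I understand the math, but I am not able to loop with respect to Nos and the time series. I have tried to put some code together and to work around the loop. I need help with this. Thanks.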

import glob
import os

import pandas as pd

path_1 = '/'
path_2 = '/'

# every file in path_1 has a 'Nos' column plus one time column (12:00:00, 12:30:00, 17:30:00, ...)
files = sorted(glob.glob(os.path.join(path_1, '*.csv')))
df_1 = pd.read_csv(files[0], index_col='Nos')
df_2 = pd.read_csv('master_1.csv', index_col='Nos')  # 00:00:00 time series

# subtract one file's values from the master column; rows are matched on 'Nos'
timeseries = df_1.columns[0]  # the time header is different in every file
df_2[timeseries] = df_2['00:00:00'] - df_1[timeseries]

# this only handles a single file; what is missing is the loop over every file in path_1
df_2.to_csv('master_1.csv')

Answer


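You can do this in three steps:

  1. Read your CSVs into a list of dataframes
  2. Merge the dataframes together (equivalent to a SQL left join or an Excel VLOOKUP)
  3. Calculate the derived columns using vectorized subtraction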
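To see what step 2 produces, here is a minimal sketch on the sample rows from the question, built in memory rather than read from disk. The variable names `master`, `file_1` and `merged` are illustrative assumptions. A left merge keeps every `Nos` from the master and leaves NaN where the file has no matching row.

```python
import io

import pandas as pd

# sample rows copied from the question; real code would pass file paths to pd.read_csv
master = pd.read_csv(io.StringIO(
    "Nos,00:00:00\n123,2000\n485,1500\n656,1000\n853,2500\n456,4500\n658,5000\n"))
file_1 = pd.read_csv(io.StringIO(
    "Nos,12:00:00\n123,1451\n656,4544\n853,5484\n"))

# left join on 'Nos': all six master rows survive, unmatched ones get NaN
merged = pd.merge(master, file_1, on="Nos", how="left")
print(merged)
```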

Here is some code you can try:

# read dataframes into a list
import glob
import pandas as pd

L = []
for fname in glob.glob(path_1 + '*.csv'):
    L.append(pd.read_csv(fname))

# read master dataframe, and merge in other dataframes
df_2 = pd.read_csv('master_1.csv')
for df in L:
    df_2 = pd.merge(df_2, df, on='Nos', how='left')

# for each merged time column, calculate the difference with the master column
time_cols = df_2.columns.difference(['Nos', '00:00:00'])
df_2[time_cols] = df_2[time_cols].sub(df_2['00:00:00'], axis=0)

This gives an error: 'Traceback (most recent call last): File "main_1.py", line 12, in data_frame.append(df.read_csv(fname)) NameError: name 'df' is not defined'
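The traceback points at `df.read_csv`. `read_csv` is a function of the pandas module, which the question imports as `pd`, so no variable named `df` exists at that point. Here is a minimal sketch of the corrected reading loop, keeping the list name `data_frame` from the traceback. The temporary directory and its two small files are hypothetical and exist only so the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# hypothetical directory with two small CSV files, created only for this example
path_1 = tempfile.mkdtemp()
with open(os.path.join(path_1, "File_1.csv"), "w") as fh:
    fh.write("Nos,12:00:00\n123,1451\n")
with open(os.path.join(path_1, "File_2.csv"), "w") as fh:
    fh.write("Nos,12:30:00\n485,5464\n")

data_frame = []
for fname in sorted(glob.glob(os.path.join(path_1, "*.csv"))):
    data_frame.append(pd.read_csv(fname))  # pd is the module; df was never defined

print(len(data_frame))
```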