Extracting rows and file names from multiple CSV files

I have multiple CSV files in a folder, each named by date (20080101.csv through 20111031.csv). The files share a common header. They look like this:

20080101.csv 
X ;Y; Z 
1 ; 1 ; 3 
1 ; 2 ; 6 
1 ; 3 ; 24 
2 ; 1 ; 24 
2 ; 2 ; 24 

20080102.csv 
X ;Y; Z 
1 ; 1 ; 0.1 
1 ; 2 ; 2 
1 ; 3 ; 67 
2 ; 1 ; 24 
2 ; 2 ; 24 

20080103.csv 
X ;Y; Z 
1 ; 1 ; 3 
1 ; 3 ; 24 
2 ; 1 ; 24 
2 ; 2 ; 24 

20080104.csv 
X ;Y; Z 
1 ; 1 ; 34 
1 ; 2 ; 23 
1 ; 3 ; 67 
2 ; 1 ; 24 
2 ; 2 ; 24

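...and so on. I want to write a script that reads the rows and, whenever a row has X = 1 and Y = 2, copies the whole row to a new CSV file together with the file name, giving the following output: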

X ;Y ; Z ; filename 
1 ; 2 ; 6 ; 20080101 
1 ; 2 ; 2 ; 20080102 
1 ; 2 ; NA; 20080103 
1 ; 2 ; 23; 20080104

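Any ideas on how to do this, suggestions for modules I should look at, or examples would be welcome. Thank you for your time and help.

Cheers, Navin

Source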

2011-11-04 Navin

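So you are not interested in the records where (x, y) is not (1, 2)? They just get thrown away? – yosukesabai

Can you really call semicolon-separated files CSV? –

@Danny Character-separated values? I'm grasping at straws :) –

This should do the job: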

import glob
import os

outfile = open('output.csv', 'w')
outfile.write('X ; Y ; Z ; filename\n')
for filename in glob.glob('*.csv'):
    if filename == 'output.csv':  # Skip the file we're writing.
        continue
    # Drop the '.csv' extension so the output matches the requested format.
    name = os.path.splitext(filename)[0]
    with open(filename, 'r') as infile:
        count = 0
        lineno = 0
        for line in infile:
            lineno += 1
            if lineno == 1:  # Skip the header line.
                continue
            fields = line.split(';')
            x = int(fields[0])
            y = int(fields[1])
            z = float(fields[2])
            if x == 1 and y == 2:
                outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, name))
                count += 1
        if count == 0:  # Handle the case when no lines were found.
            outfile.write('1 ; 2 ; NA ; %s\n' % name)
outfile.close()

Note that if you cannot control or trust the file format, you may need to handle the exceptions raised by the conversions to int/float.
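A minimal sketch of that exception handling, assuming malformed lines should simply be skipped. The helper name `parse_row` is hypothetical and not part of the answer above:

```python
def parse_row(line, delimiter=';'):
    """Return (x, y, z) parsed from one line, or None if the line is malformed."""
    fields = line.split(delimiter)
    if len(fields) < 3:
        # Blank or truncated line: not enough columns.
        return None
    try:
        # int() and float() tolerate surrounding whitespace, so no strip() is needed.
        return int(fields[0]), int(fields[1]), float(fields[2])
    except ValueError:
        # Non-numeric content, e.g. the header line or a corrupted value.
        return None
```

With this helper the header line also yields None, because `int('X ')` raises ValueError, so the explicit line counter becomes optional.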

Source

2011-11-04 23:58:58

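Thank you very much... and sorry for the late reply. As a beginner, it took me a while to understand each of these valuable answers. Learning Python really is fun! – Navin

You can open the files one at a time and read each of them line by line.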

files = ['20080101.csv', '20080102.csv', '20080103.csv']  #...etc
for f in files:
    file = open(f, 'r')
    for line in file:
        ray = line.split(';')
        if (ray[0].strip() == '1' and ray[1].strip() == '2'):
            fout = open('output.csv', 'a')
            fout.write(ray[0].strip() + ' ; ' + ray[1].strip() + ' ; ' + ray[2].strip() + ' ; ' + f + '\n')
            fout.close()
    file.close()

Tested and working. It may need minor modifications.

Source

2011-11-04 23:48:36 Genzume

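'ray[n]' will be a string... – Avaris

Because ray is a list of strings, the if will always fail. –

@AdamZalcman: I checked this code and it works fine. ray is a list of strings. I strip each string down to the number and then compare it with '1' and '2'. Please test this and tell me you actually get an error before telling me it is wrong. – Genzume

If you know you have one file per day, with no missing days, then I would use glob('*.csv') to get the list of file names, open them one by one, and read them the way Tyler does.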

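If you know that files are missing for some days, I would use datetime, starting at datetime.date(2008,1,1) and looping in one-day increments. For each day, compose the file name with .strftime() + '.csv' and try to process the file (if the file does not exist, just write a record with NA).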
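The date-driven loop could be sketched roughly as follows. The function names (`daily_names`, `find_z`, `extract`) and their parameters are made up for illustration, and the sketch assumes a missing file should be reported as NA in the same way as a file with no matching row:

```python
import datetime
import os

def daily_names(start, end):
    """Yield a YYYYMMDD string for each day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime('%Y%m%d')
        day += datetime.timedelta(days=1)

def find_z(path, x='1', y='2'):
    """Return the Z field of the first row matching x and y, or 'NA'."""
    if not os.path.exists(path):
        return 'NA'  # No file for this day.
    with open(path) as infile:
        for line in infile:
            fields = [field.strip() for field in line.split(';')]
            if len(fields) > 2 and fields[0] == x and fields[1] == y:
                return fields[2]
    return 'NA'  # File exists but has no matching row.

def extract(folder, start, end, out_path):
    """Write one output line per calendar day between start and end."""
    with open(out_path, 'w') as outfile:
        outfile.write('X ; Y ; Z ; filename\n')
        for name in daily_names(start, end):
            z = find_z(os.path.join(folder, name + '.csv'))
            outfile.write('1 ; 2 ; %s ; %s\n' % (z, name))
```

For the date range in the question this would be called as `extract('.', datetime.date(2008, 1, 1), datetime.date(2011, 10, 31), 'output.csv')`. Because the file names are generated from dates rather than listed from the folder, output.csv is never read as input.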

Source

2011-11-04 23:54:49 yosukesabai

The following should work:

import csv
with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file) as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                outfile.write(fmt % ('NA', file[:-4]))

Source

2011-11-05 00:02:17

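This is a well-formed question, and the logic should be obvious from it. Having someone provide the finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do:
1) loop over the files (keeping track of each file name as you open it)
2) read the lines from the current file
3) if the selection criteria are met (x == 1 and y == 2), write the line out.

To get started, try: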

import csv, os

for fn in os.listdir():
    if fn.endswith(".csv"):  # Match the extension only, not ".csv" anywhere in the name.
        with open(fn, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=";")
            for row in reader:
                ...

Then extend the solution to open an output file and write the selected rows with csv.writer.
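One way that extension might look, as a sketch rather than a finished solution. The function name `extract_rows` and its parameters are assumptions for illustration, and the files are processed in sorted name order so the output follows the dates:

```python
import csv
import os

def extract_rows(folder, out_path, x='1', y='2'):
    """Copy rows where X == x and Y == y into out_path, appending the file name."""
    out_name = os.path.basename(out_path)
    # Collect the input names first, skipping the output file if it lives in folder.
    names = sorted(fn for fn in os.listdir(folder)
                   if fn.endswith('.csv') and fn != out_name)
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter=';')
        writer.writerow(['X', 'Y', 'Z', 'filename'])
        for fn in names:
            with open(os.path.join(folder, fn), 'r', newline='') as f:
                for row in csv.reader(f, delimiter=';'):
                    row = [field.strip() for field in row]
                    if len(row) > 2 and row[0] == x and row[1] == y:
                        # Append the file name without its '.csv' extension.
                        writer.writerow(row[:3] + [os.path.splitext(fn)[0]])
```

This version writes only the rows that match. Emitting an NA record for a file with no match would need an extra flag or a for/else around the inner loop.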

Source

2011-11-05 00:08:54 Dave
