2016-02-28
0

How to compare two CSV files in Python

I have two CSV files. One is called "Standard Reg.csv" and the other is "Driver Details.csv".

The first two rows of "Standard Reg.csv" are:

['Day', 'Month', 'Year', 'Reg Plate', 'Hour', 'Minute', 'Second', 'Speed over limit'] 
['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355'] 

The first two rows of "Driver Details.csv" are:

['FirstName', 'LastName', 'StreetAddress', 'City', 'Region', 'Country', 'PostCode', 'Registration'] 
['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG'] 

My code looks like this:

import csv 
file_1 = csv.reader(open('Standard Reg.csv', 'r'), delimiter=',') 
file_2 = csv.reader(open('Driver Details.csv', 'r'), delimiter=',') 
for row in file_1: 
    reg = row[3] 
    avgspeed = row[7] 
    for row in file_2: 
     firstname = row[0] 
     lastname = row[1] 
     address = row[2] 
     city = row[3] 
     region = row[4] 
     reg2 = row[7] 
if reg == reg2: 
    print('Match found') 
else: 
    print('No match found') 

This is a work in progress, but I can't seem to get the code to compare anything other than the last row.

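When I put print(reg) after the line reg2 = row[7], it shows that it has read the whole column. When I put print(reg2) after reg2 = row[7], it also prints the whole column.

But at if reg == reg2: it only takes the last row of the two columns and compares them, and I don't know how to fix this.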

Thanks in advance.

Answers

1

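I suggest you first load all of the details from Driver Details.csv into a dictionary, using the registration number as the key. That lets you look up a given entry directly, without having to re-read every row of that file each time: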

import csv 

driver_details = {} 

with open('Driver Details.csv') as f_driver_details: 
    csv_driver_details = csv.reader(f_driver_details) 
    header = next(csv_driver_details)  # skip the header 

    for row in csv_driver_details: 
     driver_details[row[7]] = row 

with open('Standard Reg.csv') as f_standard_reg: 
    csv_standard_reg = csv.reader(f_standard_reg) 
    header = next(csv_standard_reg)  # skip the header 

    for row in csv_standard_reg: 
     try: 
      driver = driver_details[row[3]] 
      print('Match found - {} {}'.format(driver[0], driver[1])) 
     except KeyError as e: 
      print('No match found') 

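With your code, the inner loop iterates through file_2 and leaves the file pointer either at the end (if no match is found) or at the position of the match (so a later search could miss a match that occurs earlier in the file). For your approach to work, you would have to re-read the file from the start on every pass of the outer loop, which would be very slow.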
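The file-pointer behaviour described above can be seen directly in a small sketch. It assumes an in-memory io.StringIO stand-in for "Driver Details.csv" (the data is a trimmed, hypothetical version of the sample row): a csv.reader yields nothing on a second pass until the underlying file is rewound with seek(0).

```python
import csv
import io

# Hypothetical in-memory stand-in for 'Driver Details.csv' (io.StringIO behaves like an open file)
driver_csv = io.StringIO(
    "FirstName,LastName,Registration\n"
    "Violet,Kirby,NU16REG\n"
)
file_2 = csv.reader(driver_csv)

first_pass = [row for row in file_2]   # consumes every row, header included
second_pass = [row for row in file_2]  # the file pointer is already at the end

print(len(first_pass))   # 2
print(len(second_pass))  # 0

# Rewinding the underlying file makes the rows readable again
driver_csv.seek(0)
third_pass = [row for row in file_2]
print(len(third_pass))   # 2
```

Rewinding on every outer iteration does work, but it re-parses the whole file each time, which is why the dictionary lookup is the better fit here.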


To add an output CSV and display the full address, you could do the following:

import csv 

speed = 74.3 
fine = 35 

driver_details = {} 

with open('Driver Details.csv') as f_driver_details: 
    csv_driver_details = csv.reader(f_driver_details) 
    header = next(csv_driver_details)  # skip the header 

    for row in csv_driver_details: 
     driver_details[row[7]] = row 

with open('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output: 
    csv_standard_reg = csv.reader(f_standard_reg) 
    header = next(csv_standard_reg)  # skip the header 
    csv_output = csv.writer(f_output) 

    for row in csv_standard_reg: 
     try: 
      driver = driver_details[row[3]] 
      print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7]))) 
      csv_output.writerow(driver[0:7] + [speed, fine]) 
     except KeyError as e: 
      print('No match found') 

This will print the following:

Match found - Fine 35, Speed 74.3 
Violet Kirby 
585-4073 Convallis Street 
Balfour 
Orkney 
United Kingdom 
OC1X 6QE 

and produce an output file containing:

Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35 
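The row written to the output file can be checked without touching the disk. This sketch assumes an io.StringIO in place of open('Output log.csv', 'w', newline='') and reuses the question's sample driver row; speed and fine are hard-coded exactly as in the answer above.

```python
import csv
import io

# Driver row as it would be loaded from 'Driver Details.csv' (sample data from the question)
driver = ['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour',
          'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']
speed = 74.3  # hard-coded, as in the answer
fine = 35

out = io.StringIO()  # stands in for the output file
writer = csv.writer(out)
writer.writerow(driver[0:7] + [speed, fine])  # [0:7] drops the Registration column

print(out.getvalue(), end='')
```

csv.writer terminates each row with '\r\n' by default, which is why the answer opens the real file with newline='': it stops Python from translating that line ending a second time on Windows.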
+0

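Thanks for your help, but there are some parts I don't understand. First, what is header and what does it do? Second, the second-to-last line where it says '' raises a syntax error, but I can't see an error there. – Tilak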

+0

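The 'header =' line reads the header row on its own, so the loop that follows starts at the first data row. If you print(header), you will see its contents. –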
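What next() does to a csv.reader can be shown with the two rows from the question. This is a minimal sketch that assumes an io.StringIO stand-in for "Standard Reg.csv".

```python
import csv
import io

# In-memory stand-in for 'Standard Reg.csv' (the two rows shown in the question)
standard_csv = io.StringIO(
    "Day,Month,Year,Reg Plate,Hour,Minute,Second,Speed over limit\n"
    "1,1,2016,NU16REG,1,1,1,5816.1667859699355\n"
)
reader = csv.reader(standard_csv)

header = next(reader)  # reads, and thereby skips, only the first row
rows = list(reader)    # everything after the header

print(header[3])   # Reg Plate
print(rows[0][3])  # NU16REG
```

Without the next() call, the first lookup would use the header row, searching for the key 'Reg Plate' instead of a real registration.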

+0

I've changed the 'except' line, so you can try it again. –

1

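The test if reg == reg2 sits outside both loops (the ones over file_1 and file_2). That is why it is only run against the last row of each file.

Another problem is that both for loops use the same loop variable, row.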
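A minimal sketch of the nested-loop approach with both problems fixed follows. It assumes io.StringIO stand-ins for the two files holding the question's sample rows, and the names reg_row, driver_row, driver_rows and matches are illustrative. Driver Details is read into a list once, because a list (unlike a csv.reader) can be looped over again on every outer pass.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two files (sample rows from the question)
standard_csv = io.StringIO(
    "Day,Month,Year,Reg Plate,Hour,Minute,Second,Speed over limit\n"
    "1,1,2016,NU16REG,1,1,1,5816.1667859699355\n"
)
driver_csv = io.StringIO(
    "FirstName,LastName,StreetAddress,City,Region,Country,PostCode,Registration\n"
    "Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,NU16REG\n"
)

file_1 = csv.reader(standard_csv)
next(file_1)                                     # skip the header row
driver_rows = list(csv.reader(driver_csv))[1:]   # read once into a list, minus the header

matches = []
for reg_row in file_1:                # distinct name for the outer row
    reg = reg_row[3]
    found = False
    for driver_row in driver_rows:    # a list can be iterated on every outer pass
        if reg == driver_row[7]:      # the comparison now happens inside the loops
            found = True
            matches.append((reg, driver_row[0], driver_row[1]))
            break
    print('Match found' if found else 'No match found')
```

This still compares every plate with every driver, so the dictionary approach in the other answer scales better for large files.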

+0

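I renamed row to row2 in the file_2 loop and then indented the if and else into one of the loops. In the first (outer) loop it just prints "No match found" twice, and in the second (inner) loop it prints it 103 more times (because Driver Details has 101 rows and Standard Reg has 2), and it never finds a match. – Tilak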

0

Try csv.DictReader to eliminate most of the lines of code:

import csv
from collections import defaultdict

Violations = defaultdict(list)

# Read in the violations; there are probably fewer violations than drivers (I hope!)
with open('Standard Reg.csv') as violations:
    for v in csv.DictReader(violations):
        # Append, so several violations for the same plate are all kept
        Violations[v['Reg Plate']].append(v)

with open('Driver Details.csv') as drivers:
    for d in csv.DictReader(drivers):
        # d is a dict, so the format string needs [key] access, not .attribute access
        fullname = "{driver[FirstName]} {driver[LastName]}".format(driver=d)
        if d['Registration'] in Violations:
            count = len(Violations[d['Registration']])
            print("{fullname} has {count} violations.".format(fullname=fullname, count=count))
        else:
            print("{fullname} is too fast to catch!".format(fullname=fullname))
+0

I wouldn't capitalize 'Violations', because it's an instance, not a class name. – pcurry
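For reference on the DictReader answer, the sketch below shows what csv.DictReader yields for the sample driver row and how str.format reaches dictionary items. It assumes an io.StringIO stand-in for "Driver Details.csv".

```python
import csv
import io

# In-memory stand-in for 'Driver Details.csv' (sample rows from the question)
driver_csv = io.StringIO(
    "FirstName,LastName,StreetAddress,City,Region,Country,PostCode,Registration\n"
    "Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,NU16REG\n"
)
rows = list(csv.DictReader(driver_csv))  # the header row supplies the keys

d = rows[0]
print(d['Registration'])  # NU16REG

# str.format reaches dict items with [key]; {driver.FirstName} would raise AttributeError
fullname = "{driver[FirstName]} {driver[LastName]}".format(driver=d)
print(fullname)  # Violet Kirby
```

Because DictReader consumes the header itself, no separate next() call is needed.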