2012-12-05

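Merging CSV files in Python using a Python dictionary

Hi, I want to create a new CSV file by merging specific fields from two CSV files based on a common column (a primary key). I tried doing the same thing in PowerShell and it works, but it is very slow: more than 30 minutes to merge files of 5000+ rows, so I'm trying it in Python instead. I'm quite new to this, so please go easy on me.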
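The PowerShell script isn't shown, so this is only a guess at the slowdown: a join that rescans one file for every row of the other does roughly (rows in file A) x (rows in file B) comparisons, while building a dictionary keyed on the join column first needs one pass over each file. A minimal Python 3 sketch of both approaches, with hypothetical function names merge_nested and merge_indexed, working on lists of dicts like the ones csv.DictReader yields:

```python
def merge_nested(rows, lookup_rows, key, lookup_key):
    """Naive join: rescans lookup_rows for every row (about len(rows) * len(lookup_rows) comparisons)."""
    merged = []
    for row in rows:
        match = None
        for candidate in lookup_rows:
            if candidate[lookup_key] == row[key]:
                match = candidate
                break  # keep the first match
        merged.append((row, match))
    return merged


def merge_indexed(rows, lookup_rows, key, lookup_key):
    """Dict-based join: one pass to build an index, then an average O(1) lookup per row."""
    index = {}
    for candidate in lookup_rows:
        # setdefault keeps the first match, mirroring merge_nested on duplicate keys
        index.setdefault(candidate[lookup_key], candidate)
    return [(row, index.get(row[key])) for row in rows]
```

Both return the same pairs; only the amount of work differs, which is why the answers below all build a dict from checkfile.csv before looping over infile.csv.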

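The two files are infile.csv and checkfile.csv, and the columns of the output file are based on those in infile.csv. The code looks up values in checkfile.csv, creates outfile.csv, copies the columns over from infile.csv, and needs to overwrite the values of two fields (ChannelProfileID and CostPrice) with the corresponding values from checkfile.csv. Details below: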

infile.csv:

"StockNumber","SKU","ChannelProfileID","CostPrice" 
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33 
"10m_s-vid#CSE","2VV-10",3746,0.98 
"1RR-01#CSE","1RR-01",3746 
"1RR-01#PCAWS","1RR-01",3746, 
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42 
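Not every infile.csv row has four fields. A quick Python 3 sketch (an in-memory copy of the sample above, trailing whitespace removed) shows what csv.DictReader makes of them: a row that stops short gets None for the missing column (DictReader's default restval), a trailing comma yields an empty string, and every value, numbers included, comes back as a string.

```python
import csv
import io

# infile.csv sample from the question (trailing whitespace removed).
INFILE = '''"StockNumber","SKU","ChannelProfileID","CostPrice"
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33
"10m_s-vid#CSE","2VV-10",3746,0.98
"1RR-01#CSE","1RR-01",3746
"1RR-01#PCAWS","1RR-01",3746,
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42
'''

rows = list(csv.DictReader(io.StringIO(INFILE)))
for row in rows:
    # 1RR-01#CSE has only 3 fields, so CostPrice is None;
    # 1RR-01#PCAWS ends in a comma, so CostPrice is ''.
    print(row["StockNumber"], repr(row["CostPrice"]))
```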

checkfile.csv:

ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime 
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1 
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1 
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1 
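The checkfile.csv header has a space after each comma, which is why the code in this thread refers to ' Stock' and ' CostPrice' with a leading space. Passing skipinitialspace=True (a standard csv dialect option) to csv.DictReader strips those spaces from the header as well as from the data. A Python 3 sketch on the header and one sample row:

```python
import csv
import io

# Header and one data row of the checkfile.csv sample from the question.
CHECKFILE = """ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1
"""

plain = csv.DictReader(io.StringIO(CHECKFILE))
print(plain.fieldnames[:4])  # ['ProductCode', ' Description', ' Supplier', ' CostPrice']

stripped = csv.DictReader(io.StringIO(CHECKFILE), skipinitialspace=True)
row = next(stripped)
print(row["Stock"], row["CostPrice"])  # 1991 0.86
```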

The outfile.csv I get is:

StockNumber,SKU,ChannelProfileID,CostPrice 
10m_s-vid#APTIIAMZ,2VV-10,"(' ',)", 
10m_s-vid#CSE,2VV-10,"(' ',)", 
1RR-01#CSE,1RR-01,"(' ',)", 
1RR-01#PCAWS,1RR-01,"(' ',)", 
1m_s-vid_ext#APTIIAMZ,2VV-101,"(' ',)", 

But the outfile.csv I need is:

StockNumber,SKU,ChannelProfileID,CostPrice 
10m_s-vid#APTIIAMZ,2VV-10,1991,0.86 
10m_s-vid#CSE,2VV-10,1991,0.86 
1RR-01#CSE,1RR-01 
1RR-01#PCAWS,1RR-01   
1m_s-vid_ext#APTIIAMZ,2VV-101 

The code:

import csv

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    product_result = dict(
        ((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice'])) for v in checkreader
    )

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'], " ")

            item['ChannelProfileID'] = result,
            item['CostPrice'] = result

            writer.writerow(item)
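Both symptoms in the output above can be reproduced in isolation. The dict is keyed by (ProductCode, Stock) tuples, so product_result.get(item['SKU'], " ") never matches a plain SKU string and always returns the " " default; the trailing comma on the ChannelProfileID line then wraps that in a one-element tuple, which the csv writer stringifies. A minimal Python 3 sketch with hard-coded stand-in values (not the real files):

```python
import csv
import io

# Same shape as the question's dict: keys are (ProductCode, Stock) tuples.
product_result = {("2VV-10", "1991"): ("2VV-10", "0.86")}

# A bare SKU string never equals a tuple key, so the " " default comes back.
result = product_result.get("2VV-10", " ")
print(repr(result))  # ' '

# The trailing comma in `item['ChannelProfileID'] = result,` builds a 1-tuple.
channel_profile_id = result,
print(repr(channel_profile_id))  # (' ',)

# csv stringifies non-string values with str(); that text contains a comma,
# so the writer quotes it, giving the "(' ',)" column seen in outfile.csv.
buf = io.StringIO()
csv.writer(buf).writerow(["2VV-10", channel_profile_id])
print(buf.getvalue())  # 2VV-10,"(' ',)"

# Keying on the SKU alone and storing (Stock, CostPrice) makes the lookup hit.
fixed = {"2VV-10": ("1991", "0.86")}
stock, cost_price = fixed.get("2VV-10", (None, None))
print(stock, cost_price)  # 1991 0.86
```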

It's unclear what your question is, and it's unclear what the expected result should look like. – pillmuncher

Also, your infile header defines 4 fields, but a row below it has only 3. – pillmuncher

OK, I've now added the desired outfile.csv. As you can see, the ChannelProfileID and CostPrice fields should be populated, but they aren't. – Anike

Answers

You can make it a bit simpler:

import csv

with open('checkfile.csv', 'rb') as checkfile:
    # index every checkfile row by its ProductCode
    product_result = {
        record['ProductCode']: record for record in csv.DictReader(checkfile)}

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()
        for item in reader:
            record = product_result.get(item['SKU'], None)
            if record:
                # checkfile keys keep the leading space from its header
                item['ChannelProfileID'] = record[' Stock'] # ???
                item['CostPrice'] = record[' CostPrice']
            else:
                item['ChannelProfileID'] = None
                item['CostPrice'] = None
            writer.writerow(item)

I'm not sure about the line I marked with ??? (taking ChannelProfileID from Stock).

Also, if you really do want to produce malformed CSV, feel free to omit the else clause.

I tested it with StringIO objects. It produces the result you specified, except that there are trailing commas where there is no match in checkfile.

I used a Python 2.7 dict comprehension because you tagged your question python-2.7.
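The code in this thread is Python 2.7. In Python 3 the csv module expects files opened in text mode with newline='', so the 'rb'/'wb' modes no longer work. A minimal Python 3 sketch of the same dict-lookup approach follows; the merge function name and the use of skipinitialspace=True (so the checkfile keys are 'Stock' rather than ' Stock') are assumptions of this sketch, not part of the answer, and the demo runs on the question's sample rows with trailing whitespace removed.

```python
import csv
import io


def merge(checkfile, infile, outfile):
    """Copy infile rows to outfile, filling ChannelProfileID and CostPrice
    from the checkfile row whose ProductCode equals the row's SKU."""
    # skipinitialspace=True drops the space after each comma in the checkfile
    # header, so the keys are 'Stock' and 'CostPrice' instead of ' Stock'.
    product_result = {
        record["ProductCode"]: record
        for record in csv.DictReader(checkfile, skipinitialspace=True)
    }

    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, reader.fieldnames)
    writer.writeheader()
    for item in reader:
        record = product_result.get(item["SKU"])
        if record:
            item["ChannelProfileID"] = record["Stock"]
            item["CostPrice"] = record["CostPrice"]
        else:
            # DictWriter writes None as an empty field.
            item["ChannelProfileID"] = None
            item["CostPrice"] = None
        writer.writerow(item)


# With real files, Python 3 uses text mode plus newline='' instead of 'rb'/'wb':
# with open("checkfile.csv", newline="") as checkfile, \
#         open("infile.csv", newline="") as infile, \
#         open("outfile.csv", "w", newline="") as outfile:
#     merge(checkfile, infile, outfile)

# In-memory demo on the question's sample rows (trailing whitespace removed).
CHECKFILE = """ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1
"""
INFILE = '''"StockNumber","SKU","ChannelProfileID","CostPrice"
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33
"10m_s-vid#CSE","2VV-10",3746,0.98
"1RR-01#CSE","1RR-01",3746
"1RR-01#PCAWS","1RR-01",3746,
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42
'''

out = io.StringIO()
merge(io.StringIO(CHECKFILE), io.StringIO(INFILE), out)
print(out.getvalue())
```

Run as-is, the demo prints the expected rows from the question, except that rows with no checkfile match end in two empty fields (for example 1RR-01#CSE,1RR-01,,), which is the trailing-comma difference mentioned in the answer.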

Thanks! I'll +1 as soon as I have enough reputation! – Anike

import csv

product_result = {}

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    for v in checkreader:
        # map each ProductCode to a (Stock, CostPrice) tuple
        product_result[v['ProductCode']] = (v[' Stock'], v[' CostPrice'])

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'])
            if result:
                # unpack the tuple into the two output columns
                item['ChannelProfileID'], item['CostPrice'] = result
            else:
                item['ChannelProfileID'] = item['CostPrice'] = None

            writer.writerow(item)

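Thanks for the reply. So I'm converting the infile data into tuples. But how do I update ChannelProfileID with the dictionary value of the 'Stock' field, and CostPrice with the CostPrice value, in outfile.csv? – Anike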

Following up: is it something like item['ChannelProfileID'] = result['Stock']? Basically I'm trying to write the data from the dictionary into a specific CSV field. – Anike

result is a tuple, so it can only be indexed with integers; what I did in this example is sequence unpacking. – Talvalin
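To make the tuple point concrete for the result['Stock'] question: a plain tuple has no field names, so it is read by position or unpacked. A small Python 3 sketch with stand-in values; the Product namedtuple at the end is a hypothetical alternative, not part of the answer, for when name-based access like result.Stock is preferred.

```python
from collections import namedtuple

result = ("1991", "0.86")  # (Stock, CostPrice), as the answer stores it
item = {"StockNumber": "10m_s-vid#CSE", "SKU": "2VV-10",
        "ChannelProfileID": "3746", "CostPrice": "0.98"}

# Tuples are indexed by position, so result['Stock'] raises TypeError.
try:
    result["Stock"]
except TypeError as exc:
    print("TypeError:", exc)

# Positional indexing and sequence unpacking do the same thing here.
item["ChannelProfileID"] = result[0]
item["CostPrice"] = result[1]
item["ChannelProfileID"], item["CostPrice"] = result

# A namedtuple supports both name-based and positional access.
Product = namedtuple("Product", ["Stock", "CostPrice"])
named = Product(*result)
print(named.Stock, named[1])  # 1991 0.86
```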