A few hundred megabytes isn't that much. Why not go the simple route with the csv module and collections.defaultdict:
import csv
from collections import defaultdict

result = defaultdict(dict)
fieldnames = {"ID"}
for csvfile in ("file1.csv", "file2.csv", "file3.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key)  # wasteful, but I don't care enough
                result[id][key] = row[key]
The resulting defaultdict looks like this:
>>> result
defaultdict(<class 'dict'>,
    {'001': {'SALARY': '25', 'SCHOOLS_ATTENDED': 'my Nice School', 'NAME': 'Jhon'},
     '002': {'SALARY': '40', 'SCHOOLS_ATTENDED': 'His lovely school', 'NAME': 'Doe'}})
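As an aside, the inner loop over the remaining columns can be collapsed with dict.update, since after popping "ID" every key left in the row is a data column. An equivalent sketch, using hypothetical inline data in place of the real files:

```python
import csv
import io
from collections import defaultdict

result = defaultdict(dict)
fieldnames = {"ID"}

# Inline sample standing in for one of the CSV files from the question.
sample = "ID,NAME\n001,Jhon\n002,Doe\n"
reader = csv.DictReader(io.StringIO(sample))
for row in reader:
    row_id = row.pop("ID")
    fieldnames.update(row)      # add all remaining column names at once
    result[row_id].update(row)  # copy all remaining columns at once
```

Same result, one fewer level of nesting per row.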
Then you can merge that into a single CSV file (not my prettiest work, but good enough):
with open("out.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])
out.csv then contains:
ID,NAME,SALARY,SCHOOLS_ATTENDED
001,Jhon,25,my Nice School
002,Doe,40,His lovely school
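One caveat: if a file doesn't cover every ID, the merged dict for that ID will be missing that file's columns. csv.DictWriter fills any fieldname absent from a row's dict with its restval value (an empty string by default), so the output stays rectangular. A small sketch with a hypothetical extra row:

```python
import csv
import io

out = io.StringIO()
# restval fills columns missing from a row's dict; the default is "".
writer = csv.DictWriter(out, ["ID", "NAME", "SALARY"], restval="")
writer.writeheader()
writer.writerow({"ID": "003", "NAME": "Ann"})  # no SALARY for this ID
```

So partially-covered IDs just come out with empty cells rather than raising an error.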
Why would anyone downvote this??? – Volatil3
Perhaps because it suggests a lack of research effort? It wasn't me, though. –
I think this is a duplicate question. You should always search before opening a new one. By the way, it wasn't me! http://stackoverflow.com/questions/17586573/python-combing-data-from-different-csv-files-into-one/17588521#17588521 –