Merging single-line .dat files into one .csv file using Python

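I am a beginner in the programming world and would like some tips on how to solve a challenge. Right now I have about 10,000 .dat files, each containing a single line with the following structure:

Attribute1=Value&Attribute2=Value&Attribute3=Value...AttributeN=Value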

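I have been trying to use Python and the csv library to convert these .dat files into a single .csv file.

So far I have been able to write something that reads all the files, stores the contents of each file on a new line, and replaces the "&" with ",". But since Attribute1, Attribute2 ... AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.

Any tips on how to go about that?

Thanks!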


2015-10-31 brenogil

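Since you are a beginner, I prepared some code that works and is also easy to understand.

I assume you have all the files in a folder called "input". The code below should go in a script file next to that folder.

Keep in mind that this code is meant to show how a problem like this can be solved. Optimizations and sanity checks have been left out intentionally.

You may additionally want to check what happens when a value is missing on some line, when an attribute is missing, when the input is corrupted, and so on. :)

Good luck!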

import os

# this function splits the attribute=value pairs into two lists
# the first list holds all the attributes
# the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []

    # first we drop the trailing newline and split the input over the '&'
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split the attribute=value over the first '=' sign only,
        # so a value that itself contains '=' stays in one piece
        # the left part goes to split[0], the value goes to split[1]
        split = attrVal.split('=', 1)
        attributes.append(split[0])
        values.append(split[1])

    # return the attributes list and values list
    return attributes, values

# test the function using the lines beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends the contents of a single input file to the output file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
    # the header is needed only if the output file is missing or still empty
    needHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0

    # 'with' closes both files for us, even if an error occurs
    # the output file is opened in text mode for appending
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            # skip blank lines
            if not line.strip():
                continue
            header, values = getAttributesAndValues(line.strip())

            # we write the header only once, before the first row
            if needHeader:
                f_out.write(delim.join(header) + "\n")
                needHeader = False

            # we write the values (join avoids a trailing delimiter)
            f_out.write(delim.join(values) + "\n")

# read the names of all the .dat files in the input folder
# (filtering by extension skips hidden or unrelated files)
allInputFiles = sorted(f for f in os.listdir('input') if f.endswith('.dat'))

# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
    writeToCsv(os.path.join('input', singleFile))
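
The sanity checks mentioned above are left to the reader. As a rough sketch of what they might look like, the hypothetical helper below (`parse_line` and its `expected_attributes` parameter are illustrative names, not part of the code above) rejects empty lines, pairs without an "=" sign, and lines whose attribute names differ from the expected header. Raising `ValueError` is just one possible policy; skipping or logging the bad file would work as well.

```python
def parse_line(line, expected_attributes=None):
    """Split 'A=1&B=2' into ordered (attribute, value) pairs, with basic checks."""
    line = line.strip()
    if not line:
        raise ValueError("empty line")

    pairs = []
    for chunk in line.split("&"):
        # partition never raises; sep is '' when the '=' sign is missing
        name, sep, value = chunk.partition("=")
        if not sep or not name.strip():
            raise ValueError("malformed pair: %r" % chunk)
        pairs.append((name.strip(), value.strip()))

    # optionally compare the attribute names with the header seen so far
    names = [name for name, _ in pairs]
    if expected_attributes is not None and names != list(expected_attributes):
        raise ValueError("unexpected attributes: %r" % names)
    return pairs


# example: the header of the first file becomes the reference for the rest
first = parse_line("Attribute1=a&Attribute2=b\n")
header = [name for name, _ in first]
second = parse_line("Attribute1=c&Attribute2=\n", expected_attributes=header)
```

An empty value (as in `Attribute2=` above) is accepted and becomes an empty cell; only a missing "=" sign or a missing attribute name is treated as corrupt input.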


2015-10-31 17:24:11 afabijan

Thank you very much! Just as you intended, this code helped me solve my problem and gave me a little something to learn from. – brenogil

You're welcome! – afabijan

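Put the .dat files in a folder called myDats. Put this script next to the myDats folder, along with a file called temp.txt. You will also need your output.csv. [That is, you will have output.csv, myDats, and mergeDats.py in the same folder.]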

mergeDats.py

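import csv
import os

# Step 1: copy the single line of every .dat file into temp.txt, one per row
with open("temp.txt", "w") as g:
    for name in sorted(os.listdir("myDats")):
        with open(os.path.join("myDats", name), "r") as f:
            g.write(f.readline().strip() + "\n")

# Step 2: turn every line of temp.txt into a dict of attribute -> value
rows = []
with open("temp.txt", "r") as h:
    for line in h:
        if not line.strip():
            continue
        my_dict = {}
        for pair in line.strip().split("&"):
            key, value = pair.split("=", 1)
            my_dict[key] = value
        rows.append(my_dict)

# Step 3: write the header once, then one row per .dat file
# (in Python 2.x, open with mode 'wb' and drop the newline argument)
if rows:
    with open("output.csv", "w", newline="") as output:
        w = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
        w.writeheader()
        w.writerows(rows)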
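
If some files turn out to lack an attribute, `csv.DictWriter` can fill the gap instead of failing. The sketch below is only an illustration under that assumption: `rows_to_csv` is a hypothetical helper, and the sample rows are made up. `restval` is the value written for a missing key, and `extrasaction="ignore"` silently drops keys that are not in the header (the default, `"raise"`, would raise a `ValueError` instead).

```python
import csv
import io


def rows_to_csv(rows, fieldnames, out):
    """Write a header plus one line per dict, tolerating missing keys."""
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        restval="",              # written when a row lacks an attribute
        extrasaction="ignore",   # drop attributes that are not in the header
    )
    writer.writeheader()
    writer.writerows(rows)


# made-up sample data: the second row is missing Attribute2
sample_rows = [
    {"Attribute1": "a", "Attribute2": "b"},
    {"Attribute1": "c"},
]
buffer = io.StringIO()
rows_to_csv(sample_rows, ["Attribute1", "Attribute2"], buffer)
csv_text = buffer.getvalue()
```

An in-memory `io.StringIO` stands in for the output file here; with a real file, open it with `newline=""` in Python 3 so the csv module controls the line endings.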


2015-10-31 16:33:20 AMACB

Thanks! Running this, I get: "IOError: [Errno 2] No such file or directory: '1.dat'" – brenogil

That should be fixed now, try again. – AMACB

but since the Attribute1,Attribute2...AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.

input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'

Once, for the first file:

','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))

For the contents of each file:

','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))

You may need to trim the strings; I don't know how clean your input is.
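
As a sketch of what that trimming could look like, the snippet below strips whitespace around every key and value and hands the result to `csv.writer`, which also quotes values that contain a comma (a plain `','.join` would not). `split_pairs` and the sample string are hypothetical and exist only for this illustration.

```python
import csv
import io


def split_pairs(text):
    """Return trimmed (key, value) tuples from 'k1=v1&k2=v2'."""
    pairs = []
    for chunk in text.strip().split("&"):
        # split on the first '=' only, then trim both sides
        key, _, value = chunk.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


# made-up, deliberately messy input: stray spaces and a comma inside a value
sample = " Attribute1=Value1 & Attribute2=Value, 2 &Attribute3=Value3\n"
pairs = split_pairs(sample)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow([k for k, v in pairs])  # header, written once
writer.writerow([v for k, v in pairs])  # one row of values per file
result = out.getvalue()
```

The second value ends up as `"Value, 2"` in the output, so a CSV reader still sees three columns.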


2015-10-31 16:40:05

Well, that is an interesting approach! I'll try it and let you know what happens. Thanks! – brenogil
