2015-10-13 32 views
4

Iterating over the contents of a text file using a Python script

I have a text file named "triple_response.txt" that contains the following text:

(1,(db_name,string),DSP) 
(1,(rel, id),2) 
(2,(rel_name, string),DataSource) 
(2,(tuple, id),201) 
(2,(tuple, id),202) 
(2,(tuple, id),203) 
(201,(src_id,varchar),Pos201510070) 
(201,(src_name,varchar),Postgres) 
(201,(password,varchar),root) 
(201,(host,varchar),localhost) 
(201,(created_date,date),2015-10-07) 
(201,(user_name,varchar),postgres) 
(201,(src_type,varchar),Structured) 
(201,(db_name,varchar),postgres) 
(201,(port,numeric),None) 
(202,(src_id,varchar),pos201510060) 
(202,(src_name,varchar),Postgres) 
(202,(password,varchar),root) 
(202,(host,varchar),localhost) 
(202,(created_date,date),2015-10-06) 
(202,(user_name,varchar),postgres) 
(202,(src_type,varchar),Structured) 
(202,(db_name,varchar),DSP) 
(202,(port,numeric),5432) 
(203,(src_id,varchar),pos201510060) 
(203,(src_name,varchar),Postgres) 
(203,(password,varchar),root) 
(203,(host,varchar),localhost) 
(203,(created_date,date),2015-10-06) 
(203,(user_name,varchar),postgres) 
(203,(src_type,varchar),Structured) 
(203,(db_name,varchar),maindb) 
(203,(port,numeric),5432) 

I want to convert this content into JSON:

import re
import collections
import json, jsonpickle


def convertToJSON(File):
    word_list=[]
    row_list = []
    try:
        with open(File,'r') as f:
            for word in f:
                word_list.append(word)

        with open(File,'r+') as f:
            for row in f:
                print row
                row_list.append(row.split())

        column_list = zip(*row_list)
    except IOError:
        print "Error in opening file.."
    triple =""
    for t in word_list:
        triple+=t

    tripleList = re.findall(r"\([^\(^\)]*\)",triple)
    idList = re.split(r"\([^\(^\)]*\)",triple)

    i =0
    jsonDummy = []
    jsonData = {}
    for trip in tripleList:
        nameAndType = re.split(r",|:",trip)

        if(i==0):
            key = re.compile("[^\w']|_").sub("",idList[i])
        else:
            try:
                key = re.compile("[^\w']|_").sub("",idList[i].split("(")[1])
            except IndexError:
                pass
        i = i+1
        if(idList[i].find('(')!=-1):
            try:
                content = re.compile("[^\w']|_").sub("",idList[i].split(")")[0])
            except IndexError:
                pass
        else:
            content = re.compile("[^\w']|_").sub("",idList[i])
        try:
            trip = trip[1:-1]
            tripKey = trip[1]
        except IndexError:
            tripKey = ''
        name = re.compile("[^\w']").sub("",nameAndType[0])
        try:
            typeName = re.compile("[^\w']|_").sub("",nameAndType[1])
        except IndexError:
            typeName = 'String'

        tripDict = dict()
        value = dict()

        value[name] = content
        tripDict[key]=value

        jsonDummy.append(tripDict)

    for j in jsonDummy:
        for k,v in j.iteritems():
            jsonData.setdefault(k, []).append(v)

    data = dict()
    data['data'] = jsonData
    obj = {}
    obj=jsonpickle.encode(data, unpicklable=False)

    return obj

I call convertToJSON() from within the same file like this:

print convertToJSON("triple_response.txt")

This gives the output I expect:

{"data": {"1": [{"db_name": "DSP"}, {"rel": "2"}], "201": [{"src_id": "Pos201510070"}, {"src_name": "Postgres"}, {"password": "root"}, {"host": "localhost"}, {"created_date": "20151007"}, {"user_name": "postgres"}, {"src_type": "Structured"}, {"db_name": "postgres"}, {"port": "None"}], "203": [{"src_id": "pos201510060"}, {"src_name": "Postgres"}, {"password": "root"}, {"host": "localhost"}, {"created_date": "20151006"}, {"user_name": "postgres"}, {"src_type": "Structured"}, {"db_name": "maindb"}, {"port": "5432"}], "2": [{"rel_name": "DataSource"}, {"tuple": "201"}, {"tuple": "202"}, {"tuple": "203"}], "202": [{"src_id": "pos201510060"}, {"src_name": "Postgres"}, {"password": "root"}, {"host": "localhost"}, {"created_date": "20151006"}, {"user_name": "postgres"}, {"src_type": "Structured"}, {"db_name": "DSP"}, {"port": "5432"}]}} 

Now here is the problem I am facing. I call the function from outside the class like this:

def extractConvertData(self):
    triple_response = SPO(source, db_name, table_name, response)
    try:
        _triple_file = open('triple_response.txt','w+')
        _triple_file.write(triple_response)
        print "written data in file.."
        with open('triple_response.txt','r+') as f:
            for word in f:
                print word
        jsonData = convertToJSON(str('triple_response.txt'))
    except IOError:
        print "Not able to open a file"
    print "Converted into JSON"
    print jsonData
    pass

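The same convertToJSON() code does not work here...

It produces neither output nor an error. It fails to read the contents of 'triple_response.txt' at these lines:

with open('triple_response.txt','r+') as f:
    for word in f:
        print word

Can anyone tell me how to solve this problem?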

+4

"Calling this from outside the class"? I don't see any class definition.

+2

Is the script containing 'extractConvertData' in the same directory as 'triple_response.txt'?

+2

Your file cannot be found because you are addressing it with a relative path. This is my standard answer on relative versus absolute paths: http://stackoverflow.com/questions/30621233/python-configparser-cannot-search-ini-file-correctly-ubuntu-14-python-3-4/30625670#30625670
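To illustrate the relative-path point in the comment above: a bare name such as 'triple_response.txt' is resolved against the current working directory, which is not necessarily the directory holding the script. Below is a minimal sketch, written for Python 3, of building an absolute path instead. The helper name `path_next_to` and the example script path are assumptions made for illustration; they do not appear in the question.

```python
import os


def path_next_to(script_file, name):
    """Return an absolute path to `name` in the same directory as `script_file`."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, name)


# In a real script you would pass __file__ as the first argument, e.g.
# path_next_to(__file__, "triple_response.txt"). A literal (hypothetical)
# script path is used here so the sketch does not depend on how it is run.
example = path_next_to(os.path.join("scripts", "extract.py"), "triple_response.txt")
print(example)
```

Opening the file through a path built this way should behave the same regardless of which directory the script is launched from.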

Answer

2

_triple_file is never closed (except implicitly when the Python process ends, which is terrible practice).

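When you leave a file handle dangling like this, you can get platform-specific behavior (what is your platform? Unix? Windows?). The write to _triple_file has probably not been flushed. So don't leave it dangling. Make sure you close the file after writing to it with _triple_file.write(triple_response). In fact, you should then assert that the file length is non-zero, using os.stat(), and raise an exception otherwise.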
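The close-then-check idea can be sketched as follows, written for Python 3. The sample string is a hypothetical stand-in for whatever SPO(...) returns in the question, and the temporary directory is used only so the sketch does not touch real files.

```python
import os
import tempfile

# Hypothetical stand-in for the string returned by SPO(...) in the question.
triple_response = "(1,(db_name,string),DSP)\n(1,(rel, id),2)\n"

# A throwaway directory keeps the sketch from touching real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "triple_response.txt")

# The with-block closes (and therefore flushes) the file on exit,
# even if write() raises.
with open(path, "w") as out:
    out.write(triple_response)

# Only now is it safe to check the size and read the file back.
size = os.stat(path).st_size
if size == 0:
    raise IOError("intermediate file %s is empty" % path)

with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

print(size, lines)
```

Using a with-block for the write removes the need to remember an explicit close() call, which is the step missing from the code in the question.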

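Also, you have just one big try...except clause catching all the errors, which bites off too much. Split it into two separate try...except clauses, one for writing _triple_file and one for reading it back. (By the way, you might like to use the tempfile library, to avoid needing to know the pathname of your intermediate file.)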

Something like the following untested pseudocode:

import os

triple_response = SPO(source, db_name, table_name, response)
try:
    _triple_file = open('triple_response.txt','w+')
    _triple_file.write(triple_response)
    _triple_file.close()
except IOError:
    print "Not able to write intermediate JSON file"
    raise

# Fail early if nothing was actually written to disk
assert os.stat('triple_response.txt').st_size > 0, "Error: intermediate JSON file was empty"

try:
    with open('triple_response.txt','r+') as f:
        for word in f:
            print word
    jsonData = convertToJSON(str('triple_response.txt'))
except IOError:
    print "Not able to read back intermediate JSON file"
    #raise # if you want to reraise the exception

...
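The tempfile suggestion, combined with separate error handling for the write and the read-back, might look roughly like the Python 3 sketch below. Both `write_then_convert` and `count_lines` are hypothetical names; `count_lines` merely stands in for convertToJSON() so that the sketch is self-contained.

```python
import os
import tempfile


def count_lines(path):
    """Hypothetical stand-in for convertToJSON(): just counts lines."""
    with open(path) as f:
        return sum(1 for _ in f)


def write_then_convert(text, convert=count_lines):
    # Step 1: write the intermediate file. delete=False keeps it on disk
    # after the with-block closes it, so it can be reopened by name.
    try:
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
            tmp.write(text)
            path = tmp.name
    except IOError:
        print("Not able to write intermediate file")
        raise

    try:
        if os.stat(path).st_size == 0:
            raise ValueError("intermediate file was empty")
        # Step 2: read it back, with its own error handling.
        try:
            return convert(path)
        except IOError:
            print("Not able to read back intermediate file")
            raise
    finally:
        os.remove(path)  # clean up the temporary file


result = write_then_convert("(1,(db_name,string),DSP)\n(1,(rel, id),2)\n")
print(result)
```

Because the temporary file's name is generated by the library, the calling code never has to agree on a hard-coded path, and the finally clause removes the file whether or not the conversion succeeds.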
+1

Thank you very much, smci.