2013-07-25 59 views
7

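Creating a tree/deeply nested dict from an indented text file in Python

Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the start of each line.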

Essentially, the aim is to take something like this:

a
    b
        c
    d
        e

and turn it into something like this:

{"a":{"b":"c","d":"e"}}

Or this:

apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10

into this:

{"apple":{"colours":["red","yellow","green"],"type":"granny smith","price":0.10}}

so that I can pass it to Python's JSON module and make some JSON.

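At the moment I'm trying to build a dict and a list in steps like these:

  1. {"a":""} ["a"]
  2. {"a":"b"} ["a"]
  3. {"a":{"b":"c"}} ["a","b"]
  4. {"a":{"b":{"c":"d"}}} ["a","b","c"]
  5. {"a":{"b":{"c":"d"},"e":""}} ["a","e"]
  6. {"a":{"b":{"c":"d"},"e":"f"}} ["a","e"]
  7. {"a":{"b":{"c":"d"},"e":{"f":"g"}}} ["a","e","f"]

and so on.

The list acts like "breadcrumbs" showing where I last put in a dict.

To do this I need a way to iterate through the list and generate something like dict["a"]["e"]["f"] to get at the last dict. I've had a look at the AutoVivification class that someone made, which looks very useful, but I'm really unsure about:

  1. Whether I'm using the right data structure for this (I'm planning to send it to the JSON library to create a JSON object)
  2. How to use AutoVivification in this instance
  3. Whether there's a better way in general to approach this problem.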

I came up with the following function, but it doesn't work:

def get_nested(dict,array,i):
    if i != None:
        i += 1
        if array[i] in dict:
            return get_nested(dict[array[i]],array)
        else:
            return dict
    else:
        i = 0
        return get_nested(dict[array[i]],array)
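For reference, following a breadcrumb list down into a nested dict does not need an index argument at all. The sketch below is one possible way to do it, assuming the breadcrumbs are simply a list of keys from the root downwards; the two-argument signature and the sample data are illustrative and not taken from the code above.

```python
from functools import reduce

def get_nested(tree, breadcrumbs):
    """Follow each key in breadcrumbs in turn and return whatever is reached."""
    # reduce applies node[key] once per breadcrumb, starting from the root dict
    return reduce(lambda node, key: node[key], breadcrumbs, tree)

tree = {"a": {"b": {"c": "d"}, "e": {"f": "g"}}}
last_dict = get_nested(tree, ["a", "e"])  # same as tree["a"]["e"]
last_dict["h"] = "i"                      # mutates the original tree in place
```

Because the returned value is the same dict object that lives inside the tree, assigning into it updates the tree itself. A missing key raises KeyError, which may or may not be what is wanted.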

Any help would be greatly appreciated!

(The rest of my very incomplete code is here:)

#Import relevant libraries
import codecs
import sys

#Functions
def stripped(str):
    if tab_spaced:
        return str.lstrip('\t').rstrip('\n\r')
    else:
        return str.lstrip().rstrip('\n\r')

def current_ws():
    if whitespacing == 0 or not tab_spaced:
        return len(line) - len(line.lstrip())
    if tab_spaced:
        return len(line) - len(line.lstrip('\t\n\r'))

def get_nested(adict,anarray,i):
    if i != None:
        i += 1
        if anarray[i] in adict:
            return get_nested(adict[anarray[i]],anarray)
        else:
            return adict
    else:
        i = 0
        return get_nested(adict[anarray[i]],anarray)

#initialise variables
jsondict = {}
unclosed_tags = []
debug = []

vividfilename = 'simple.vivid'
# vividfilename = sys.argv[1]
if len(sys.argv)>2:
    jsfilename = sys.argv[2]
else:
    jsfilename = vividfilename.split('.')[0] + '.json'

whitespacing = 0
whitespace_array = [0,0]
tab_spaced = False

#open the file
with codecs.open(vividfilename,'rU', "utf-8-sig") as vividfile:
    for line in vividfile:
        #work out how many whitespaces at start
        whitespace_array.append(current_ws())

        #For first line with whitespace, work out the whitespacing (eg tab vs 4-space)
        if whitespacing == 0 and whitespace_array[-1] > 0:
            whitespacing = whitespace_array[-1]
            if line[0] == '\t':
                tab_spaced = True

        #strip out whitespace at start and end
        stripped_line = stripped(line)

        if whitespace_array[-1] == 0:
            jsondict[stripped_line] = ""
            unclosed_tags.append(stripped_line)

        if whitespace_array[-2] < whitespace_array[-1]:
            oldnested = get_nested(jsondict,whitespace_array,None)
            print oldnested
            # jsondict.pop(unclosed_tags[-1])
            # jsondict[unclosed_tags[-1]]={stripped_line:""}
            # unclosed_tags.append(stripped_line)

        print jsondict
        print unclosed_tags

print jsondict
print unclosed_tags
+4

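I'm obliged to quote [the Zen of Python](http://www.python.org/dev/peps/pep-0020/): "Flat is better than nested". I would change how you are going about this; there is surely a better way than nested dicts. Also, make sure you have not fallen into the [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem).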

+0

My original approach was simply to generate one very long string using various rules. Would that be better? – Tomcat

+1

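It depends on what you want to achieve. Have a look at the [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) and make sure you are not making a similar mistake. Essentially, you need to figure out what your data is and build your container around it, rather than building a container and then working out how to fit your data into it. Each type of container has its advantages, but using a string to store distinct sets of data is never a good idea.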

Answers

4

Here is a recursive solution. First, transform the input in the following way.

Input:

person:
    address:
        street1: 123 Bar St
        street2:
        city: Madison
        state: WI
        zip: 55555
    web:
        email: [email protected]

First-step output:

[{'name':'person','value':'','level':0}, 
{'name':'address','value':'','level':1}, 
{'name':'street1','value':'123 Bar St','level':2}, 
{'name':'street2','value':'','level':2}, 
{'name':'city','value':'Madison','level':2}, 
{'name':'state','value':'WI','level':2}, 
{'name':'zip','value':55555,'level':2}, 
{'name':'web','value':'','level':1}, 
{'name':'email','value':'[email protected]','level':2}] 

This is easy to accomplish with split(':') and by counting the number of leading tabs:

def tab_level(astr): 
    """Count number of leading tabs in a string 
    """ 
    return len(astr)- len(astr.lstrip('\t')) 
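As a rough illustration of that first step, the sketch below turns tab-indented "name: value" lines into the flat list of dicts. It is only one possible reading of the description: parse_lines is a hypothetical helper name, it uses str.partition rather than split(':') so that values containing a colon are kept whole, and every value stays a string (so the zip code would come out as '55555', not the integer shown above).

```python
def tab_level(astr):
    """Count number of leading tabs in a string"""
    return len(astr) - len(astr.lstrip('\t'))

def parse_lines(text):
    """Hypothetical helper: flatten tab-indented 'name: value' text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first colon only; anything after it is the value
        name, _, value = line.strip().partition(':')
        rows.append({'name': name.strip(),
                     'value': value.strip(),
                     'level': tab_level(line)})
    return rows

sample = "person:\n\taddress:\n\t\tstreet1: 123 Bar St\n\t\tzip: 55555\n"
for row in parse_lines(sample):
    print(row)
```

Converting numeric-looking values to int or float would be an extra step on top of this.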

Then feed the first-step output into the following function:

def ttree_to_json(ttree,level=0):
    result = {}
    for i in range(0,len(ttree)):
        cn = ttree[i]
        try:
            nn = ttree[i+1]
        except IndexError:
            # cn is the last node; pretend the next one is above the root
            nn = {'level':-1}

        # Edge cases
        if cn['level']>level:
            continue
        if cn['level']<level:
            return result

        # Recursion
        if nn['level']==level:
            dict_insert_or_append(result,cn['name'],cn['value'])
        elif nn['level']>level:
            rr = ttree_to_json(ttree[i+1:], level=nn['level'])
            dict_insert_or_append(result,cn['name'],rr)
        else:
            dict_insert_or_append(result,cn['name'],cn['value'])
            return result
    return result

Where:

def dict_insert_or_append(adict,key,val):
    """Insert a value in dict at key if one does not exist
    Otherwise, convert value to list and append
    """
    if key in adict:
        if type(adict[key]) != list:
            adict[key] = [adict[key]]
        adict[key].append(val)
    else:
        adict[key] = val
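To see what this helper does with repeated keys, here is a small self-contained check. The function is repeated so the snippet runs on its own; the key and values are made up for illustration.

```python
def dict_insert_or_append(adict, key, val):
    """Insert val at key; if the key already exists, collect the values in a list."""
    if key in adict:
        if type(adict[key]) != list:
            adict[key] = [adict[key]]  # wrap the existing single value
        adict[key].append(val)
    else:
        adict[key] = val

d = {}
dict_insert_or_append(d, 'colour', 'red')     # first value is stored as-is
dict_insert_or_append(d, 'colour', 'yellow')  # second value turns it into a list
dict_insert_or_append(d, 'colour', 'green')
print(d)  # {'colour': ['red', 'yellow', 'green']}
```

One consequence worth knowing: if the first value stored under a key is itself a list, a later insert extends that list instead of wrapping it.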
+1

Could you provide the code that translates the input into the 'first-step output'? Thanks.

+0

In case anyone is interested, I created a similar [C# implementation](http://stackoverflow.com/a/36998605/107625).

+0

Here is a [highly related question](http://stackoverflow.com/questions/38664465/creating-a-tree-deeply-nested-dict-with-lists-from-an-indented-text-file). Any chance you could help? – zelusp

0

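First, don't use array and dict as variable names. They are not reserved words, but dict is a built-in type in Python, and reusing names like these shadows the originals and can end in all sorts of confusion.

OK, if I've understood you correctly, you are given a tree in a text file, with parenthood indicated by indentation, and you want to recover the actual tree structure. Right?

Does the following look like a valid outline? I'm having trouble putting your current code into context.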

result = {}
last_indentation = 0
for l in f:
    (c, i) = parse(l) # create parse to return character and indentation
    if i == last_indentation:
        pass # sibling to last
    elif i > last_indentation:
        pass # child to last
    else:
        pass # end of children, back to a higher level
    last_indentation = i

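OK, so your list holds the current parents, which is in fact quite right, but I would have them point to the dicts you have created rather than to the literal letters.

Just to get something started here: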

result = {}
parents = {}
last_indentation = 1 # start with 1 so 0 is the root of tree
parents[0] = result
for l in f:
    (c, i) = parse(l) # create parse to return character and indentation
    if i == last_indentation:
        new_el = {}
        parents[i-1][c] = new_el
        parents[i] = new_el
    elif i > last_indentation:
        pass # child to last
    else:
        pass # end of children, back to a higher level
    last_indentation = i
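One way this outline might be completed is sketched below. It assumes that the same two assignments work for siblings, children and dedents alike, since parents[i-1] always holds the most recent dict one level up. The parse helper, the build_tree name and the fixed indent width of 4 spaces are assumptions for illustration, not part of the outline above.

```python
def parse(line, indent_width=4):
    """Hypothetical parse: return (text, depth), with depth 1 for unindented lines."""
    stripped = line.lstrip(' ')
    depth = (len(line) - len(stripped)) // indent_width + 1
    return stripped.strip(), depth

def build_tree(lines, indent_width=4):
    result = {}
    parents = {0: result}  # depth 0 is the root of the tree
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, depth = parse(line, indent_width)
        new_el = {}
        parents[depth - 1][name] = new_el  # attach to the latest dict one level up
        parents[depth] = new_el            # this dict is now the parent for depth + 1
    return result

tree = build_tree(["a", "    b", "        c", "    d", "        e"])
print(tree)  # {'a': {'b': {'c': {}}, 'd': {'e': {}}}}
```

Every line becomes a dict here, so leaves show up as empty dicts rather than strings, and a line that jumps more than one level deeper than any line seen so far raises KeyError.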
+0

Yes, that's exactly right. – Tomcat

+0

OK, then let me add something... – Nicolas78

+0

Thanks! I'd be happier if json.dumps took a format that wasn't a dict :P – Tomcat

2

The following code takes a block-indented file and converts it into an XML tree; this:

foo 
bar 
baz 
    ban 
    bal 

...becomes:

<cmd>foo</cmd> 
<cmd>bar</cmd> 
<block> 
    <name>baz</name> 
    <cmd>ban</cmd> 
    <cmd>bal</cmd> 
</block> 

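The basic approach is:

  1. Set the indent to 0
  2. For each line, get its indent
  3. If > current, step down a level and save the current block/indent on a stack
  4. If == current, append to the current block
  5. If < current, pop from the stack until you get to the matching indent

So: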

from lxml import builder
C = builder.ElementMaker()

def indent(line):
    strip = line.lstrip()
    return len(line) - len(strip), strip

def parse_blockcfg(data):
    top = current_block = C.config()
    stack = []
    current_indent = 0

    lines = data.split('\n')
    while lines:
        line = lines.pop(0)
        i, line = indent(line)

        if i==current_indent:
            pass

        elif i > current_indent:
            # we've gone down a level, convert the <cmd> to a block
            # and then save the current ident and block to the stack
            prev.tag = 'block'
            prev.append(C.name(prev.text))
            prev.text = None
            stack.insert(0, (current_indent, current_block))
            current_indent = i
            current_block = prev

        elif i < current_indent:
            # we've gone up one or more levels, pop the stack
            # until we find out which level and return to it
            found = False
            while stack:
                parent_indent, parent_block = stack.pop(0)
                if parent_indent==i:
                    found = True
                    break
            if not found:
                raise Exception('indent not found in parent stack')
            current_indent = i
            current_block = parent_block

        prev = C.cmd(line)
        current_block.append(prev)

    return top
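The same stack idea can also produce plain nested dicts instead of XML, which is closer to what the question asks for. The following is a minimal sketch under a few assumptions: indentation is consistent (each leading whitespace character counts as one), the function names parse_indented and collapse are made up for illustration, and leaf values are left as strings, so the price comes out as "0.10" rather than a number. It is a variation on the steps listed above: rather than tracking a current indent, it pops the stack on every line until the parent is found.

```python
import json

def parse_indented(text):
    """Build nested dicts from indented lines using a stack of (indent, node)."""
    root = {}
    stack = [(-1, root)]  # sentinel: the root sits above every real indent
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip())
        # Pop until the top of the stack is a strictly shallower line (the parent)
        while stack[-1][0] >= indent:
            stack.pop()
        node = {}
        stack[-1][1][raw.strip()] = node
        stack.append((indent, node))
    return root

def collapse(node):
    """Turn leaf-only children into a string (one child) or a list (several)."""
    if not node:
        return None
    if all(not child for child in node.values()):
        names = list(node)
        return names[0] if len(names) == 1 else names
    return {name: collapse(child) for name, child in node.items()}

text = """apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10
"""
print(json.dumps(collapse(parse_indented(text))))
```

Keeping the parsing and the collapsing separate means the intermediate tree stays uniform (every node is a dict), and the decision about when a group of children becomes a string or a list lives in one small function that can be changed independently.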