2017-02-22 61 views

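Populating nested dictionaries with spreadsheet data

I have an idea of what I want my Python list of dictionaries to look like, but I'm having trouble pulling the spreadsheet data into that structure. The problem is that a single row may carry data for the parent dictionary's values as well as for one child. For subsequent rows, if the parent columns are empty, the child columns are assumed to belong to the previous parent. When we reach a new row whose parent columns are not empty, it should be treated as a new parent to add to the list.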

Here is an example of what the spreadsheet looks like:

+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
| name         | descr             | adminSt | authSt   | server_hostname_ip | server_descr | server_preferred | server_EPG  | server_minPol | server_maxPoll |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
| test1-NTPPOL | Test NTP Policy   | enabled | disabled | 10.10.10.10        | NTP1 server  | yes              | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 10.10.10.11        | NTP2 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 10.10.10.12        | NTP3 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
| test2-NTPPOL | Test 2 NTP policy | enabled | disabled | 20.10.10.10        | NTP1 server  | yes              | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 20.10.10.11        | NTP2 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 20.10.10.12        | NTP3 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+

I would like the data structure to look like this:

[
    {
        "name": "NTP_Policy1",
        "descr": "NTP Policy 1",
        "adminSt": "enabled",
        "authSt": "disabled",
        "servers": [
            {
                "hostname": "10.10.10.10",
                "descr": "NTP1 Server",
                "preferred": true,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            },
            {
                "hostname": "20.10.10.10",
                "descr": "NTP2 Server",
                "preferred": false,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            }
        ]
    },
    {
        "name": "NTP_Policy2",
        "descr": "NTP Policy 2",
        "adminSt": "enabled",
        "authSt": "disabled",
        "servers": [
            {
                "hostname": "30.10.10.10",
                "descr": "NTP3 Server",
                "preferred": true,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            },
            {
                "hostname": "40.10.10.10",
                "descr": "NTP4 Server",
                "preferred": false,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            }
        ]
    }
]

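This is the closest I have come, but each subsequent row is appended as a new entry at the parent level instead of being added to the previous parent's servers:

>>> import openpyxl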
>>> from pprint import pprint 
>>> def excel_to_dict(sheet): 
...  rows = sheet.iter_rows() 
...  keys = next(rows) 
...  dict_list = [] 
...  # For each row in the spreadsheet, 
...  # Create an iterator pair so that the key is iterated over at the same time as its matching cell in the row 
...  # Then save that pairing as descriptors of the switch 
...  for row in rows: 
...   dict = {} 
...   dict['servers'] = [] 
...   server_atts = {} 
...   for key,cell in zip(keys, row): 
...    if str(cell.value) != 'None' and str(key.value) == 'name': 
...     dict[str(key.value)] = str(cell.value) 
...     parentKey = str(key.value) 
...    elif (str(cell.value) != 'None' and str(key.value) == 'descr') or (str(cell.value) != 'None' and str(key.value) == 'adminSt') or (str(cell.value) != 'None' and str(key.value) == 'authSt'): 
...     dict[str(key.value)] = str(cell.value) 
...    elif str(cell.value) == 'None': 
...     continue 
...    else: 
...     server_atts[str(key.value)] = str(cell.value) 
...   dict['servers'].append(server_atts.copy()) 
...   dict_list.append(dict.copy()) 
...  return dict_list 
>>> wb = openpyxl.load_workbook('aci_config.xlsx') 
>>> ntpPolsSheet = wb.get_sheet_by_name('ntp_pol') 
>>> ntpPols = excel_to_dict(ntpPolsSheet) 
>>> 
>>> pprint(ntpPols) 
[{'adminSt': 'enabled', 
    'authSt': 'disabled', 
    'descr': 'Test NTP Policy', 
    'name': 'test1-NTPPOL', 
    'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP1 server', 
       'server_hostname_ip': '10.10.10.10', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'yes'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP2 server', 
       'server_hostname_ip': '10.10.10.11', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP3 server', 
       'server_hostname_ip': '10.10.10.12', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}, 
{'adminSt': 'enabled', 
    'authSt': 'disabled', 
    'descr': 'Test 2 NTP policy', 
    'name': 'test2-NTPPOL', 
    'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP1 server', 
       'server_hostname_ip': '20.10.10.10', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'yes'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP2 server', 
       'server_hostname_ip': '20.10.10.11', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP3 server', 
       'server_hostname_ip': '20.10.10.12', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}] 

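What does the code need to look like to populate the list of dictionaries correctly? Is there a better spreadsheet layout that would make the data easier to import? I'm trying to do everything on one sheet rather than on several.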

+1

Could you use 'pandas' for this? It achieves the same result in just a few lines of code.

+0

You should convert it to 'json'.

+0

What problem are you running into? Is the data coming in the way you expect? – aydow

Answer


I suggest saving the .xlsx file as CSV, since that should be much easier to work with. In text form it would look like this:

name,descr,adminSt,authSt,server_hostname_ip,server_descr,server_preferred,server_EPG,server_minPoll,server_maxPoll
test1-NTPPOL,Test NTP Policy,enabled,disabled,10.10.10.10,NTP1 server,yes,oob-default,4,6 
,,,,10.10.10.11,NTP2 server,no,oob-default,4,6 
,,,,10.10.10.12,NTP3 server,no,oob-default,4,6 
test2-NTPPOL,Test 2 NTP policy,enabled,disabled,20.10.10.10,NTP1 server,yes,oob-default,4,6 
,,,,20.10.10.11,NTP2 server,no,oob-default,4,6 
,,,,20.10.10.12,NTP3 server,no,oob-default,4,6 

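You can then read the CSV with pandas and convert it to a JSON-like list of dictionaries. pandas has an .iloc indexer that lets you select a row by position and then index into that row by column name.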

import pandas as pd
from beeprint import pp

def excel_to_dict(sheet):
    dict_list = []
    last_test_dict = None
    for i in range(len(sheet)):
        row = sheet.iloc[i]
        # When we find a new row with a name value, we want to insert
        # the old test_dict into the dict_list and make a new test_dict.
        # Also, we want to skip the first row to not append an empty dict.
        if pd.notnull(row['name']):
            if i != 0:
                dict_list.append(test_dict)
            test_dict = {}
            test_dict['name'] = row['name']
            test_dict['descr'] = row['descr']
            test_dict['adminSt'] = row['adminSt']
            test_dict['authSt'] = row['authSt']
            test_dict['servers'] = []
            last_test_dict = test_dict  # keep a handle to our new dict

        # Every row describes one server. Use the handle to the last test
        # dict created so that a row with a blank name adds its server
        # without modifying the name of the test.
        server_info = {}
        server_info['server_hostname'] = row['server_hostname_ip']
        server_info['server_descr'] = row['server_descr']
        server_info['server_preferred'] = row['server_preferred']
        server_info['server_EPG'] = row['server_EPG']
        server_info['minPoll'] = row['server_minPoll']
        server_info['maxPoll'] = row['server_maxPoll']
        last_test_dict['servers'].append(server_info)

    # The last test dict is never appended inside the loop, so add it here
    # (skipped when the sheet has no rows at all).
    if last_test_dict is not None:
        dict_list.append(last_test_dict)
    return dict_list

sheet = pd.read_csv('sheet.csv', sep=',')
pp(excel_to_dict(sheet))
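
As a possible alternative to the explicit loop, the same grouping can be sketched by forward-filling the parent columns and then grouping on them. This is only an illustration, not part of the code above: the inline CSV text, the PARENT_COLS constant and the sheet_to_policies name are made up for the example. It also assumes that every parent row fills in all four parent columns and that policy names are unique within the sheet.

```python
import io

import pandas as pd

# Inline copy of the CSV so the sketch runs without a file on disk.
CSV_TEXT = """name,descr,adminSt,authSt,server_hostname_ip,server_descr,server_preferred,server_EPG,server_minPoll,server_maxPoll
test1-NTPPOL,Test NTP Policy,enabled,disabled,10.10.10.10,NTP1 server,yes,oob-default,4,6
,,,,10.10.10.11,NTP2 server,no,oob-default,4,6
,,,,10.10.10.12,NTP3 server,no,oob-default,4,6
test2-NTPPOL,Test 2 NTP policy,enabled,disabled,20.10.10.10,NTP1 server,yes,oob-default,4,6
,,,,20.10.10.11,NTP2 server,no,oob-default,4,6
,,,,20.10.10.12,NTP3 server,no,oob-default,4,6
"""

# Hypothetical constant: the columns that describe the parent (policy).
PARENT_COLS = ['name', 'descr', 'adminSt', 'authSt']


def sheet_to_policies(sheet):
    """Return one dict per policy, each with a 'servers' list."""
    sheet = sheet.copy()
    # Blank parent cells are read as NaN; copy the last seen value down.
    sheet[PARENT_COLS] = sheet[PARENT_COLS].ffill()
    server_cols = [c for c in sheet.columns if c not in PARENT_COLS]

    policies = []
    # sort=False keeps the policies in the order they appear in the sheet.
    for keys, group in sheet.groupby(PARENT_COLS, sort=False):
        policy = dict(zip(PARENT_COLS, keys))
        policy['servers'] = group[server_cols].to_dict('records')
        policies.append(policy)
    return policies


policies = sheet_to_policies(pd.read_csv(io.StringIO(CSV_TEXT)))
```

The trade-off is that forward-filling cannot tell "blank because it is a child row" from "blank because the parent really has no value", which is why the assumption about fully filled parent rows matters.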
+0

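This looks perfect. My only question is about 'if i != 0:' followed by 'dict_list.append(test_dict)'. That seems to mean that whenever we are not on the first row and not on a row with a blank name, we append test_dict to our main dict list. Wouldn't we be appending whatever was in test_dict the last time we passed through this part of the code? After that, we grab the data from the row as expected. I just don't understand why you need to append test_dict when we are not at index 0 and have a new name entry. – mikey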

+0

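Right. Each time we process a new row that _does_ contain a new test name, we append the previous 'test_dict' to 'dict_list'. But when we reach the first row, the previous 'test_dict' would be empty, and we would try to insert an empty dict into 'dict_list'. So I make sure not to do this when i == 0. It also means the final 'test_dict' never gets into 'dict_list' inside the for loop, so I added the line before the return statement. – Chirag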
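
The bookkeeping discussed in these comments can be avoided if each new dict is appended to the list as soon as it is created. The list stores a reference to the dict, so servers added on later rows still show up in it. A minimal sketch, with plain tuples standing in for spreadsheet rows (the rows data and the group_rows name are made up for illustration):

```python
def group_rows(rows):
    """Group (name, hostname) rows; a blank name means 'same parent as above'."""
    dict_list = []
    for name, hostname in rows:
        if name:
            # New parent: create it and append it right away.
            dict_list.append({'name': name, 'servers': []})
        if not dict_list:
            raise ValueError('the first row must name a parent')
        # dict_list[-1] is always the most recently created parent.
        dict_list[-1]['servers'].append({'server_hostname': hostname})
    return dict_list


rows = [
    ('test1-NTPPOL', '10.10.10.10'),
    ('', '10.10.10.11'),
    ('test2-NTPPOL', '20.10.10.10'),
    ('', '20.10.10.11'),
]
result = group_rows(rows)
```

With this arrangement there is no special case for the first row and no extra append after the loop.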

+0

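I understand now. Is there a way to generalize this function? I have other sheets with different columns that would benefit from it, but it is hard-coded to a fixed layout, so I would need to write different code for each sheet. Would building the data structure from the last column back to the first be the simplest approach? I ask because I have a sheet with these headers: site, building, floor, room, row, rack. A site has a name and can have multiple buildings, a building has a name and multiple floors, and so on. – mikey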
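
One possible way to generalize this, sketched under stated assumptions rather than offered as a definitive answer: describe each nesting level as a pair (columns belonging to that level, key under which its children are stored) and walk the levels from first to last for every row. The nest_rows function, the levels format and the 'buildings' and 'floors' keys are all hypothetical. Rows are assumed to be plain dicts (for example from csv.DictReader) in which blank cells are empty strings, and a row starts a new item at a level when that level's first column is non-empty.

```python
def nest_rows(rows, levels):
    """Nest flat rows into a tree of dicts.

    levels is a list of (columns, child_key) pairs, outermost level first;
    child_key is None for the innermost level.
    """
    root = []
    # current[d] is the list that new items of level d are appended to.
    current = [root] + [None] * (len(levels) - 1)
    for row in rows:
        for depth, (columns, child_key) in enumerate(levels):
            if row.get(columns[0], '') == '':
                continue  # blank: keep using the previous item at this level
            if current[depth] is None:
                raise ValueError('child data before any parent: %r' % (row,))
            item = {col: row[col] for col in columns}
            current[depth].append(item)
            if child_key is not None:
                item[child_key] = []
                if depth + 1 < len(levels):
                    current[depth + 1] = item[child_key]
            # Deeper levels must be started again under the new item.
            for d in range(depth + 2, len(levels)):
                current[d] = None
    return root


levels = [
    (['site'], 'buildings'),
    (['building'], 'floors'),
    (['floor'], None),
]
rows = [
    {'site': 'HQ', 'building': 'A', 'floor': '1'},
    {'site': '', 'building': '', 'floor': '2'},
    {'site': '', 'building': 'B', 'floor': '1'},
    {'site': 'Lab', 'building': 'C', 'floor': '1'},
]
tree = nest_rows(rows, levels)
```

Under the same assumptions, the NTP sheet would be described with two levels: the four policy columns with 'servers' as the child key, followed by the server columns with None. Walking the columns first to last seems simpler than building from the last column backwards, because each row only needs to know the most recent item at every level.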