2011-12-05
11

I have the following code, which parses a JSON file, and I want to search through the result:

import json
from pprint import pprint

# The with block closes the file even if parsing fails.
with open('bookmarks.json') as json_data:
    jdata = json.load(json_data)
pprint(jdata)

How can I search through it for u'uri': u'http:?

Answers

15

Since json.loads simply returns a dict, you can use the operators that apply to dicts:

>>> import json
>>> jdata = json.loads('{"uri": "http:", "foo": "bar"}')
>>> 'uri' in jdata  # Check if 'uri' is in jdata's keys 
True 
>>> jdata['uri']   # Will return the value belonging to the key 'uri' 
u'http:' 

Edit: To give an idea of how to loop through the data, consider the following example:

>>> import json
>>> with open('bookmarks.json') as f:
...     jdata = json.load(f)
...
>>> for c in jdata['children'][0]['children']:
...     print('Title: {}, URI: {}'.format(c.get('title', 'No title'),
...                                       c.get('uri', 'No uri')))
...
Title: Recently Bookmarked, URI: place:folder=BOOKMARKS_MENU(...) 
Title: Recent Tags, URI: place:sort=14&type=6&maxResults=10&queryType=1 
Title: , URI: No uri 
Title: Mozilla Firefox, URI: No uri 

Inspecting the jdata data structure will let you navigate it as needed. The pprint call you already have is a good starting point.
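One way to do that inspection without reading the whole pprint dump is to list the key paths that occur in the data. The sketch below is only an illustration: key_paths is a hypothetical helper, and the sample dictionary is a made-up, heavily reduced stand-in for a bookmarks export, not the real file.

```python
def key_paths(node, prefix=''):
    """Return the set of key paths that occur anywhere in a nested structure.

    List positions are collapsed to '[]' so repeated entries share one path.
    """
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + '/' + key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= key_paths(item, prefix + '[]')
    return paths


# Made-up sample data (an assumption, not the real bookmarks.json).
sample = {
    'title': '',
    'children': [
        {'title': 'Bookmarks Menu',
         'children': [{'title': 'Help', 'uri': 'http://example.com/'}]},
    ],
}

for path in sorted(key_paths(sample)):
    print(path)
```

Seeing a path such as /children[]/children[]/uri makes it clear how many levels of 'children' have to be traversed before a 'uri' key can appear.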

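Edit 2: Another attempt. This collects the entries from the file you mentioned into a list of dictionaries. With this, I think you should be able to adapt it to your needs.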

>>> import pprint
>>> def build_structure(data, d=None):
...     if d is None:  # avoid sharing one mutable default list between calls
...         d = []
...     if 'children' in data:
...         for c in data['children']:
...             d.append({'title': c.get('title', 'No title'),
...                       'uri': c.get('uri', None)})
...             build_structure(c, d)
...     return d
...
>>> pprint.pprint(build_structure(jdata))
[{'title': u'Bookmarks Menu', 'uri': None},
 {'title': u'Recently Bookmarked',
  'uri': u'place:folder=BOOKMARKS_MENU&folder=UNFILED_BOOKMARKS&(...)'},
 {'title': u'Recent Tags',
  'uri': u'place:sort=14&type=6&maxResults=10&queryType=1'},
 {'title': u'', 'uri': None},
 {'title': u'Mozilla Firefox', 'uri': None},
 {'title': u'Help and Tutorials',
  'uri': u'http://www.mozilla.com/en-US/firefox/help/'},
 (...)
]

To then "search through it for u'uri': u'http:'", do something like this:

for c in build_structure(jdata):
    # 'uri' is None for folders, so check it before calling startswith.
    if c['uri'] and c['uri'].startswith('http:'):
        print('Started with http')
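A variation on the same idea, as a rough sketch: walk the tree lazily with a generator and keep only the entries whose uri starts with a given prefix. The names iter_bookmarks and find_by_uri_prefix and the sample data are assumptions made for illustration; the only thing taken from the answer is that folders nest their entries under a 'children' key.

```python
def iter_bookmarks(node):
    """Yield every dict found under 'children' keys, at any depth."""
    for child in node.get('children', []):
        yield child
        # Recurse into sub-folders.
        for entry in iter_bookmarks(child):
            yield entry


def find_by_uri_prefix(root, prefix):
    """Return the entries whose 'uri' starts with prefix."""
    # Folders have no 'uri', so fall back to '' before calling startswith.
    return [entry for entry in iter_bookmarks(root)
            if (entry.get('uri') or '').startswith(prefix)]


# Made-up sample data standing in for the result of json.load().
sample = {
    'children': [
        {'title': 'Bookmarks Menu', 'children': [
            {'title': 'Recent Tags', 'uri': 'place:sort=14'},
            {'title': 'Help', 'uri': 'http://example.com/help'},
        ]},
    ],
}

for entry in find_by_uri_prefix(sample, 'http:'):
    print(entry['title'], entry['uri'])
```

Because the generator yields the original dicts, any other key an entry carries is still available after filtering.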
+0

When I try to run the second example, it says: Traceback (most recent call last): File "", line 3, in ValueError: zero length field name in format – BKovac

+0

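This may have to do with the layout of your exported bookmarks... I don't know the format very well, but I'd guess it creates a 'children' key for every folder or container in your bookmarks. Try, for example, "for c in jdata['children']:" instead of the above. Also, note that the '{}'.format() function is new in Python 2.6... you may have an older version. If so, replace that line with "print 'Title: %s, URI: %s' % (c.get('title', 'No title'), c.get('uri', 'No uri'))". – jro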
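For reference, "zero length field name in format" is the error Python 2.6 raises for empty {} fields: str.format exists there, but automatic field numbering only arrived in Python 2.7 and 3.1. A minimal sketch of two spellings that avoid the problem (the entry dictionary is made-up sample data):

```python
# Made-up entry standing in for one bookmark dict.
entry = {'title': 'Mozilla Firefox'}

title = entry.get('title', 'No title')
uri = entry.get('uri', 'No uri')

# Explicit field indices work on Python 2.6 as well as on newer versions.
line_indexed = 'Title: {0}, URI: {1}'.format(title, uri)

# %-formatting works on every Python version.
line_percent = 'Title: %s, URI: %s' % (title, uri)

print(line_indexed)
print(line_percent)
```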

+0

Still not working. Here is the bookmarks file: http://pastebin.com/uCtECvDi – BKovac

0

You can use jsonpipe if you only need the output (and are more comfortable on the command line):

cat bookmarks.json | jsonpipe |grep uri 
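If installing a separate tool is not an option, a similar effect can be approximated in plain Python by flattening the document into one entry per leaf value and filtering those entries. This is only a sketch in the spirit of that pipeline: flatten is a hypothetical helper, its path format is an assumption and is not meant to reproduce jsonpipe's exact output, and the JSON text is made-up sample data.

```python
import json


def flatten(node, path=''):
    """Yield (path, value) pairs for every scalar leaf in a nested structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            for pair in flatten(value, path + '/' + key):
                yield pair
    elif isinstance(node, list):
        for index, value in enumerate(node):
            for pair in flatten(value, path + '/' + str(index)):
                yield pair
    else:
        yield path, node


# Made-up sample document (an assumption, not the real bookmarks.json).
text = '{"children": [{"title": "Help", "uri": "http://example.com/"}]}'
data = json.loads(text)

# Rough stand-in for "| grep uri": keep entries whose path mentions 'uri'.
for path, value in flatten(data):
    if 'uri' in path:
        print(path + '\t' + json.dumps(value))
```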
+0

The jsonpipe link seems to have been changed or removed –

+0

@SureshPrajapati fixed – number5

3

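ObjectPath is a library that provides the ability to query JSON and nested structures of dicts and lists. For example, by using $..foo you can search for all attributes named "foo", regardless of how deep they are.

Although the documentation focuses on the command-line interface, you can perform queries programmatically by using the package's Python internals. The example below assumes you have already loaded the data into Python data structures (dicts & lists). If you are starting from a JSON file or string, you just need to use load or loads from the json module first.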

import objectpath 

data = [ 
    {'foo': 1, 'bar': 'a'}, 
    {'foo': 2, 'bar': 'b'}, 
    {'NoFooHere': 2, 'bar': 'c'}, 
    {'foo': 3, 'bar': 'd'}, 
] 

tree_obj = objectpath.Tree(data) 

tuple(tree_obj.execute('$..foo')) 
# returns: (1, 2, 3) 

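Notice that it simply skips elements that lack a "foo" attribute, such as the third item in the list. You can also do much more complex queries, which makes ObjectPath handy for deeply nested structures (e.g. finding where x has y that has z: $.x.y.z). For more information, see the documentation and tutorial.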
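For the loading step mentioned in this answer, here is a minimal sketch of the difference between the two json functions: loads parses a string, while load reads from a file-like object. io.StringIO stands in for a real open file here, and the JSON text is made-up sample data.

```python
import io
import json

# Made-up JSON text (an assumption for illustration).
text = '[{"foo": 1, "bar": "a"}, {"NoFooHere": 2, "bar": "c"}]'

# json.loads: parse JSON that is already in a string.
from_string = json.loads(text)

# json.load: parse JSON from a file-like object that has a read() method.
# With a real file this would be: with open(path) as f: json.load(f)
from_file = json.load(io.StringIO(text))

# Both calls produce the same plain list of dicts.
print(from_string == from_file)
print(from_string[0]['foo'])
```

Either result is an ordinary Python structure of lists and dicts, which is the kind of input the example in this answer starts from.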

1

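The JSON dictionary in Jro's first example needs a colon between "foo" and "bar" to be valid JSON, and a string has to be parsed with json.loads rather than json.load.

The correct syntax is: jdata = json.loads('{"uri": "http:", "foo": "bar"}')

That cleared it up for me when playing with the code.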

0

Functions to search and print dicts, like JSON. *Made in Python 3

Search:

def pretty_search(dict_or_list, key_to_search, search_for_first_only=False):
    """
    Give it a dict or a list of dicts and a dict key (to get values of),
    it will search through it and all contained dicts and lists
    for all values of the key you gave, and will return a set of them
    (so the values must be hashable) unless you specify
    search_for_first_only=True, in which case the first value is returned.

    :param dict_or_list: dict, list or set to search through
    :param key_to_search: dict key whose values are wanted
    :param search_for_first_only: stop at the first match
    :return: set of values, a single value, or None if nothing was found
    """
    search_result = set()
    if isinstance(dict_or_list, dict):
        for key in dict_or_list:
            key_value = dict_or_list[key]
            if key == key_to_search:
                if search_for_first_only:
                    return key_value
                else:
                    search_result.add(key_value)
            if isinstance(key_value, (dict, list, set)):
                _search_result = pretty_search(key_value, key_to_search, search_for_first_only)
                if _search_result and search_for_first_only:
                    return _search_result
                elif _search_result:
                    for result in _search_result:
                        search_result.add(result)
    elif isinstance(dict_or_list, (list, set)):
        for element in dict_or_list:
            if isinstance(element, (list, set, dict)):
                # Pass the flag along (the original passed search_result here by mistake).
                _search_result = pretty_search(element, key_to_search, search_for_first_only)
                if _search_result and search_for_first_only:
                    return _search_result
                elif _search_result:
                    for result in _search_result:
                        search_result.add(result)
    return search_result if search_result else None

Print:

def pretty_print(dict_or_list, print_spaces=0):
    """
    Give it a dict or a list of dicts and it will build a
    pretty-for-print version of it, one tab per nesting level.
    The text is printed at the top level (print_spaces == 0)
    and is also returned.

    :param dict_or_list: dict, list or set to format
    :param print_spaces: current indentation level (number of tabs)
    :return: the formatted text
    """
    pretty_text = ""
    if isinstance(dict_or_list, dict):
        for key in dict_or_list:
            key_value = dict_or_list[key]
            if isinstance(key_value, dict):
                key_value = pretty_print(key_value, print_spaces + 1)
                pretty_text += "\t" * print_spaces + "{}:\n{}\n".format(key, key_value)
            elif isinstance(key_value, (list, set)):
                pretty_text += "\t" * print_spaces + "{}:\n".format(key)
                for element in key_value:
                    if isinstance(element, (dict, list, set)):
                        pretty_text += pretty_print(element, print_spaces + 1)
                    else:
                        pretty_text += "\t" * (print_spaces + 1) + "{}\n".format(element)
            else:
                pretty_text += "\t" * print_spaces + "{}: {}\n".format(key, key_value)
    elif isinstance(dict_or_list, (list, set)):
        for element in dict_or_list:
            if isinstance(element, (dict, list, set)):
                pretty_text += pretty_print(element, print_spaces + 1)
            else:
                pretty_text += "\t" * print_spaces + "{}\n".format(element)
    else:
        pretty_text += str(dict_or_list)
    if print_spaces == 0:
        print(pretty_text)
    return pretty_text