2017-07-28 229 views
-1

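I want to use pandas' json_normalize, but so far my efforts have only produced errors. Can someone tell me what I'm doing wrong? I have a complex nested JSON file, and I'd love to use pandas' powerful tools to analyze it. How do I use pandas' json_normalize?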

Code (current attempt):

    import json, pandas as pd
    from pandas.io.json import json_normalize

    df = pd.read_json('dir/data.json')
    json_normalize(df, 'aaa', 'bbb')

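The errors have ranged from

TypeError: string indices must be integers

to multiple KeyError: 0 problems. I've tried several keyword arguments with this function, I've tried breaking the data down into rows and rebuilding it row by row before normalizing, and I've read the documentation for this function and searched for the function name together with the error messages. All of it failed. I suspect this is because data.json is fairly complex. I could use other approaches, but they are very time-consuming.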

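Apologies for the formatting; this is my first question. For the awesome people who responded with constructive feedback, here are a few lines taken from the middle of my data file: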

{"_id" : { "$oid" : "52b213b38594d8a2be17c789" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-29T00:00:00Z", "borrower" : "THE KINGDOM OF MOROCCO", "closingdate" : "2014-12-31T00:00:00Z", "country_namecode" : "Kingdom of Morocco!$!MA", "countrycode" : "MA", "countryname" : "Kingdom of Morocco", "countryshortname" : "Morocco", "docty" : "Program Document,Project Information Document,Project Information Document", "grantamt" : 0, "ibrdcommamt" : 200000000, "id" : "P130903", "idacommamt" : 0, "impagency" : "MINISTRY OF FINANCE", "lendinginstr" : "Development Policy Lending", "lendinginstrtype" : "AD", "lendprojectcost" : 200000000, "majorsector_percent" : [ { "Name" : "Public Administration, Law, and Justice", "Percent" : 34 }, { "Name" : "Public Administration, Law, and Justice", "Percent" : 33 }, { "Name" : "Public Administration, Law, and Justice", "Percent" : 33 } ], "mjsector_namecode" : [ { "name" : "Public Administration, Law, and Justice", "code" : "BX" }, { "name" : "Public Administration, Law, and Justice", "code" : "BX" }, { "name" : "Public Administration, Law, and Justice", "code" : "BX" } ], "mjtheme" : [ "Public sector governance", "Public sector governance", "Public sector governance" ], "mjtheme_namecode" : [ { "name" : "Public sector governance", "code" : "2" }, { "name" : "Public sector governance", "code" : "2" }, { "name" : "Public sector governance", "code" : "2" } ], "mjthemecode" : "2,2,2", "prodline" : "PE", "prodlinetext" : "IBRD/IDA", "productlinetype" : "L", "project_abstract" : { "cdata" : "The objective of this First Transparency and Accountability Development Policy Loan (DPL) Program for Morocco is to support the concretization of key new constitutional governance principles and rights, aimed at increasing transparency and accountability and enhancing citizen engagement and access to information. 
The series supports structural reforms strengthening economic governance across the public sector and new policies fostering more inclusive and open governance. The DPL has been prepared jointly with the European Union (EU) and the African Development Bank (AfDB), leveraging a further US$ 250 million in support of common key policy actions such as the budget, procurement and open governance reforms. The programmatic approach is warranted by the scope and depth of the government's governance reform program, the implementation of which will require time, assistance, and flexibility. This operation is complemented by the transition fund project supporting the implementation of Morocco's new governance framework. This US$ 4 million grant provides technical assistance for the implementation of structural reforms fostering public engagement; performance based budgeting and fiscal decentralization. The series adopts a holistic and integrated approach to enhance its impact. It is supporting governance reforms across the public sector covering the central government; State owned Enterprises, or SoEs and agencies, local governments as well as inter-governmental relations. The Bank has provided policy advice and technical assistance for the design of most policy measures and laws supported by this DPL, with the support from the MNA multi-donor trust fund. The transition fund governance project will support the implementation of these structural reforms. While building on the long-standing engagement with public administration reform, under the Public Administration Reform Loan (PARL) series, this program supports the concretization of the performance budgeting reform through the adoption and implementation of the new organic budget law and procurement decree. This DPL series also delves into new reform areas derived from the constitution such as access to information, public petitions, as well as into the governance of SoEs and local finances." 
}, "project_name" : "MA Accountability and Transparency DPL", "projectdocs" : [ { "DocTypeDesc" : "Program Document (PGD), Vol.1 of 1", "DocType" : "PGD", "EntityID" : "000333037_20131009170139", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131009170139", "DocDate" : "30-SEP-2013" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000231615_20121031105539", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000231615_20121031105539", "DocDate" : "04-SEP-2012" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000386194_20121016015521", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000386194_20121016015521", "DocDate" : "04-SEP-2012" } ], "projectfinancialtype" : "IBRD", "projectstatusdisplay" : "Active", "regionname" : "Middle East and North Africa", "sector" : [ { "Name" : "General public administration sector" }, { "Name" : "Central government administration" }, { "Name" : "Public administration- Information and communications" } ], "sector1" : { "Name" : "General public administration sector", "Percent" : 34 }, "sector2" : { "Name" : "Central government administration", "Percent" : 33 }, "sector3" : { "Name" : "Public administration- Information and communications", "Percent" : 33 }, "sector_namecode" : [ { "name" : "General public administration sector", "code" : "BZ" }, { "name" : "Central government administration", "code" : "BC" }, { "name" : "Public administration- Information and communications", "code" : "BM" } ], "sectorcode" : "BM,BC,BZ", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "N", "theme1" : { "Name" : "Other accountability/anti-corruption", "Percent" : 33 }, "theme_namecode" : [ { "name" : "Other accountability/anti-corruption", "code" : "29" }, { "name" : "Other public sector governance", "code" : "30" }, { "name" : 
"Public expenditure, financial management and procurement", "code" : "27" } ], "themecode" : "27,30,29", "totalamt" : 200000000, "totalcommamt" : 200000000, "url" : "http://www.worldbank.org/projects/P130903?lang=en" } 
{ "_id" : { "$oid" : "52b213b38594d8a2be17c78a" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-25T00:00:00Z", "borrower" : "GOVERNMENT OF SOUTH SUDAN", "country_namecode" : "Republic of South Sudan!$!SS", "countrycode" : "SS", "countryname" : "Republic of South Sudan", "countryshortname" : "South Sudan", "docty" : "Project Paper,Project Information Document", "envassesmentcategorycode" : "B", "grantamt" : 7530000, "ibrdcommamt" : 0, "id" : "P145339", "idacommamt" : 0, "impagency" : "MINISTRY OF AGRICULTURE, COOPERATIVES AND RURAL DEVELOPMENT", "lendinginstr" : "Specific Investment Loan", "lendinginstrtype" : "IN", "lendprojectcost" : 7530000, "majorsector_percent" : [ { "Name" : "Agriculture, fishing, and forestry", "Percent" : 50 }, { "Name" : "Health and other social services", "Percent" : 30 }, { "Name" : "Agriculture, fishing, and forestry", "Percent" : 20 } ], "mjsector_namecode" : [ { "name" : "Agriculture, fishing, and forestry", "code" : "AX" }, { "name" : "Health and other social services", "code" : "JX" }, { "name" : "Agriculture, fishing, and forestry", "code" : "AX" } ], "mjtheme" : [ "Rural development" ], "mjtheme_namecode" : [ { "name" : "Rural development", "code" : "10" }, { "name" : "", "code" : "2" } ], "mjthemecode" : "10,2", "prodline" : "RE", "prodlinetext" : "Recipient Executed Activities", "productlinetype" : "L", "project_abstract" : { "cdata" : "The development objective of the Additional Financing (AF) for the Emergency Food Crisis Response Project for South Sudan is to support adoption of improved technologies for food production by eligible beneficiaries, increase storage capacity for staples, and provide cash or food to eligible people participating in public works programs in selected counties in South Sudan. 
This is the third AF to the project and will be primarily used to scale-up and augment benefits to already participating beneficiaries and to expand project activities to four additional counties where recent monitoring points to significantly deteriorating food security. The AF will cover the costs associated with: (i) provision of agricultural inputs, production technology, and advisory services; (ii) rehabilitating a seed processing facility to increase farmer's access to improved seed; (iii) bringing land that is currently out of production back into production; (iv) training farmers on reduction of postharvest losses; (v) building of food storage capacity to support postharvest handling at the household and community levels; and (vi) provision of cash or food for work to eligible individuals. The implementation schedule will be slightly revised and the closing date of both the original project and AF will be extended to April 30, 2015." }, "project_name" : "Southern Sudan Emergency Food Crisis Response Project- AF III", "projectdocs" : [ { "DocTypeDesc" : "Project Paper (PJPR), Vol.1 of 1", "DocType" : "PJPR", "EntityID" : "000442464_20131009102446", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20131009102446", "DocDate" : "01-OCT-2013" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000001843_20130618091419", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000001843_20130618091419", "DocDate" : "07-JUN-2013" } ], "projectfinancialtype" : "OTHER", "projectstatusdisplay" : "Active", "regionname" : "Africa", "sector" : [ { "Name" : "Crops" }, { "Name" : "Other social services" }, { "Name" : "General agriculture, fishing and forestry sector" } ], "sector1" : { "Name" : "Crops", "Percent" : 50 }, "sector2" : { "Name" : "Other social services", "Percent" : 30 }, "sector3" : { "Name" : "General agriculture, fishing and forestry sector", 
"Percent" : 20 }, "sector_namecode" : [ { "name" : "Crops", "code" : "AH" }, { "name" : "Other social services", "code" : "JB" }, { "name" : "General agriculture, fishing and forestry sector", "code" : "AZ" } ], "sectorcode" : "AZ,JB,AH", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "Y", "theme1" : { "Name" : "Global food crisis response", "Percent" : 100 }, "theme_namecode" : [ { "name" : "Global food crisis response", "code" : "91" } ], "themecode" : "91", "totalamt" : 0, "totalcommamt" : 7530000, "url" : "http://www.worldbank.org/projects/P145339?lang=en" } 
{ "_id" : { "$oid" : "52b213b38594d8a2be17c78b" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-25T00:00:00Z", "closingdate" : "2017-12-31T00:00:00Z", "country_namecode" : "Republic of India!$!IN", "countrycode" : "IN", "countryname" : "Republic of India", "countryshortname" : "India", "docty" : "Project Appraisal Document,Environmental Assessment,Project Information Document,Integrated Safeguards Data Sheet,Working Paper", "envassesmentcategorycode" : "B", "grantamt" : 0, "ibrdcommamt" : 0, "id" : "P146653", "idacommamt" : 250000000, "lendinginstr" : "Investment Project Financing", "lendinginstrtype" : "IN", "lendprojectcost" : 250000000, "majorsector_percent" : [ { "Name" : "Transportation", "Percent" : 60 }, { "Name" : "Water, sanitation and flood protection", "Percent" : 25 }, { "Name" : "Industry and trade", "Percent" : 10 }, { "Name" : "Health and other social services", "Percent" : 5 } ], "mjsector_namecode" : [ { "name" : "Transportation", "code" : "TX" }, { "name" : "Water, sanitation and flood protection", "code" : "WX" }, { "name" : "Industry and trade", "code" : "YX" }, { "name" : "Health and other social services", "code" : "JX" } ], "mjtheme" : [ "Rural development", "Social protection and risk management", "Social protection and risk management", "Environment and natural resources management" ], "mjtheme_namecode" : [ { "name" : "Rural development", "code" : "10" }, { "name" : "Social protection and risk management", "code" : "6" }, { "name" : "Social protection and risk management", "code" : "6" }, { "name" : "Environment and natural resources management", "code" : "11" } ], "mjthemecode" : "10,6,6,11", "prodline" : "PE", "prodlinetext" : "IBRD/IDA", "productlinetype" : "L", "project_abstract" : { "cdata" : "The objective of the Uttarakhand Disaster Recovery Project for India is to restore housing, rural connectivity and build resilience of communities in Uttarakhand and increase the technical capacity of 
the state entities to respond promptly and effectively to an eligible crisis or emergency. There are six components to the project, the first component being resilient infrastructure reconstruction. The objective of this component is to focus on the immediate needs of reconstruction of damaged houses and public buildings. The aim is to reduce the vulnerability of the affected population and restore access to the basic services of governance. The second component is the rural road connectivity. The objective of this component is to restore the connectivity lost due to the disaster through the reconstruction of damaged roads and bridges including: village roads, Other District Roads (ODRs), bridle roads and bridle bridges. The third component is the technical assistance and capacity building for disaster risk management. The objective of this component is to enhance the capabilities of government entities and others in risk mitigation and response. The fourth component is the financing disaster response expenses. This component will support the financing of eligible expenses already incurred by the state during the immediate post-disaster response period. The fifth component is the implementation support. This component will support the incremental operating costs of the project, including the operation of the Project Management Unit (PMU) and the respective Project Implementation Units (PIUs). Finally, the sixth component is the contingency emergency response." 
}, "project_name" : "Uttarakhand Disaster Recovery Project", "projectdocs" : [ { "DocTypeDesc" : "Project Appraisal Document (PAD), Vol.1 of 1", "DocType" : "PAD", "EntityID" : "000333037_20131021112627", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131021112627", "DocDate" : "11-OCT-2013" }, { "DocTypeDesc" : "Environmental Assessment (EA), Vol.1 of 1", "DocType" : "EA", "EntityID" : "000442464_20131015112514", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20131015112514", "DocDate" : "10-OCT-2013" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000356161_20130926131319", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000356161_20130926131319", "DocDate" : "24-SEP-2013" }, { "DocTypeDesc" : "Integrated Safeguards Data Sheet (ISDS), Vol.1 of 1", "DocType" : "ISDS", "EntityID" : "000333037_20130926120720", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20130926120720", "DocDate" : "24-SEP-2013" }, { "DocTypeDesc" : "Working Paper (WP), Vol.1 of 1", "DocType" : "WP", "EntityID" : "000333037_20131115110208", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131115110208", "DocDate" : "01-JUN-2013" } ], "projectfinancialtype" : "IDA", "projectstatusdisplay" : "Active", "regionname" : "South Asia", "sector" : [ { "Name" : "Rural and Inter-Urban Roads and Highways" }, { "Name" : "Flood protection" }, { "Name" : "Housing construction" }, { "Name" : "Other social services" } ], "sector1" : { "Name" : "Rural and Inter-Urban Roads and Highways", "Percent" : 60 }, "sector2" : { "Name" : "Flood protection", "Percent" : 25 }, "sector3" : { "Name" : "Housing construction", "Percent" : 10 }, "sector4" : { "Name" : "Other social services", "Percent" : 5 }, "sector_namecode" : [ { "name" : "Rural and Inter-Urban Roads and 
Highways", "code" : "TI" }, { "name" : "Flood protection", "code" : "WD" }, { "name" : "Housing construction", "code" : "YC" }, { "name" : "Other social services", "code" : "JB" } ], "sectorcode" : "JB,YC,WD,TI", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "N", "theme1" : { "Name" : "Rural services and infrastructure", "Percent" : 60 }, "theme_namecode" : [ { "name" : "Rural services and infrastructure", "code" : "78" }, { "name" : "Natural disaster management", "code" : "52" }, { "name" : "Social risk mitigation", "code" : "87" }, { "name" : "Climate change", "code" : "81" } ], "themecode" : "81,87,52,78", "totalamt" : 250000000, "totalcommamt" : 250000000, "url" : "http://www.worldbank.org/projects/P146653?lang=en" } 

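It's worth mentioning that not every field in every row contains valid information, and I'm supposed to find and correct that. I'm not asking for the answer to that part; I just want to know how to use json_normalize to get the information into a pandas DataFrame.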

+1

Could you add a snippet of your 'data.json'? – Dark

+0

If you need any more information, let me know. I still haven't gotten this squared away. –

Answer

0

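This worked for me:

  1. Read the data line by line as strings (I copy-pasted the text from the question into a file).

  2. Use json.loads to convert each string into a Python dict.

  3. Use pandas' json_normalize to convert each dict into a one-row DataFrame and, if needed, concatenate all the DataFrames.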

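    import json

    import pandas as pd

    # 'data.json' is the name of the file; it holds one JSON object per line
    with open('data.json', 'r') as f:
        # Parse each non-blank line into a Python dict
        records = [json.loads(line) for line in f if line.strip()]

    # pd.json_normalize is the top-level name in pandas >= 1.0; on older
    # versions use `from pandas.io.json import json_normalize` instead
    df = pd.concat([pd.json_normalize(r) for r in records], ignore_index=True)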
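json_normalize works on a dict or a list of dicts, so handing it the DataFrame returned by pd.read_json is a likely source of the errors in the question. Below is a minimal sketch of the `record_path` and `meta` arguments, assuming pandas 1.0 or later (for `pd.json_normalize`). The two records are shortened versions of lines from the sample above, and the variable names `lines`, `records`, `projects`, and `themes` are only illustrative.

```python
import json

import pandas as pd

# Two shortened records, shaped like lines of the data.json sample
lines = [
    '{"id": "P130903", "countryshortname": "Morocco", '
    '"_id": {"$oid": "52b213b38594d8a2be17c789"}, '
    '"mjtheme_namecode": [{"name": "Public sector governance", "code": "2"}]}',
    '{"id": "P145339", "countryshortname": "South Sudan", '
    '"_id": {"$oid": "52b213b38594d8a2be17c78a"}, '
    '"mjtheme_namecode": [{"name": "Rural development", "code": "10"}, '
    '{"name": "", "code": "2"}]}',
]

# json_normalize wants dicts, so parse each line first
records = [json.loads(line) for line in lines]

# One row per project; nested dicts become dotted columns such as "_id.$oid",
# while nested lists stay as list objects in their column
projects = pd.json_normalize(records)

# One row per entry of the nested list, with the project-level fields
# named in `meta` repeated on every row
themes = pd.json_normalize(
    records,
    record_path="mjtheme_namecode",
    meta=["id", "countryshortname"],
)

print(projects)
print(themes)
```

With `record_path`, each element of the nested list becomes its own row, which is usually what you want before cleaning up entries such as the empty theme name in the second record.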