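Reading nested (sub-level) JSON data with Pandas
I am stuck trying to read sub-level data with Pandas.
Background:
I downloaded a series of data with the NYT Archive API and saved it to a JSON file. The file is effectively a list of JSON objects.
Steps:
I read the JSON file with the read_json method.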
import pandas as pd
pandas_df = pd.read_json("data.json")
When I look at the result with head(), it looks like this:
pandas_df.head()
copyright \
0 Copyright (c) 2013 The New York Times Company....
1 Copyright (c) 2013 The New York Times Company....
2 Copyright (c) 2013 The New York Times Company....
3 Copyright (c) 2013 The New York Times Company....
4 Copyright (c) 2013 The New York Times Company....
response
0 {'docs': [{'subsection_name': None, 'slideshow...
1 {'docs': [{'subsection_name': None, 'slideshow...
2 {'docs': [{'subsection_name': None, 'slideshow...
3 {'docs': [{'subsection_name': None, 'slideshow...
4 {'docs': [{'subsection_name': None, 'slideshow...
I only need the information in the response column, so I changed the code as follows:
print(pandas_df["response"].head())
0 {'docs': [{'subsection_name': None, 'slideshow...
1 {'docs': [{'subsection_name': None, 'slideshow...
2 {'docs': [{'subsection_name': None, 'slideshow...
3 {'docs': [{'subsection_name': None, 'slideshow...
4 {'docs': [{'subsection_name': None, 'slideshow...
Name: response, dtype: object
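For context, the dtype of object here means each cell in the response column holds an ordinary Python dict rather than parsed columns. A minimal sketch of how that can be checked is below. It assumes a one-row stand-in frame built with pd.DataFrame in place of pd.read_json("data.json"); the values are trimmed from the sample element posted under EDIT 1.

```python
import pandas as pd

# Stand-in for pd.read_json("data.json"): one row, trimmed from the real data.
pandas_df = pd.DataFrame([
    {
        "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
        "response": {
            "meta": {"hits": 7652},
            "docs": [
                {
                    "section_name": "Technology",
                    "subsection_name": "Personal Tech",
                    "slideshow_credits": None,
                }
            ],
        },
    }
])

# Pull one cell out of the column and inspect its structure.
first = pandas_df["response"].iloc[0]
print(type(first))         # the cell is a plain dict
print(list(first.keys()))  # top-level keys inside one response
print(len(first["docs"]))  # number of articles stored in this row
```

So the article fields are two levels down, under response and then docs, which is why head() only shows a truncated dict.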
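Question:
How can I get at the data inside the docs elements, such as subsection_name, slideshow_credits, and so on, so that I can see it in tabular form, like a DataFrame?
Please let me know if more information is needed.
Thanks.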
EDIT 1:
Adding the first element from the JSON file. The whole file is around 1 GB, which is too large to post.
{
"copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
"response": {
"meta": {
"hits": 7652
},
"docs": [
{
"web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html",
"snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
"lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
"height": 346,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
"xlargeheight": "346"
},
"type": "image"
},
{
"width": 75,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
"kicker": "Tech Fix"
},
"keywords": [
{
"rank": "1",
"is_major": "N",
"name": "subject",
"value": "Video Recordings, Downloads and Streaming"
},
{
"rank": "2",
"is_major": "N",
"name": "subject",
"value": "Television Sets and Media Devices"
},
{
"rank": "1",
"is_major": "Y",
"name": "subject",
"value": "Television"
}
],
"pub_date": "2016-01-01T05:00:00Z",
"document_type": "multimedia",
"news_desk": "Technology/Personal Tech",
"section_name": "Technology",
"subsection_name": "Personal Tech",
"byline": {
"person": [
{
"firstname": "Brian",
"middlename": "X.",
"lastname": "CHEN",
"rank": 1,
"role": "reported",
"organization": ""
}
],
"original": "By BRIAN X. CHEN"
},
"type_of_material": "Interactive Feature",
"_id": "57fdfb9895d0e022439c2b57",
"word_count": null,
"slideshow_credits": null
}]}}
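Given that structure, one possible way to get the docs entries into a table is to collect them into a flat list and pass that list to pd.json_normalize. This is only a sketch: the string raw is a trimmed copy of the element above wrapped in a list (assuming data.json has that shape), and the names raw, docs, and docs_df are made up for illustration.

```python
import json
import pandas as pd

# Trimmed copy of the sample element above, wrapped in a list like data.json.
raw = """
[
  {
    "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
    "response": {
      "meta": {"hits": 7652},
      "docs": [
        {
          "headline": {
            "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
            "kicker": "Tech Fix"
          },
          "keywords": [
            {"rank": "1", "is_major": "Y", "name": "subject", "value": "Television"}
          ],
          "pub_date": "2016-01-01T05:00:00Z",
          "section_name": "Technology",
          "subsection_name": "Personal Tech",
          "slideshow_credits": null
        }
      ]
    }
  }
]
"""
data = json.loads(raw)

# Collect every article from every top-level object into one flat list.
docs = [doc for item in data for doc in item["response"]["docs"]]

# json_normalize expands nested dicts into dotted columns (e.g. headline.main);
# list-valued fields such as keywords stay as Python lists in a single cell.
docs_df = pd.json_normalize(docs)
print(docs_df[["section_name", "subsection_name", "headline.main"]])
```

With the real file, data would come from json.load on an open file handle instead of json.loads(raw). A 1 GB file loaded this way may need several times that much memory, so this has not been checked against the full data set.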
Can you post the first few lines of the raw JSON?
Added, please take a look.
I want to read "docs".