2013-06-05 44 views
28

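Getting imported JSON data into a data frame

I have a file containing over 1500 JSON objects that I want to work with in R. I have been able to import the data as a list, but I am having trouble coercing it into a useful structure. I want to create a data frame with a row for each JSON object and a column for each key:value pair.

I have recreated my situation with this small, fake data set: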

[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null}, 
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500}, 
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null}, 
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865}, 
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221}, 
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413}, 
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}] 

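Some features of the data:

  • The objects all contain the same number of key:value pairs, although some of the values are null
  • Each object has two non-numeric columns (name and group)
  • name is the unique identifier, and there are about 10 groups
  • Many of the names and groups contain spaces, commas and other punctuation.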

Based on this question: R list(structure(list())) to data frame, I tried the following:

json_file <- "test.json" 
json_data <- fromJSON(json_file) 
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame)) 

With both my real data and this fake data, the last line gives me this error:

Error in data.frame(name = "Doe, John", group = "Red", `age (y)` = 24, : 
    arguments imply differing number of rows: 1, 0 

Answers

38

You just need to replace the NULLs with NAs:

require(RJSONIO)  

json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null}, 
    {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500}, 
    {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null}, 
    {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865}, 
    {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221}, 
    {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413}, 
    {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]' 


json_file <- fromJSON(json_file) 

json_file <- lapply(json_file, function(x) { 
    x[sapply(x, is.null)] <- NA 
    unlist(x) 
}) 

Once every element holds a non-NULL value, you can call rbind without getting an error:

do.call("rbind", json_file) 
    name   group age (y) height (cm) wieght (kg) score 
[1,] "Doe, John" "Red" "24" "182"  "74.8"  NA 
[2,] "Doe, Jane" "Green" "30" "170"  "70.1"  "500" 
[3,] "Smith, Joan" "Yellow" "41" "169"  "60"  NA 
[4,] "Brown, Sam" "Green" "22" "183"  "75"  "865" 
[5,] "Jones, Larry" "Green" "31" "178"  "83.9"  "221" 
[6,] "Murray, Seth" "Red" "35" "172"  "76.2"  "413" 
[7,] "Doe, Jane" "Yellow" "22" "164"  "68"  "902" 
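
The result above is a character matrix rather than a data frame. Below is a minimal base-R sketch of one way to finish the conversion. The two-row json_rows list is a hypothetical stand-in for the output of the lapply step, and type.convert only guesses column types, so it may not suit every data set.

```r
# Hypothetical stand-in for the list produced by the lapply() step:
# one named character vector per JSON object, NA where the value was null.
json_rows <- list(
  c(name = "Doe, John", group = "Red",   `age (y)` = "24", score = NA),
  c(name = "Doe, Jane", group = "Green", `age (y)` = "30", score = "500")
)

# Bind the rows into a character matrix, as in the answer.
mat <- do.call("rbind", json_rows)

# Turn the matrix into a data frame, keeping strings as character columns.
df <- as.data.frame(mat, stringsAsFactors = FALSE)

# Let R guess a type for each column (as.is = TRUE avoids factors).
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

Every column starts out as character because the matrix can hold only one type, which is why the explicit type conversion step is needed.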
+3

I am surprised that there is no better function for this. (For XML there are functions like XMLtoDataFrame), so a JSONtoDataFrame would be great – userJT

+1

@userJT - there is 'jsonlite::fromJSON', which handles NULLs and simplifies to a 'data.frame'. See [my answer](http://stackoverflow.com/a/37739735/5977215) – SymbolixAU

+0

This converts json_file to a matrix, not a data frame. How do I get a data.frame? – TSR

-2

To get rid of the null values, use the nullValue argument (as provided by RJSONIO's fromJSON):

json_data <- fromJSON(json_file, nullValue = NA) 
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame)) 

This way there are no unnecessary quotes in the output.

2
dplyr::bind_rows(fromJSON(file_name)) 
+0

Which 'fromJSON' function are you using? If it is from 'jsonlite', then 'dplyr::bind_rows' is redundant. If it is from 'rjson', then your solution errors on the data provided. – SymbolixAU

+0

I don't remember; things must have changed –

15

If you use library(jsonlite) and its fromJSON function, this is very simple. It also handles null values and converts them to NA:

json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null}, 
    {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500}, 
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null}, 
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865}, 
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221}, 
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413}, 
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]' 

library(jsonlite) 
fromJSON(json_file) 
#   name group age (y) height (cm) wieght (kg) score 
# 1 Doe, John Red  24   182  74.8 NA 
# 2 Doe, Jane Green  30   170  70.1 500 
# 3 Smith, Joan Yellow  41   169  60.0 NA 
# 4 Brown, Sam Green  22   183  75.0 865 
# 5 Jones, Larry Green  31   178  83.9 221 
# 6 Murray, Seth Red  35   172  76.2 413 
# 7 Doe, Jane Yellow  22   164  68.0 902 

str(fromJSON(json_file)) 
# 'data.frame': 7 obs. of 6 variables: 
# $ name  : chr "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ... 
# $ group  : chr "Red" "Green" "Yellow" "Green" ... 
# $ age (y) : int 24 30 41 22 31 35 22 
# $ height (cm): int 182 170 169 183 178 172 164 
# $ wieght (kg): num 74.8 70.1 60 75 83.9 76.2 68 
# $ score  : int NA 500 NA 865 221 413 902 
+0

I ran the same code as you, but when I run 'fromJSON' it returns a list, not a data frame. How did you get it to return a data frame? – Alexander

+0

@Alexander - I still get a 'data.frame'. Make sure you are using 'jsonlite::fromJSON' – SymbolixAU
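
Several packages (rjson, RJSONIO and jsonlite) export a function called fromJSON, and whichever package was attached last masks the others, which may explain a list coming back instead of a data frame. Below is a small sketch, assuming jsonlite is installed and using a shortened two-record version of the sample data, that sidesteps masking by calling the function through its namespace.

```r
# Assumes the jsonlite package is installed.
# Shortened, illustrative version of the sample data from the question.
json_text <- '[{"name":"Doe, John","group":"Red","score":null},
               {"name":"Doe, Jane","group":"Green","score":500}]'

# The explicit namespace guarantees jsonlite's parser is used,
# even if rjson or RJSONIO has been attached afterwards.
df <- jsonlite::fromJSON(json_text)

# JSON null becomes NA and the array of objects becomes a data frame.
str(df)
```

Writing the namespace out in full is slightly more verbose than library(jsonlite), but it removes any dependence on the order in which packages were loaded.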

3
library(rjson) 
Lines <- readLines("yelp_academic_dataset_business.json") 
business <- as.data.frame(t(sapply(Lines, fromJSON))) 

You can try this to load the JSON data into R.
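
The snippet above appears to assume a file with one JSON object per line. As an alternative sketch, assuming jsonlite is installed, jsonlite::stream_in reads such line-delimited JSON directly into a data frame. The two inline records are made up for illustration, and the textConnection stands in for a file connection such as file("yelp_academic_dataset_business.json").

```r
# Assumes the jsonlite package is installed.
# Made-up records, one JSON object per line.
ndjson <- c(
  '{"name":"Doe, John","group":"Red","score":null}',
  '{"name":"Doe, Jane","group":"Green","score":500}'
)

# A text connection stands in for file("path/to/data.json").
con <- textConnection(ndjson)

# stream_in() parses one record per line and combines them into a data frame.
records <- jsonlite::stream_in(con, verbose = FALSE)
close(con)

str(records)
```

Compared with sapply over readLines, this avoids the transpose step and yields ordinary atomic columns where the values allow it.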