2016-10-14

R: Reading and parsing JSON

Fair enough if R isn't the right tool for this job, but I believe it should be.

I call an API and then dump the result into the Postman JSON reader, which gives me something like this:

"results": [ 
    { 
     "personUuid": "***", 
     "synopsis": { 
     "fullName": "***", 
     "headline": "***", 
     "location": "***", 
     "image": "***", 
     "skills": [ 
      "*", 
      "*", 
      "*", 
      "*.", 
      "*" 
     ], 
     "phoneNumbers": [ 
      "***", 
      "***" 
     ], 
     "emailAddresses": [ 
      "***" 
     ], 
     "networks": [ 
      { 
      "name": "linkedin", 
      "url": "***", 
      "type": "canonicalUrl", 
      "lastAccessed": null 
      }, 
      { 
      "name": "***", 
      "url": "***", 
      "type": "cvUrl", 
      "lastAccessed": "*" 
      }, 
      { 
      "name": "*", 
      "url": "***", 
      "type": "cvUrl", 
      "lastAccessed": "*" 
      } 
     ] 
     } 
    }, 
    { 

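First of all, I don't know how to import this into R, since I mostly work with CSV files. I've seen other questions where people use a JSON package to call the URL directly, but that won't work with what I'm doing, so I'd like to know how to read a CSV file that contains JSON.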

I use:

library(rjson)  # fromJSON(file = ...) is the rjson signature
x <- fromJSON(file = "Z:/json.csv")

There may well be a better way, though. Once read in, the JSON looks more like:

...$results[[9]]$synopsis$emailAddresses 
[1] "***" "***"   
[3] "***"    "***"   

$results[[9]]$synopsis$networks... 
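A minimal sketch of the import step, assuming Z:/json.csv actually holds raw JSON text (as exported from Postman) despite its .csv extension. It swaps rjson for jsonlite, whose fromJSON() accepts a JSON string, a URL, or a file path as its first argument; with flatten = TRUE nested objects become prefixed columns such as synopsis.headline. The temporary file and the sample records are hypothetical stand-ins for the real data.

```r
library(jsonlite)

# Hypothetical stand-in for Z:/json.csv: a file that contains raw JSON text.
# The .csv extension does not matter; fromJSON() only looks at the content.
path <- tempfile(fileext = ".csv")
writeLines('{"results": [
  {"personUuid": "p1",
   "synopsis": {"headline": "Engineer",
                "emailAddresses": ["a@example.com", "b@example.com"]}},
  {"personUuid": "p2",
   "synopsis": {"headline": "Analyst",
                "emailAddresses": ["c@example.com"]}}
]}', path)

# Pass the file path directly; flatten = TRUE turns the nested "synopsis"
# object into columns named synopsis.headline, synopsis.emailAddresses, ...
parsed <- fromJSON(path, flatten = TRUE)
people <- parsed$results[, c("synopsis.headline", "synopsis.emailAddresses")]
str(people)
```

The result is an ordinary data frame with one row per person, so there is no need to walk the $results[[i]] list by hand.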

Then I'd like a data table that stores, for each result, the headline and then the email addresses.

I tried:

str_extract_all(x, 'emailAddresses*$') 

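I thought * would stand for everything between emailAddresses and $, including newlines and so on, but this doesn't work. And even when I do get * to match, the extraction doesn't return what * stood for.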

For example:

> y <- 'some text. email "[email protected]" other text' 
> y 
[1] "some text. email \"[email protected]\" other text" 
> str_extract_all(y, 'email \"*"') 
[[1]] 
[1] "email \"" 
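On the regex itself: in a regular expression, * is not a wildcard but a quantifier meaning "zero or more of the preceding token". The pattern 'email \"*"' is therefore 'email ' followed by zero or more quotes and then one quote, which is exactly the 'email "' that comes back. A minimal base-R sketch that captures the text between the quotes instead, using a hypothetical address in place of the redacted one. (Since x is already a parsed nested list, indexing it, e.g. x$results[[i]]$synopsis$emailAddresses, is more direct than pattern-matching its text.)

```r
# Hypothetical example string; the address is made up.
y <- 'some text. email "jane@example.com" other text'

# [^"]* means "any run of characters that are not a double quote".
# Wrapping it in parentheses makes it a capture group.
m <- regmatches(y, regexec('email "([^"]*)"', y))

m[[1]][1]  # the full match: email "jane@example.com"
m[[1]][2]  # only the captured group: jane@example.com
```

With stringr, str_match() does the same job: column 2 of its result holds the first capture group.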

Part 2: the answer below works:

fromJSON(y, flatten=TRUE)$results[c("synopsis.headline",
              "synopsis.emailAddresses")]

but not when I call the API directly using:

library(httr)

body <- '{"start": 0,"count": 105,...}'

x <- POST(url = "https://live.*.me/api/v3/person", body = body,
          add_headers(Accept = "application/json",
                      'Content-Type' = "application/json",
                      Authorization = "id=*, apiKey=*"))

y <- content(x)

In that case it doesn't work. I tried the following:

library(data.table)

z <- NULL
zz <- NULL

for (i in 1:y$count) {
  z <- rbind(z, data.table(job = y$results[[i]]$synopsis$headline))
}
for (i in 1:y$count) {
  zz <- rbind(zz, data.table(job = y$results[[i]]$synopsis$emailAddresses))
}
df <- cbind(z, zz)

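But in the returned JSON list some people have multiple email addresses, and the approach above only records the first email for each person. How can I store multiple emails as a vector (rather than as multiple columns)?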
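One way to keep every address is a list column: one row per person, with each cell holding that person's addresses as a character vector. A minimal sketch, assuming y has the shape httr's default content(x) produces, where JSON arrays arrive as R lists; the sample y below is a hypothetical stand-in for the API response. It iterates over y$results directly rather than 1:y$count, so it does not depend on the count field matching the number of records returned.

```r
# Hypothetical stand-in for y <- content(x): arrays come back as R lists.
y <- list(results = list(
  list(synopsis = list(headline = "Engineer",
                       emailAddresses = list("a@example.com", "b@example.com"))),
  list(synopsis = list(headline = "Analyst",
                       emailAddresses = list("c@example.com"))),
  list(synopsis = list(emailAddresses = list()))   # no headline, no emails
))

# Use a default value when a field is missing (NULL).
`%||%` <- function(a, b) if (is.null(a)) b else a

people <- y$results

# Exactly one headline per person; NA when the field is absent.
headline <- vapply(people,
                   function(p) p$synopsis$headline %||% NA_character_,
                   character(1))

df <- data.frame(headline = headline, stringsAsFactors = FALSE)

# List column: all addresses for a person stay together in one cell,
# instead of being spread over several rows by rbind().
df$emails <- lapply(people,
                    function(p) as.character(unlist(p$synopsis$emailAddresses)))

str(df)
```

df$emails[[1]] is then the full vector of the first person's addresses, and a person without addresses gets character(0) rather than disappearing.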


Have a look at the packages 'rjson', 'rjson2' and 'feather'. Good luck!


You need jsonlite with flatten = TRUE.


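For part two, I think I can use httr; I just don't know how to add the identity in the format above with the same authentication, since I have an id as well as an API key.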

Answers


UPDATE 1: To read JSON from a URL, you can simply call fromJSON with the URL string instead of the JSON data itself:

library(jsonlite) 

url <- 'http://you.url.com/data.json' 

# in this case we pass a URL to the fromJSON function instead of the actual content we want to parse
fromJSON(url, flatten=TRUE)$results[c("synopsis.headline", "synopsis.emailAddresses")] 

// end UPDATE 1 

You can also pass the flatten parameter to fromJSON and then use the 'results' data frame:

fromJSON(json.data, flatten=TRUE)$results[c("synopsis.headline", 
              "synopsis.emailAddresses")] 

synopsis.headline synopsis.emailAddresses 
1    ***  [email protected] 
2    ***  [email protected] 

Here is how I defined json.data; note that I intentionally added one more record to the sample input JSON.

json.data <- '{ 
     "results":[ 
     { 
      "personUuid":"***", 
      "synopsis":{ 
      "fullName":"***", 
      "headline":"***", 
      "location":"***", 
      "image":"***", 
      "skills":[ 
       "*", 
       "*", 
       "*", 
       "*.", 
       "*" 
       ], 
      "phoneNumbers":[ 
       "***", 
       "***" 
       ], 
      "emailAddresses":[ 
       "[email protected]" 
       ], 
      "networks":[ 
       { 
       "name":"linkedin", 
       "url":"***", 
       "type":"canonicalUrl", 
       "lastAccessed":null 
       }, 
       { 
       "name":"***", 
       "url":"***", 
       "type":"cvUrl", 
       "lastAccessed":"*" 
       }, 
       { 
       "name":"*", 
       "url":"***", 
       "type":"cvUrl", 
       "lastAccessed":"*" 
       } 
       ] 
      } 
     }, 
     { 
      "personUuid":"***", 
      "synopsis":{ 
      "fullName":"***", 
      "headline":"***", 
      "location":"***", 
      "image":"***", 
      "skills":[ 
       "*", 
       "*", 
       "*", 
       "*.", 
       "*" 
       ], 
      "phoneNumbers":[ 
       "***", 
       "***" 
       ], 
      "emailAddresses":[ 
       "[email protected]" 
       ], 
      "networks":[ 
       { 
       "name":"linkedin", 
       "url":"***", 
       "type":"canonicalUrl", 
       "lastAccessed":null 
       }, 
       { 
       "name":"***", 
       "url":"***", 
       "type":"cvUrl", 
       "lastAccessed":"*" 
       }, 
       { 
       "name":"*", 
       "url":"***", 
       "type":"cvUrl", 
       "lastAccessed":"*" 
       } 
       ] 
      } 
     } 
     ] 
    }' 
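Because emailAddresses is a JSON array, the flattened synopsis.emailAddresses column is a list column: each cell holds one person's addresses as a character vector, so a person with several addresses still occupies a single row. A minimal sketch of collapsing that column into one string per person, for example before writing a CSV (write.csv() does not handle list columns). The sample JSON and addresses are hypothetical, and "; " as a separator is an arbitrary choice.

```r
library(jsonlite)

# Hypothetical sample: the second person has two addresses.
json_txt <- '{"results": [
  {"synopsis": {"headline": "Engineer", "emailAddresses": ["a@example.com"]}},
  {"synopsis": {"headline": "Analyst",
                "emailAddresses": ["b@example.com", "c@example.com"]}}
]}'

res <- fromJSON(json_txt, flatten = TRUE)$results

# One character vector per person.
str(res$synopsis.emailAddresses)

# Join each person's addresses into a single string.
res$emails <- vapply(res$synopsis.emailAddresses, paste, character(1),
                     collapse = "; ")
res[, c("synopsis.headline", "emails")]
```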

Perfect, thanks. Do you have anything on the last part, calling the API from R?


I've edited my answer; please let me know whether this is what you need and whether it works. Thanks!


Not sure if it helps, but my URL is 'https://live.*.me/api/v3/person'; as you can see from my call, I'm setting parameters and using security credentials to access the API.
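Since this endpoint needs a POST with credentials, fromJSON(url) alone will not reach it. One way to combine the two parts is to let httr make the request and ask content() for the raw text (as = "text") instead of its default parsed list; that text can then go through fromJSON(..., flatten = TRUE) exactly like the local string. A minimal sketch: the helper names fetch_people_json and parse_people are hypothetical, and the URL and the "id=..., apiKey=..." header format are copied from the question, not checked against the API's documentation.

```r
library(jsonlite)

# Hypothetical helper. Header names and the Authorization format are taken
# from the question; adjust them to whatever the API actually expects.
fetch_people_json <- function(url, body, id, api_key) {
  resp <- httr::POST(
    url,
    body = body,
    httr::add_headers(
      Accept = "application/json",
      `Content-Type` = "application/json",
      Authorization = paste0("id=", id, ", apiKey=", api_key)
    )
  )
  httr::stop_for_status(resp)  # stop on HTTP errors instead of parsing an error page
  # Raw JSON text, not httr's default list, so jsonlite can flatten it.
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Hypothetical helper: the same parsing used for a string, file, or URL.
parse_people <- function(txt) {
  res <- fromJSON(txt, flatten = TRUE)$results
  res[, c("synopsis.headline", "synopsis.emailAddresses")]
}

# Usage (not run here; needs network access and real credentials):
# txt    <- fetch_people_json("https://live.*.me/api/v3/person",
#                             '{"start": 0,"count": 105,...}', "<id>", "<apiKey>")
# people <- parse_people(txt)
```

Keeping the request and the parsing in separate functions makes the parsing step testable offline with a saved response.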


Some additional test data would help.

Consider:

library(jsonlite) 
library(dplyr) 

json_data = "{\"results\": [\n {\n\"personUuid\": \"***\",\n\"synopsis\": {\n\"fullName\": \"***\",\n\"headline\": \"***\",\n\"location\": \"***\",\n\"image\": \"***\",\n\"skills\": [\n\"*\",\n\"*\",\n\"*\",\n\"*.\",\n\"*\"\n],\n\"phoneNumbers\": [\n\"***\",\n\"***\"\n],\n\"emailAddresses\": [\n\"***\"\n],\n\"networks\": [\n{\n \"name\": \"linkedin\",\n \"url\": \"***\",\n \"type\": \"canonicalUrl\",\n \"lastAccessed\": null\n},\n {\n \"name\": \"***\",\n \"url\": \"***\",\n \"type\": \"cvUrl\",\n \"lastAccessed\": \"*\"\n },\n {\n \"name\": \"*\",\n \"url\": \"***\",\n \"type\": \"cvUrl\",\n \"lastAccessed\": \"*\"\n }\n ]\n}\n}]}" 

(df <- jsonlite::fromJSON(json_data, simplifyDataFrame = TRUE, flatten = TRUE)) 
#> $results 
#> personUuid synopsis.fullName synopsis.headline synopsis.location 
#> 1  ***    ***    ***    *** 
#> synopsis.image synopsis.skills synopsis.phoneNumbers 
#> 1   *** *, *, *, *., *    ***, *** 
#> synopsis.emailAddresses 
#> 1      *** 
#>              synopsis.networks 
#> 1 linkedin, ***, *, ***, ***, ***, canonicalUrl, cvUrl, cvUrl, NA, *, * 

df$results %>% 
    select(headline = synopsis.headline, emails = synopsis.emailAddresses) 
#> headline emails 
#> 1  *** ***
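The emails column selected above is a list column, so a person with several addresses keeps them all in one cell. If a long table with one row per address is more convenient, a minimal base-R sketch is to repeat each headline by the number of addresses and unlist the addresses (tidyr's unnest() does the same for list columns). The sample JSON and addresses are hypothetical; note that a person with no addresses would drop out of the long table.

```r
library(jsonlite)

# Hypothetical sample: the first person has two addresses.
json_txt <- '{"results": [
  {"synopsis": {"headline": "Engineer",
                "emailAddresses": ["a@example.com", "b@example.com"]}},
  {"synopsis": {"headline": "Analyst", "emailAddresses": ["c@example.com"]}}
]}'

res <- fromJSON(json_txt, flatten = TRUE)$results
emails <- res$synopsis.emailAddresses   # list: one character vector per person

# lengths() gives the number of addresses per person; rep() repeats each
# headline that many times so it lines up with the unlisted addresses.
long <- data.frame(
  headline = rep(res$synopsis.headline, lengths(emails)),
  email    = unlist(emails),
  stringsAsFactors = FALSE
)
long
```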