2016-11-12

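Converting a nested JSON object to a data frame in R

I am fetching data from the Twitter API. I want to convert the data from a JSON object into a data frame and load it into a data warehouse. The input and code snippets are below.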

I am very new to R programming.

    stats_campaign.data <- content(stats_campaign.request)
    print(stats_campaign.data)

Output:

`{ 
"data_type": [ "stats" ], 
"time_series_length": [ 1 ], 
"data": [ 
{ 
    "id": [ "XXXXX" ], 
    "id_data": [ 
    { 
     "segment": {}, 
     "metrics": { 
     "impressions": {}, 
     "tweets_send": {}, 
     "qualified_impressions": {}, 
     "follows": {}, 
     "app_clicks": {}, 
     "retweets": {}, 
     "likes": {}, 
     "engagements": {}, 
     "clicks": {}, 
     "card_engagements": {}, 
     "replies": {}, 
     "url_clicks": {}, 
     "carousel_swipes": {} 
     } 
    } 
    ] 
    }, 

    {  
    "id": [ "XXXX1" ], 
    "id_data": [ 
    { 
     "segment": {}, 
     "metrics": { 
     "impressions": {}, 
     "tweets_send": {}, 
     "qualified_impressions": {}, 
     "follows": {}, 
     "app_clicks": {}, 
     "retweets": {}, 
     "likes": {}, 
     "engagements": {}, 
     "clicks": {}, 
     "card_engagements": {}, 
     "replies": {}, 
     "url_clicks": {}, 
     "carousel_swipes": {} 
     } 
    } 
    ] 
    },` 

When I read this JSON value,

    stats_json_file <- sprintf("P:/R Repos/R Applications/TwitterAPIData/stats_test_data-%s.json", TODAY)
    jsonlite::fromJSON(stats_json_file)

**Result:**
     id          id_data 
    1 5wcaz           NULL 
    2 5ub2u           NULL 
    3 5wb8x           NULL 
    4 5wb1j           NULL 
    5 5yqwj           NULL 
    6 5pq5i           NULL 
    7 5u197           NULL 
    8 5z2js           NULL 
    9 6fqh0 333250, 4, 9, 19, 111, 3189, 3156, 5, 1091 
    10 5tvr1           NULL 
    11 5yqw4           NULL 
    12 5qqps           NULL 
    13 5yqvw           NULL 
    14 5ygom           NULL 
    15 5nc88           NULL 
    16 5yg94           NULL 
    17 65t9e           NULL 
    18 5peck           NULL 
    19 63pg1 247283, 17, 22, 35, 297, 5514, 5450, 6, 2971 
    20 6cdvy  156705, 1, 2, 6, 112, 10933, 605, 170 

From my JSON file I want the id and the whole metrics object:

     "metrics": { 
     "impressions": {}, 
     "tweets_send": {}, 
     "qualified_impressions": {}, 
     "follows": {}, 
     "app_clicks": {}, 
     "retweets": {}, 
     "likes": {}, 
     "engagements": {}, 
     "clicks": {}, 
     "card_engagements": {}, 
     "replies": {}, 
     "url_clicks": {}, 
     "carousel_swipes": {} 
     } 

and to convert them into a data frame that I can load into a database. Please help!

How can I parse this JSON object? I want to retrieve the id and the whole metrics object, then convert them into a data frame to load into a SQL table.

To read the ids and metric values for multiple entries, I am using the code below:

`test <- list() 
for (i in 1:len) { 
  test <- unlist(stats_campaign.data$data[[i]]) 
  print(test) 
}` 

**Output:** 
     id 
    "5wcaz" 
     id 
    "5ub2u" 
     id 
    "5wb8x" 
     id 
"5wb1j" 
     id 
"5yqwj" 
     id 
    "5pq5i" 
     id 
    "5u197" 
     id 
    "5z2js" 
     id 
    "5tvr1" 
     id 
    "5yqw4" 
     id 
    "5qqps" 
     id 
    "5yqvw" 
     id 
    "5ygom" 
     id 
    "5nc88" 
     id 
    "5yg94" 
     id 
    "65t9e" 
     id 
    "5peck" 
        id id_data.metrics.impressions 
        "63pg1"     "133227" 
         id_data.metrics.tweets_send  id_data.metrics.follows 
        "10"       "9" 
         id_data.metrics.retweets  id_data.metrics.likes 
        "17"      "96" 
        id_data.metrics.engagements  id_data.metrics.clicks 
       "2165"      "2134" 
        id_data.metrics.replies id_data.metrics.url_clicks 
        "5"      "1204" 
        id id_data.metrics.impressions 
       "6cdvy"     "176164" 
    id_data.metrics.tweets_send id_data.metrics.retweets 
        "2"      "10" 
    id_data.metrics.likes id_data.metrics.engagements 
        "121"      "9708" 
    id_data.metrics.clicks id_data.metrics.url_clicks 
        "620"      "160" 

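How can I append the values from each iteration to a list or some other structure? Am I using the right approach? Is there another way to parse the nested JSON object and put it directly into a data frame?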

Please help! Thanks in advance!


If your JSON is syntactically valid, then in R you can run `jsonlite::fromJSON(your_text)`. However, your brackets seem to have some problems. – Gregor


This is my JSON format, –


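OK, your JSON is valid now. You can run `jsonlite::fromJSON(your_text)` on it and get useful results. What do you want? Instead of showing the output you *don't* want, can you show the output you do want? – Gregor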

Answers


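As mentioned in the comments, more information about the output you are looking for would help. In any case, I hope the following provides some useful guidance. The tidyjson README gives a helpful overview.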

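Unfortunately, because the JSON objects contain no data, it is hard to tell what the real data might hold (what would appear inside the empty objects), and I had trouble working out which part of the Twitter API you are looking at. tidyjson can produce consistent data.frame output even when data is missing! The key verbs are `gather` and `spread`, much like tidyr, but with a JSON flavor.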

str <- "{\"data_type\":[\"stats\"],\"time_series_length\":[1],\"data\":[{\"id\":[\"XXXXX\"],\"id_data\":[{\"segment\":{},\"metrics\":{\"impressions\":{},\"tweets_send\":{},\"qualified_impressions\":{},\"follows\":{},\"app_clicks\":{},\"retweets\":{},\"likes\":{},\"engagements\":{},\"clicks\":{},\"card_engagements\":{},\"replies\":{},\"url_clicks\":{},\"carousel_swipes\":{}}}]},{\"id\":[\"XXXX1\"],\"id_data\":[{\"segment\":{},\"metrics\":{\"impressions\":{},\"tweets_send\":{},\"qualified_impressions\":{},\"follows\":{},\"app_clicks\":{},\"retweets\":{},\"likes\":{},\"engagements\":{},\"clicks\":{},\"card_engagements\":{},\"replies\":{},\"url_clicks\":{},\"carousel_swipes\":{}}}]}]} " 

library(dplyr) 
library(tidyjson) 

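# One row per element of the top-level "data" array 
prep <- as.tbl_json(str) %>% enter_object("data") %>% gather_array("objid") 

# The id is stored as a one-element array, so unpack it as a string 
p1 <- prep %>% enter_object("id") %>% 
    gather_array("idnum") %>% append_values_string("id") 

# "value" and "somekey" are placeholder keys: the sample metrics objects are 
# empty, so substitute whatever keys the real data contains 
p2 <- prep %>% enter_object("id_data") %>% gather_array("datanum") %>% 
    enter_object("metrics") %>% 
    spread_values(
        impressions = jstring("impressions", "value"), 
        tweets_send = jnumber("tweets_send", "somekey") 
    ) 

# Join ids and metrics back together on the document and array indices 
p1 %>% tbl_df() %>% left_join(p2 %>% tbl_df(), by = c("document.id", "objid")) 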
#> # A tibble: 2 x 7 
#> document.id objid idnum id datanum impressions tweets_send 
#>   <int> <int> <int> <chr> <int>  <chr>  <dbl> 
#> 1   1  1  1 XXXXX  1  <NA>   NA 
#> 2   1  2  1 XXXX1  1  <NA>   NA 
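
For comparison, here is a minimal sketch of the same idea using only jsonlite and base R, in case adding tidyjson is not an option. It is not the approach above, just an alternative under stated assumptions: the sample JSON and its metric values are made up for illustration, `get_metric` is a hypothetical helper, only two metric names are listed to keep it short, and each metric is assumed to hold at most one value (as with `time_series_length` of 1). Empty objects become `NA`, so every id gets a row with the same columns.

```r
library(jsonlite)

# Hypothetical sample mirroring the structure in the question;
# the numbers for the second id are invented for illustration.
json_text <- '{
  "data_type": ["stats"],
  "time_series_length": [1],
  "data": [
    {"id": ["XXXXX"],
     "id_data": [{"segment": {}, "metrics": {"impressions": {}, "likes": {}}}]},
    {"id": ["XXXX1"],
     "id_data": [{"segment": {}, "metrics": {"impressions": [1200], "likes": [34]}}]}
  ]
}'

# Keep the nested list structure instead of letting jsonlite simplify it
parsed <- fromJSON(json_text, simplifyVector = FALSE)

# Extend this vector with the remaining metric names as needed
metric_names <- c("impressions", "likes")

# Hypothetical helper: first value of a metric, or NA when it is empty or absent
get_metric <- function(metrics, name) {
  value <- unlist(metrics[[name]])
  if (length(value) == 0) NA_real_ else as.numeric(value[1])
}

# Build one single-row data frame per id, then stack them
rows <- lapply(parsed$data, function(entry) {
  metrics <- entry$id_data[[1]]$metrics
  values <- lapply(metric_names, function(name) get_metric(metrics, name))
  names(values) <- metric_names
  data.frame(id = unlist(entry$id), values, stringsAsFactors = FALSE)
})

stats_df <- do.call(rbind, rows)
print(stats_df)
```

Collecting the rows with `lapply` and binding them once at the end avoids overwriting the result on each loop iteration. If the data frame looks right, a call such as `DBI::dbWriteTable(con, "campaign_stats", stats_df)` could load it into a SQL table, where `con` is an open DBI connection and the table name is only a placeholder.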