How do I open a .mongo file and export its contents to CSV?

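Edit 2014-05-01: I first tried fromJSON (as suggested below), but it only parsed the first line. I found that there were no commas between the closing and opening braces of consecutive JSON lines, so I fixed that in TextEdit and saved the file. I also added [ at the beginning of the file and ] at the end, and then fromJSON worked. Next step: going from a list (with embedded lists) to a data frame (or CSV).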

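Every now and then I receive data packages from edX for the courses we are evaluating. Some of them are plain .csv files, which are easy to handle; others are harder for me (I have no CS or programming background).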

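I have 2 files that I want to open and parse into CSV files for analysis in R. I have tried many, many json2csv tools, to no avail. I also tried the simple approach described here for converting JSON to CSV.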

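The data is confidential, so I cannot share the whole dataset, but I will share the first few lines of the file, which may help. The problem is that I cannot find anything about .mongo files, which seems odd to me. Do they even exist? Or is this just a JSON file that may be corrupted (which would explain the errors)?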

Any suggestions are welcome.

The first few lines of one of the .mongo files:

{ 
    "_id": { 
     "$oid": "52d1e62c350e7a3156000009" 
    }, 
    "votes": { 
     "up": [ 

     ], 
     "down": [ 

     ], 
     "up_count": 0, 
     "down_count": 0, 
     "count": 0, 
     "point": 0 
    }, 
    "visible": true, 
    "abuse_flaggers": [ 

    ], 
    "historical_abuse_flaggers": [ 

    ], 
    "parent_ids": [ 

    ], 
    "at_position_list": [ 

    ], 
    "body": "the delft university accredited course with the scholarship (fundamentals of water treatment) is supposed to start in about a month's time. But have the scholarship list been published? Any tentative date??", 
    "course_id": "DelftX/CTB3365x/2013_Fall", 
    "_type": "Comment", 
    "endorsed": false, 
    "anonymous": false, 
    "anonymous_to_peers": false, 
    "author_id": "269835", 
    "comment_thread_id": { 
     "$oid": "52cd40c5ab40cf347e00008d" 
    }, 
    "author_username": "tachak59", 
    "sk": "52d1e62c350e7a3156000009", 
    "updated_at": { 
     "$date": 1389487660636 
    }, 
    "created_at": { 
     "$date": 1389487660636 
    } 
}{ 
    "_id": { 
     "$oid": "52d0a66bcb3eee318d000012" 
    }, 
    "votes": { 
     "up": [ 

     ], 
     "down": [ 

     ], 
     "up_count": 0, 
     "down_count": 0, 
     "count": 0, 
     "point": 0 
    }, 
    "visible": true, 
    "abuse_flaggers": [ 

    ], 
    "historical_abuse_flaggers": [ 

    ], 
    "parent_ids": [ 
     { 
      "$oid": "52c63278100c07c0d1000028" 
     } 
    ], 
    "at_position_list": [ 

    ], 
    "body": "I got it. Thank you!", 
    "course_id": "DelftX/CTB3365x/2013_Fall", 
    "_type": "Comment", 
    "endorsed": false, 
    "anonymous": false, 
    "anonymous_to_peers": false, 
    "parent_id": { 
     "$oid": "52c63278100c07c0d1000028" 
    }, 
    "author_id": "2655027", 
    "comment_thread_id": { 
     "$oid": "52c4f303b03c4aba51000013" 
    }, 
    "author_username": "dmoronta", 
    "sk": "52c63278100c07c0d1000028-52d0a66bcb3eee318d000012", 
    "updated_at": { 
     "$date": 1389405803386 
    }, 
    "created_at": { 
     "$date": 1389405803386 
    } 
}{ 
    "_id": { 
     "$oid": "52ceea0cada002b72c000059" 
    }, 
    "votes": { 
     "up": [ 

     ], 
     "down": [ 

     ], 
     "up_count": 0, 
     "down_count": 0, 
     "count": 0, 
     "point": 0 
    }, 
    "visible": true, 
    "abuse_flaggers": [ 

    ], 
    "historical_abuse_flaggers": [ 

    ], 
    "parent_ids": [ 
     { 
      "$oid": "5287e8d5906c42f5aa000013" 
     } 
    ], 
    "at_position_list": [ 

    ], 
    "body": "if u please send by mail \n", 
    "course_id": "DelftX/CTB3365x/2013_Fall", 
    "_type": "Comment", 
    "endorsed": false, 
    "anonymous": false, 
    "anonymous_to_peers": false, 
    "parent_id": { 
     "$oid": "5287e8d5906c42f5aa000013" 
    }, 
    "author_id": "2276302", 
    "comment_thread_id": { 
     "$oid": "528674d784179607d0000011" 
    }, 
    "author_username": "totah1993", 
    "sk": "5287e8d5906c42f5aa000013-52ceea0cada002b72c000059", 
    "updated_at": { 
     "$date": 1389292044203 
    }, 
    "created_at": { 
     "$date": 1389292044203 
    } 
}


2014-04-30 Thieme Hennis

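It seems the source is MongoDB. MongoDB can export to CSV or to a valid JSON array (using mongoexport's '--jsonArray' flag). Perhaps whoever supplies your data can use one of those options? – Sebastian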

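Yes... they might, but communication is slow. I doubt they will make an exception, but I can always ask. Thanks for the suggestion, I will definitely try. –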

R has no "native" support for these files, but the rjson package provides a JSON parser. So I would probably load my .mongo file with:

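library(rjson)

myfile <- "path/to/myfile.mongo"
# fromJSON() takes a single string, so join the lines read from the file first
myJSON <- paste(readLines(myfile), collapse = "\n")
myNiceData <- fromJSON(myJSON)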

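Since rjson converts the JSON into whatever R data structure fits the object it reads, you will have to do some extra poking around, but once you have an R data type you shouldn't have any trouble working with it from there.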
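As a rough illustration of that poking around, the sketch below uses a hand-built list that only approximates what a parsed record might look like (field values are taken from the sample in the question; the exact structure rjson returns is an assumption). It uses base R only, to show which fields are themselves nested lists and how to reach into them.

```r
# Hand-built stand-in for one parsed record (assumed shape, not real rjson output)
record <- list(
  `_id` = list(`$oid` = "52d0a66bcb3eee318d000012"),
  votes = list(up = list(), down = list(), up_count = 0, down_count = 0),
  visible = TRUE,
  body = "I got it. Thank you!",
  created_at = list(`$date` = 1389405803386)
)

# Show the top level only, without expanding nested lists
str(record, max.level = 1)

# Which fields are lists themselves and need extra handling?
is_nested <- vapply(record, is.list, logical(1))
nested_fields <- names(record)[is_nested]
print(nested_fields)

# Reach into a nested field with [[ ]], which matches names exactly
up_count <- record[["votes"]][["up_count"]]
print(up_count)
```

Fields reported as nested here are the ones that need unlist() or an explicit sub-field before they can become a single CSV column.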

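Another package to consider when parsing JSON data is jsonlite. It will create data frames for you, so you can write them out as CSV with write.table or another suitable method for writing objects.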
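A minimal sketch of the jsonlite route, assuming the file has already been turned into a valid JSON array. The two inline records are shortened stand-ins for the real data, and the output goes to a temporary file. flatten = TRUE asks jsonlite to spread nested objects such as votes into ordinary columns.

```r
library(jsonlite)

# Shortened stand-in for a repaired file: a JSON array of two records
json_txt <- '[
  {"_id": {"$oid": "a1"}, "votes": {"up_count": 0}, "body": "first",
   "created_at": {"$date": 1389487660636}},
  {"_id": {"$oid": "b2"}, "votes": {"up_count": 2}, "body": "second",
   "created_at": {"$date": 1389405803386}}
]'

# Parse into a data frame; nested objects become columns like votes.up_count
comments <- jsonlite::fromJSON(json_txt, flatten = TRUE)

# Drop the "$" from MongoDB-style names such as "_id.$oid"
names(comments) <- gsub("$", "", names(comments), fixed = TRUE)

# $date holds milliseconds since the epoch
comments[["created_at"]] <- as.POSIXct(comments[["created_at.date"]] / 1000,
                                       origin = "1970-01-01", tz = "UTC")

out_file <- tempfile(fileext = ".csv")
write.csv(comments, out_file, row.names = FALSE)
print(names(comments))
```

Array-valued fields in the real data (for example parent_ids) would probably come back as list columns, which write.csv cannot write directly, so they would need to be dropped or collapsed to strings first.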

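Note: if it is easier to connect to the MongoDB instance and fetch the data with a query, RMongo may be a good choice. R-Bloggers also has a post about using RMongo, with a nice little walkthrough.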


2014-04-30 23:12:00 theWanderer4865

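Thanks. In fact I first tried fromJSON, but it only parsed the first line. I found that there were no commas between the closing and opening braces of consecutive JSON lines, so I fixed that in TextEdit and saved the file. I also added [ at the beginning of the file and ] at the end, and then it worked with fromJSON. Now I have another problem: there are lists within lists, and I have to figure out how to parse them properly. –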

See the answer below for the whole workflow and the solution. –

Try jsonlite; it will give you a data frame. – theWanderer4865

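Following @theWanderer's suggestion I used rjson and, with the help of a colleague, wrote the code below. It parses the data into columns, selects the specific columns needed, and checks that each instance returns the right variables.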

The whole workflow:

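Checked the data in jsonlint and corrected the errors: },{ instead of }{ between the lines of the file, plus [ at the start and ] at the end of the file.
Made a smaller file to play with, containing about 11 JSON lines.
Used the code below to parse the data file. First, though, check whether the individual fields of a listItem are lists themselves (those cause problems). As you will see, I also removed things like \n, because they caused errors, and added an empty value for parent_id when the data has none (otherwise the columns get mixed up).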
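The manual repair in the first step could also be scripted. The sketch below is one possible approach in base R, assuming the sequence }{ never occurs inside a string value such as a comment body; the function name and the file names in the final comment are hypothetical.

```r
# Turn back-to-back JSON objects into one valid JSON array.
# Assumption: "}{" (optionally with whitespace between) never appears
# inside a string value, only between two records.
repair_concatenated_json <- function(lines) {
  txt <- paste(lines, collapse = "\n")                 # one string, any line layout
  txt <- gsub("\\}\\s*\\{", "},{", txt, perl = TRUE)   # insert the missing commas
  paste0("[", trimws(txt), "]")                        # wrap the records in an array
}

# Tiny inline sample standing in for the real file
sample_lines <- c(
  '{"_id": {"$oid": "a1"}, "body": "first"}{',
  '"_id": {"$oid": "b2"}, "body": "second"}'
)
repaired <- repair_concatenated_json(sample_lines)
print(repaired)

# For a real file (hypothetical file names):
# writeLines(repair_concatenated_json(readLines("raw.mongo")), "temp.mongo")
```

Running the result through a validator such as jsonlint afterwards is still worthwhile, since the regular expression cannot tell a record boundary from the same characters inside free text.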

The code to import the .mongo file into R and parse it into a CSV:

library(rjson)

###### set working directory to write out the data file
setwd("/your/favourite/dir/json to csv/")

# never ever convert strings to factors
options(stringsAsFactors = FALSE)
# import the .mongo file (already repaired into a valid JSON array) into R
temp.data = fromJSON(file="temp.mongo", method="C", unexpected.escape="error")

## remove the old data file if there is one, so the rows below go into
## a fresh file instead of being appended to a previous run
if (file.exists("temp.csv")) file.remove("temp.csv")

for (listItem in temp.data){
    ## parent_id is absent for some records; use an empty value so the
    ## columns stay aligned. [[ ]] matches the name exactly, whereas $
    ## would partially match parent_ids when parent_id is missing.
    parent_id = ""
    if (length(listItem[["parent_id"]]) > 0){
        parent_id = unlist(listItem[["parent_id"]])
    }
    write.table(t(c(
        listItem$votes$up_count, listItem$visible, parent_id,
        gsub("\n", "", listItem$body), listItem$course_id, unlist(listItem["_type"]),
        listItem$endorsed, listItem$anonymous, listItem$author_id,
        unlist(listItem$comment_thread_id), listItem$author_username,
        ## $date is in milliseconds; format() keeps a readable timestamp,
        ## since c() would otherwise reduce the POSIXct value to a number
        format(as.POSIXct(unlist(listItem$created_at)/1000, origin="1970-01-01")))), # end t(), c()
        file="temp.csv", sep="\t", append=TRUE, row.names=FALSE, col.names=FALSE)
}
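The loop above appends one row per record to the file. An alternative sketch, shown here with a hand-built list that only approximates the parsed data, builds a data frame first and writes it once, which also gives the CSV a header row. The helper names first_or and record_to_row are hypothetical, and only a few of the columns are included.

```r
# Two records shaped roughly like the parsed list (assumed structure)
records <- list(
  list(`_id` = list(`$oid` = "a1"), parent_ids = list(), body = "first\n",
       author_id = "1", created_at = list(`$date` = 1389487660636)),
  list(`_id` = list(`$oid` = "b2"), parent_ids = list(list(`$oid` = "p9")),
       parent_id = list(`$oid` = "p9"), body = "second",
       author_id = "2", created_at = list(`$date` = 1389405803386))
)

# First value stored under `name`, or `default` when the field is absent/empty.
# [[ ]] matches names exactly, so "parent_id" never picks up "parent_ids".
first_or <- function(item, name, default = "") {
  value <- unlist(item[[name]])
  if (length(value) == 0) default else value[[1]]
}

# One record becomes a one-row data frame with plain (non-list) columns
record_to_row <- function(item) {
  millis <- first_or(item, "created_at", NA_real_)
  data.frame(
    id         = first_or(item, "_id"),
    parent_id  = first_or(item, "parent_id"),
    body       = gsub("\n", " ", first_or(item, "body"), fixed = TRUE),
    author_id  = first_or(item, "author_id"),
    created_at = format(as.POSIXct(millis / 1000, origin = "1970-01-01", tz = "UTC"),
                        "%Y-%m-%d %H:%M:%S"),
    stringsAsFactors = FALSE
  )
}

comments <- do.call(rbind, lapply(records, record_to_row))
out_file <- tempfile(fileext = ".csv")
write.csv(comments, out_file, row.names = FALSE)
print(comments)
```

Building the data frame in memory keeps the column logic in one place; for very large files the append-per-row loop may still be the more memory-friendly choice.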


2014-05-01 16:16:20
