编辑2014-05-01:我首先尝试从JSON(如下面的建议),但只解析了第一行。我发现每个JSON行的括号之间都有逗号,所以我在TextEdit中将其更改并保存了该文件。我还在文件的开头添加了[],然后添加了JSON。下一步:从列表(嵌入列表)到数据框(或csv)。如何打开.mongo文件并将内容导出到csv?
我现在每隔一段时间就会从我们正在评估的课程中得到edX的数据包。其中一些只是简单的.csv文件,这些文件很容易处理,其他文件对我来说更加困难(没有CS或编程背景)。
我有2个文件我想打开并解析成csv文件在R中进行分析。我尝试了很多很多json2csv工具,但无济于事。我也尝试了这里描述的简单方法来将json转换为csv。
数据是保密的,所以我不能共享整个数据集,但会共享文件的前两行,这可能有帮助。问题是我找不到任何关于.mongo文件的东西,对我来说这似乎很奇怪,它们甚至存在吗?或者这只是一个可能被破坏的JSON文件(这可以解释错误)?
欢迎任何建议。
第一2条线路中的.mongo文件之一:
{
"_id": {
"$oid": "52d1e62c350e7a3156000009"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
],
"at_position_list": [
],
"body": "the delft university accredited course with the scholarship (fundamentals of water treatment) is supposed to start in about a month's time. But have the scholarship list been published? Any tentative date??",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"author_id": "269835",
"comment_thread_id": {
"$oid": "52cd40c5ab40cf347e00008d"
},
"author_username": "tachak59",
"sk": "52d1e62c350e7a3156000009",
"updated_at": {
"$date": 1389487660636
},
"created_at": {
"$date": 1389487660636
}
}{
"_id": {
"$oid": "52d0a66bcb3eee318d000012"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
{
"$oid": "52c63278100c07c0d1000028"
}
],
"at_position_list": [
],
"body": "I got it. Thank you!",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"parent_id": {
"$oid": "52c63278100c07c0d1000028"
},
"author_id": "2655027",
"comment_thread_id": {
"$oid": "52c4f303b03c4aba51000013"
},
"author_username": "dmoronta",
"sk": "52c63278100c07c0d1000028-52d0a66bcb3eee318d000012",
"updated_at": {
"$date": 1389405803386
},
"created_at": {
"$date": 1389405803386
}
}{
"_id": {
"$oid": "52ceea0cada002b72c000059"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
{
"$oid": "5287e8d5906c42f5aa000013"
}
],
"at_position_list": [
],
"body": "if u please send by mail \n",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"parent_id": {
"$oid": "5287e8d5906c42f5aa000013"
},
"author_id": "2276302",
"comment_thread_id": {
"$oid": "528674d784179607d0000011"
},
"author_username": "totah1993",
"sk": "5287e8d5906c42f5aa000013-52ceea0cada002b72c000059",
"updated_at": {
"$date": 1389292044203
},
"created_at": {
"$date": 1389292044203
}
}
似乎源是MongoDB的。 MongoDB允许导出为CSV或有效的JSON数组(使用mongoexport的'--jsonArray'标志)。也许你的对手可以使用这些选项? – Sebastian
是的......他们可能,但沟通很慢。我怀疑他们会例外,但我可以问任何问题。感谢您的建议,我一定会尝试。 –