Split a .json file into multiple files on Mac

I'm working on a Mac and have a very large .json file containing more than 100k objects.

I'd like to split it into many files (ideally 50-100).

Source file

The original .json file is an array of objects that looks something like this:

[{ 
    "id": 1, 
    "item_a": "this1", 
    "item_b": "that1" 
}, { 
    "id": 2, 
    "item_a": "this2", 
    "item_b": "that2" 
}, { 
    "id": 3, 
    "item_a": "this3", 
    "item_b": "that3" 
}, { 
    "id": 4, 
    "item_a": "this4", 
    "item_b": "that4" 
}, { 
    "id": 5, 
    "item_a": "this5", 
    "item_b": "that5" 
}]

Desired output

If it were split into three files, I'd like the output to look like this:

File 1:

[{ 
    "id": 1, 
    "item_a": "this1", 
    "item_b": "that1" 
}, { 
    "id": 2, 
    "item_a": "this2", 
    "item_b": "that2" 
}]

File 2:

[{ 
    "id": 3, 
    "item_a": "this3", 
    "item_b": "that3" 
}, { 
    "id": 4, 
    "item_a": "this4", 
    "item_b": "that4" 
}]

File 3:

[{ 
    "id": 5, 
    "item_a": "this5", 
    "item_b": "that5" 
}]

Any ideas would be greatly appreciated. Thanks!


2016-07-26 Brandon

Perl to the rescue:

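#!/usr/bin/perl
use warnings;
use strict;

use JSON;    # CPAN module providing decode_json/encode_json

my $file_count = 5; # You probably want 50 - 100 here.

# Slurp the whole input file into one string.
my $json_text = do {
    local $/;
    open my $IN, '<', '1.json' or die $!;
    <$IN>
};
my $arr = decode_json($json_text);

# Base chunk size (integer); the last $rest files take one extra element.
my $size = int(@$arr / $file_count);
my $rest = @$arr % $file_count;

my $i = 1;
while (@$arr) {
    open my $OUT, '>', "file$i.json" or die $!;
    my @chunk = splice @$arr, 0, $size;
    # Grow the chunk size exactly once, after file number $file_count - $rest.
    ++$size if $i++ == $file_count - $rest;
    print {$OUT} encode_json(\@chunk);
    close $OUT or die $!;
}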
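The intent of the $size / $rest arithmetic above is that file sizes differ by at most one: every file gets int(n / k) objects and the last n % k files get one extra. A minimal bash sketch of just that arithmetic, useful for checking the split before touching any JSON (the function name chunk_sizes is hypothetical, not part of the answer). Note that this scheme puts the smaller files first, whereas the question's example has the short file last.

```shell
#!/bin/bash
# Print how many objects each output file receives when n objects are
# spread over k files: int(n / k) each, plus one extra for the last
# n % k files, so sizes differ by at most one.
chunk_sizes() {
    local n=$1 k=$2
    local size=$(( n / k )) rest=$(( n % k )) i
    for (( i = 1; i <= k; i++ )); do
        if (( i > k - rest )); then
            echo $(( size + 1 ))
        else
            echo "$size"
        fi
    done
}
```

For example, chunk_sizes 5 3 prints 1, 2 and 2 on separate lines, and chunk_sizes 100000 50 prints 2000 fifty times.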


2016-07-26 08:00:04 choroba

@choroba's answer is very efficient and flexible. Here is a bash solution using jq instead:

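#!/bin/bash
# Split data.json (a top-level array) into files of $chunk objects each.
chunk=2            # objects per output file
i=0
file=0
tmp=$(mktemp)      # buffer for the current chunk, one compact object per line

# Read one object per line; unlike a for loop over `cat ...`, this keeps
# objects whose values contain spaces intact.
while IFS= read -r obj; do
    printf '%s\n' "$obj" >> "$tmp"
    ((i = i + 1))
    if [ "$i" -eq "$chunk" ]; then
        jq --slurp '.' "$tmp" > "File$file.json"   # wrap buffered objects in an array
        : > "$tmp"                                 # empty the buffer
        ((file = file + 1))
        i=0
    fi
done < <(jq -c '.[]' data.json)

# Write any leftover objects to a final, shorter file.
if [ "$i" -gt 0 ]; then
    jq --slurp '.' "$tmp" > "File$file.json"
fi
rm -f "$tmp"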
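Since the question asks for a number of files (50-100) rather than a fixed number of objects per file, a variant can compute the chunk size from the array length and let split(1) do the cutting. A minimal sketch, assuming jq is installed; split_json and its arguments (input file, desired file count, output prefix) are hypothetical names:

```shell
#!/bin/bash
# Split a top-level JSON array into at most $2 files named <prefix>aa.json,
# <prefix>ab.json, ... Assumes jq is installed; split(1) ships with macOS.
split_json() {
    local input=$1 files=$2 prefix=${3:-part_}
    local total per f
    total=$(jq 'length' "$input")
    # Ceiling division, so no more than $files chunks are produced.
    per=$(( (total + files - 1) / files ))
    (( per > 0 )) || per=1
    # One compact object per line, then cut into groups of $per lines.
    jq -c '.[]' "$input" | split -l "$per" - "$prefix"
    # Wrap each group back into a JSON array; the glob assumes split's
    # default two-character suffixes, plenty for 50-100 files.
    for f in "$prefix"??; do
        jq -s '.' "$f" > "$f.json" && rm "$f"
    done
}
```

For the five-object sample, split_json data.json 3 part_ writes part_aa.json, part_ab.json and part_ac.json holding objects 1-2, 3-4 and 5, the same grouping as the desired output (jq's pretty-printing differs slightly in layout). Because the chunk size is rounded up, you may get fewer files than requested, never more.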


2016-07-26 08:48:37 sozkul

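$ cat tst.awk
# Every odd-numbered "{" (1st, 3rd, ...) starts a new chunk of 2 objects.
/{/ && (++numOpens % 2) {
    if (++numOuts > 1) {
        # close the previous chunk's array (test mode: tagged with its file name)
        print out, "}]"
        close(out)
    }
    out = "out" numOuts
    $0 = "[{"            # the new chunk opens a fresh array
}
{
    # print > out
    print out, $0        # test mode: show target file and line on stdout
}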


$ awk -f tst.awk file 
out1 [{ 
out1  "id": 1, 
out1  "item_a": "this1", 
out1  "item_b": "that1" 
out1 }, { 
out1  "id": 2, 
out1  "item_a": "this2", 
out1  "item_b": "that2" 
out1 }] 
out2 [{ 
out2  "id": 3, 
out2  "item_a": "this3", 
out2  "item_b": "that3" 
out2 }, { 
out2  "id": 4, 
out2  "item_a": "this4", 
out2  "item_b": "that4" 
out2 }] 
out3 [{ 
out3  "id": 5, 
out3  "item_a": "this5", 
out3  "item_b": "that5" 
out3 }]

Once you're happy with the test output, just delete 'print out, $0' and uncomment '# print > out'.


2016-07-26 14:28:47

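Thanks, Ed. I think this is very close. It prints correctly in my terminal when testing, but when I delete 'print out, $0' and uncomment '# print $0 > out', the endings of out1 and out2 are printed in the terminal but not included in the files. The '}]' gets cut off and only shows up in the terminal. Any idea how to fix this? Thanks! – Brandon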

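You must have copied/pasted something wrong or uncommented the wrong line. The script I posted does **not** do what you describe. If you edit your question to show the script you're running, we can help you debug it. –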

This will fail if any key or value contains a "{" character. –
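For reference, a minimal sketch of a file-writing version of the awk answer. In the test version the closing "}]" is emitted with print out, "}]", which writes to the terminal, so changing only the final print statement leaves those closers out of the files; here both prints are redirected to the current file. It keeps the original's assumptions: the input is laid out like the sample (starting with "[{", one "{" per object), and, as the last comment notes, a "{" inside any key or value breaks it. The function name split_json_awk, its arguments (objects per file, file-name prefix) and the .json suffix are hypothetical additions.

```shell
#!/bin/bash
# File-writing variant of the awk answer: every print, including the "}]"
# that closes the previous chunk, goes to the current output file.
# Usage (hypothetical): split_json_awk FILE [OBJECTS_PER_FILE] [PREFIX]
split_json_awk() {
    awk -v per="${2:-2}" -v prefix="${3:-out}" '
    /[{]/ && (numOpens++ % per == 0) {
        if (numOuts++ > 0) {
            print "}]" > out     # close the previous array in its own file
            close(out)
        }
        out = prefix numOuts ".json"
        $0 = "[{"                # the new chunk opens a fresh array
    }
    { print > out }
    ' "$1"
}
```

For the five-object sample, split_json_awk file 2 out writes out1.json (objects 1-2), out2.json (3-4) and out3.json (5).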
