Python I/O: Mixing the Datatypes

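I'm writing a small script that merges a large number of JSON files from a directory into a single file. The trouble is, I'm not entirely sure which state my data is in at any given point. TypeErrors abound. Here's the script: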

import glob
import json
import codecs

reader = codecs.getreader("utf-8")

for file in glob.glob("/Users/me/Scripts/BagOfJson/*.json"):
    #Aha, as binary here
    with open(file, "rb") as infile:
        data = json.load(reader(infile))
        #If I print(data) here, looks like good ol' JSON

        with open("test.json", "wb") as outfile:
            json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)
            #Crash

The script fails with the following error:

TypeError: a bytes-like object is required, not 'str'

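It is raised by the json.dump line.

Naively, I just deleted the 'b' in 'wb' for the outfile open. That didn't do the trick.

Maybe this is a lesson for me to use the shell for testing and to make use of Python's type() function. Still, I would love it if somebody could clear up the logic behind these data swaps for me. I wish it could all be strings...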


2016-08-18 Typhon

What happened when you removed the b? Perhaps you got a *different* error?

Also, is this Python 2 or Python 3?

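@MartijnPieters Well, Martijn, I'll tell you what happens when I remove the 'b' in 'wb': it works. I must have had some other error when I tried this before. Thank you for your sensible question! This is Python 3 – Typhon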

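If this is Python 3, removing the b (binary mode) so that the file is opened in text mode should work just fine. json.dump() writes str objects, and a file opened in binary mode only accepts bytes, which is what the TypeError is telling you. You probably want to specify the encoding explicitly: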

with open("test.json", "w", encoding='utf8') as outfile: 
    json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)

rather than relying on the default.
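To make the type mismatch concrete, here is a small sketch that uses in-memory streams instead of real files. io.BytesIO stands in for a file opened with "wb" and io.StringIO for one opened with "w"; the sample data is made up for illustration.

```python
import io
import json

data = {"name": "café", "count": 3}  # made-up sample data

# A binary stream behaves like a file opened with "wb": it only accepts bytes.
binary_stream = io.BytesIO()
try:
    json.dump(data, binary_stream, ensure_ascii=False)
    binary_failed = False
except TypeError as exc:
    binary_failed = True
    print("binary mode:", exc)

# A text stream behaves like a file opened with "w": it accepts str,
# which is what json.dump() produces.
text_stream = io.StringIO()
json.dump(data, text_stream, sort_keys=True, ensure_ascii=False)
text_result = text_stream.getvalue()
print("text mode:", text_result)
```

Only the text stream accepts the output, because json.dump() hands str chunks to the file's write() method.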

You don't really need codecs.getreader() either. The standard open() function handles UTF-8 files just fine; again, open the file in text mode and specify the encoding:

import glob
import json

for file in glob.glob("/Users/me/Scripts/BagOfJson/*.json"):
    with open(file, "r", encoding='utf8') as infile:
        data = json.load(infile)
        with open("test.json", "w", encoding='utf8') as outfile:
            json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)

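The above will still re-create test.json for every file in the *.json glob, so only the last file's data survives. You can't simply write multiple JSON documents one after another into the same file (unless you are specifically creating a JSON Lines file, which you are not doing here because you are using indent).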
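If the goal is a single merged file, one option is to collect every parsed document in a list and dump that list once, after the loop. This is a minimal sketch under that assumption; the merge_json_files helper, its parameters, and the choice of a top-level JSON array are illustrative and not part of the original script.

```python
import glob
import json


def merge_json_files(pattern, out_path):
    """Load every JSON file matching `pattern` and write them as one JSON array."""
    documents = []
    # sorted() gives a stable order; glob.glob() makes no ordering guarantee.
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf8") as infile:
            documents.append(json.load(infile))

    # Open the output once, outside the loop, so it is not overwritten per file.
    with open(out_path, "w", encoding="utf8") as outfile:
        json.dump(documents, outfile, sort_keys=True, indent=2, ensure_ascii=False)
    return len(documents)
```

Called as merge_json_files("/Users/me/Scripts/BagOfJson/*.json", "test.json"), this would write one array holding all documents. Keep the output path outside the glob pattern, otherwise a second run reads the merged file back in as input.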

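If you instead want to re-format every JSON file in the glob, you'll need to write each one to a new filename and then move the new file back over the original file filename.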
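The write-then-move approach could look roughly like the sketch below. The reformat_json_file helper is a hypothetical name, and using tempfile.mkstemp() plus os.replace() is one possible way to do the move, not something the original script prescribes.

```python
import json
import os
import tempfile


def reformat_json_file(path):
    """Rewrite one JSON file in place with sorted keys and 2-space indentation."""
    with open(path, "r", encoding="utf8") as infile:
        data = json.load(infile)

    # Write to a temporary file in the same directory so that os.replace()
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf8") as outfile:
            json.dump(data, outfile, sort_keys=True, indent=2, ensure_ascii=False)
        os.replace(tmp_path, path)  # move the new file over the original
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on failure
        raise
```

Inside the glob loop you would then call reformat_json_file(file) for each match. Because the original is only replaced after the new file has been written completely, a crash halfway through leaves the original untouched.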


2016-08-18 14:20:12
