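Python requests downloads an incorrect sound file from Google Translate
I am using the script below to download the Chinese pronunciation of 老師 ("teacher") from Google Translate, but when I run it, the file I get differs from the one served at that URL. I think this is an encoding problem, but since I have specified UTF-8, I don't know what is going on.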
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
url = "http://translate.google.com/translate_tts?tl=zh-CN&q=老師"
r = requests.get(url)
with open('test.mp3', 'wb') as test:
    test.write(r.content)
UPDATE:
Following @abarnert's suggestion, I have checked that the file is saved as UTF-8 with BOM and tested the code with 'idna'.
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests
url_1 = "http://translate.google.com/translate_tts?tl=zh-CN&q=老師"
url_2 = "http://translate.google.com/translate_tts?tl=zh-CN&q=\u8001\u5e2b"
r_1 = requests.get(url_1)
r_1_b = requests.get(url_1.encode('idna'))
r_2 = requests.get(url_2)
r_2_b = requests.get(url_2.encode('idna'))
# This downloads nonsense:
with open('r_1.mp3', 'wb') as test:
    test.write(r_1.content)
# This throws the error specified at bottom:
with open('r_1_b.mp3', 'wb') as test:
    test.write(r_1_b.content)
# This parses the characters individually, producing
# a file consisting of "u, eight, zero..." in Mandarin
with open('r_2.mp3', 'wb') as test:
    test.write(r_2.content)
# This produces a sound file consisting of "u, eight, zero, zero..." in Mandarin
with open('r_2_b.mp3', 'wb') as test:
    test.write(r_2_b.content)
The error I get is:
Traceback (most recent call last):
  File "/home/MZ/Desktop/tts3.py", line 12, in <module>
    r_1_b = requests.get(url_1.encode('idna'))
  File "/usr/lib64/python2.7/encodings/idna.py", line 164, in encode
    result.append(ToASCII(label))
  File "/usr/lib64/python2.7/encodings/idna.py", line 76, in ToASCII
    label = nameprep(label)
  File "/usr/lib64/python2.7/encodings/idna.py", line 21, in nameprep
    newlabel.append(stringprep.map_table_b2(c))
  File "/usr/lib64/python2.7/stringprep.py", line 197, in map_table_b2
    b = unicodedata.normalize("NFKC", al)
TypeError: must be unicode, not str
[Finished in 15.3s with exit code 1]
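The "u, eight, zero..." result for url_2 would be consistent with the escape sequences never being decoded: the traceback shows the script running under Python 2.7, where a plain string literal without a u prefix leaves \u8001 as six literal characters. The sketch below is only an illustration of that difference, not a confirmed diagnosis. It runs on Python 3 and uses a raw string to stand in for the Python 2 behavior; the variable names are made up for the example.

```python
# Python 3 sketch. The raw string keeps each backslash-u sequence literal,
# which mimics a Python 2 byte-string literal written without a u prefix.
literal = r"\u8001\u5e2b"   # twelve characters: \ u 8 0 0 1 \ u 5 e 2 b
decoded = "\u8001\u5e2b"    # two characters, because the escapes are decoded

print(len(literal))
print(len(decoded))
print(decoded == "老師")
```

If the literal form is what ends up in the query string, the service would be asked to read out the characters of the escape itself rather than the word.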
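Where did you specify UTF-8? Not in your code, your URL, your source file encoding, or anything else I can see. – abarnert
Also, is this Python 2 or 3? – abarnert
Sorry, I left that out of the title. I have tried it in both 2 and 3. – zadrozny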
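On the point raised in the comments about where UTF-8 is actually specified: one way to make the encoding explicit is to percent-encode the query string as UTF-8 before sending it, instead of placing raw non-ASCII characters in the URL. The following is a minimal sketch assuming Python 3. The helper name build_tts_url and the constant BASE_URL are hypothetical, introduced only for illustration, and the sketch does not confirm that the server then returns the expected audio.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the question; BASE_URL is a name made up for this sketch.
BASE_URL = "http://translate.google.com/translate_tts"


def build_tts_url(text, lang="zh-CN"):
    """Hypothetical helper: build the URL with a UTF-8 percent-encoded query."""
    query = urlencode({"tl": lang, "q": text}, encoding="utf-8")
    return BASE_URL + "?" + query


# Each UTF-8 byte of the text becomes one %XX escape.
print(quote("老師", encoding="utf-8"))
print(build_tts_url("老師"))
```

The resulting string is pure ASCII, so it can be passed to requests.get() without depending on the source file encoding. Passing the same dictionary through the params argument of requests.get() is another option that leaves the encoding to the library.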