Fetching non-UTF-8 data with XMLHttpRequest

I want to fetch a document from the web using XMLHttpRequest. However, the text in question is not UTF-8 (in this case it's windows-1251, but in general I won't know for sure in advance).

However, if I use responseType="text", it treats the string as UTF-8 and ignores the charset in the content type (resulting in a nasty mess).
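The garbling described here is easy to reproduce without a server. A minimal sketch, assuming a runtime whose global TextDecoder supports windows-1251 (current browsers, or Node.js built with full ICU, which official builds are): the same six bytes, the windows-1251 encoding of the Russian word "Привет", are decoded once as UTF-8, which is what the question reports responseType="text" doing, and once with the correct charset. The variable names are illustrative.

```javascript
// Illustrative sample: the windows-1251 bytes for "Привет"
const cp1251Bytes = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// Decoding as UTF-8: these bytes are not valid UTF-8, so the decoder
// emits U+FFFD (the replacement character) instead of Cyrillic letters
const asUtf8 = new TextDecoder("utf-8").decode(cp1251Bytes);

// Decoding with the charset the bytes were actually written in
const asCp1251 = new TextDecoder("windows-1251").decode(cp1251Bytes);

console.log(asUtf8);   // replacement characters, not Cyrillic
console.log(asCp1251); // "Привет"
```

No information is lost in the bytes themselves; only the choice of decoder differs.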

If I use 'blob' (probably the closest thing to what I want), can I then convert that to a DOMString, taking the encoding into account?

Source

2017-10-15 Tom Tanner

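'ignoring the charset in content-type': are you sure the server reads the file correctly as windows-1251, keeps it that way, and responds with the correct content type? If any of those three steps fails, you can end up with alphabet soup before a single byte even reaches the browser. – Thomas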

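'convert that to a DOMString, taking the encoding into account': I'm not aware of an API or library for that, but in the worst case you can map each byte to the appropriate character. – Thomas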
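Mapping each byte by hand, as suggested, might look like this for windows-1251. This is a hypothetical and deliberately partial table, not a full implementation: bytes below 0x80 are ASCII, 0xC0 to 0xFF map in order to U+0410 to U+044F (А to я), 0xA8 and 0xB8 map to Ё and ё, and every other byte becomes U+FFFD. The function name decodeCp1251Subset is made up for this example.

```javascript
// Hypothetical helper: decode only a *subset* of windows-1251 by hand.
// Bytes 0x80-0xBF other than 0xA8/0xB8 are not covered by this sketch.
function decodeCp1251Subset(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // ASCII range is unchanged
    } else if (b >= 0xc0) {
      // 0xC0..0xFF map in order onto U+0410..U+044F (А..я)
      out += String.fromCharCode(0x0410 + (b - 0xc0));
    } else if (b === 0xa8) {
      out += "\u0401"; // Ё
    } else if (b === 0xb8) {
      out += "\u0451"; // ё
    } else {
      out += "\uFFFD"; // byte not handled by this partial table
    }
  }
  return out;
}

console.log(decodeCp1251Subset(new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2])));
```

A complete table for every encoding is what TextDecoder already provides, which is why it is usually the better choice than a hand-written map.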

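I actually found an API that does what I want, starting from here:

https://developers.google.com/web/updates/2014/08/Easier-ArrayBuffer-String-conversion-with-the-Encoding-API

Basically, use responseType="arraybuffer", pick the encoding out of the returned headers, and decode with DataView and TextDecoder. It does exactly what I need.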
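The "pick the encoding out of the headers" step can be made more defensive. With contenttype.indexOf("charset=") + 8, a header that has no charset parameter (plain text/plain) makes indexOf return -1, so substring(7) yields "ain"; a missing header makes getResponseHeader return null. A minimal sketch of a standalone parser, where the function name charsetFromContentType and the UTF-8 fallback are assumptions rather than part of any API:

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value, falling back when the header or the parameter is missing.
function charsetFromContentType(contentType, fallback = "utf-8") {
  if (!contentType) return fallback;
  // Parameters follow the MIME type and are separated by ";"
  for (const param of contentType.split(";").slice(1)) {
    const eq = param.indexOf("=");
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== "charset") continue;
    // Strip optional surrounding double quotes
    const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    return value || fallback;
  }
  return fallback;
}

console.log(charsetFromContentType("text/plain; charset=windows-1251")); // "windows-1251"
console.log(charsetFromContentType("text/plain"));                       // "utf-8"
```

The result can be passed straight to new TextDecoder(...). This sketch does not handle quoted values that themselves contain ";", which a full MIME parser would.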

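const xhr = new XMLHttpRequest();
xhr.responseType = "arraybuffer";
xhr.onload = function() {
    // The header may be missing or may lack a charset parameter;
    // fall back to UTF-8 instead of slicing at indexOf(...) === -1
    const contenttype = xhr.getResponseHeader("content-type") || "";
    const match = /charset=\s*"?([^";]+)"?/i.exec(contenttype);
    const charset = match ? match[1].trim() : "utf-8";
    // TextDecoder.decode() accepts any ArrayBuffer view, including a DataView
    const dataView = new DataView(xhr.response);
    const decoder = new TextDecoder(charset);
    console.log(decoder.decode(dataView));
}
xhr.open("GET", "https://people.w3.org/mike/tests/windows-1251/test.txt");
xhr.send(null);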

fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
    .then(response => {
        // Fall back to UTF-8 if there is no header or no charset parameter
        const contenttype = response.headers.get("content-type") || "";
        const match = /charset=\s*"?([^";]+)"?/i.exec(contenttype);
        const charset = match ? match[1].trim() : "utf-8";
        // Return the inner promise so failures reach the outer chain
        return response.arrayBuffer()
            .then(ab => {
                const dataView = new DataView(ab);
                const decoder = new TextDecoder(charset);
                console.log(decoder.decode(dataView));
            });
    })
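The same steps can be written with async/await and made to fail softly when the server sends a charset label that TextDecoder does not recognise (its constructor throws a RangeError in that case). A minimal sketch, assuming a runtime with global Response and TextDecoder (modern browsers, Node.js 18+); decodeResponse and the UTF-8 fallback are illustrative choices, not part of any API:

```javascript
// Hypothetical helper: decode a fetch() Response body using the charset
// from its Content-Type header, with a fallback encoding.
async function decodeResponse(response, fallback = "utf-8") {
  const contentType = response.headers.get("content-type") || "";
  const match = /;\s*charset\s*=\s*"?([^";]+)"?/i.exec(contentType);
  const label = match ? match[1].trim() : fallback;

  let decoder;
  try {
    decoder = new TextDecoder(label);
  } catch (err) {
    // TextDecoder throws a RangeError for labels it does not recognise
    if (!(err instanceof RangeError)) throw err;
    decoder = new TextDecoder(fallback);
  }
  // decode() accepts an ArrayBuffer directly; no DataView is needed
  return decoder.decode(await response.arrayBuffer());
}

// Usage:
// fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
//   .then(decodeResponse)
//   .then(text => console.log(text));
```

Decoding the ArrayBuffer directly also shows that the DataView wrapper is optional: TextDecoder.decode() accepts an ArrayBuffer as well as any view on one.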

Source

2017-10-17 07:34:42

If I use 'blob' (probably the closest thing to what I want), can I then convert that to a DOMString, taking the encoding into account?

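https://medium.com/programmers-developers/convert-blob-to-string-in-javascript-944c15ad7d52 outlines a general approach you can use. Applied to the case of fetching a remote file:

1. Fetch the response as a Blob and create a FileReader.
2. Use FileReader.readAsText() to read the text back out of the Blob in the correct encoding.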

Like this:

const reader = new FileReader()
reader.addEventListener("loadend", function() {
    console.log(reader.result)
})
fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
    .then(response => response.blob())
    // The second argument of readAsText() names the encoding to decode with
    .then(blob => reader.readAsText(blob, "windows-1251"))

Or, if you'd rather use XHR:

const reader = new FileReader()
reader.addEventListener("loadend", function() {
    console.log(reader.result)
})
const xhr = new XMLHttpRequest()
// Ask XHR for the raw bytes as a Blob instead of decoded text
xhr.responseType = "blob"
xhr.onload = function() {
    // Decode the Blob using the explicitly named encoding
    reader.readAsText(xhr.response, "windows-1251")
}
xhr.open("GET", "https://people.w3.org/mike/tests/windows-1251/test.txt", true)
xhr.send(null)
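Where FileReader is not available (Node.js, for instance, does not provide it) or a promise is more convenient, the Blob route can be swapped for Blob.arrayBuffer() followed by TextDecoder. Blob.text() is not a substitute here, because it always decodes as UTF-8. A minimal sketch, assuming global Blob and TextDecoder (modern browsers, Node.js 18+); blobToText is a hypothetical helper name:

```javascript
// Hypothetical helper: decode a Blob in a given charset without FileReader.
async function blobToText(blob, charset) {
  // Read the raw bytes; no decoding happens at this step
  const bytes = await blob.arrayBuffer();
  // Decode with the caller-supplied encoding label
  return new TextDecoder(charset).decode(bytes);
}

// Usage with fetch():
// fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
//   .then(response => response.blob())
//   .then(blob => blobToText(blob, "windows-1251"))
//   .then(text => console.log(text));
```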

But if I use responseType="text", it treats the string as if it were UTF-8, ignoring the charset in the content-type

Yes. And that's exactly what's required by the Fetch spec (which the XHR spec relies on too):

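Body mixin

Objects implementing the Body mixin also have an associated package data algorithm, given bytes, a type and a mimeType, which switches on type and runs the associated steps:
...
↪ text
Return the result of running UTF-8 decode on bytes.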
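The spec behaviour quoted above can be observed directly with the Response constructor. A minimal sketch, assuming a runtime with global Response and TextDecoder (modern browsers, Node.js 18+): the body is labelled windows-1251 in its Content-Type, yet text() still decodes it as UTF-8, while arrayBuffer() plus TextDecoder honours the charset. The sample bytes and the compare function are illustrative.

```javascript
// Illustrative sample: windows-1251 bytes for "Привет", with a matching header
const bytes = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);
const init = { headers: { "content-type": "text/plain; charset=windows-1251" } };

async function compare() {
  // text() always runs UTF-8 decode, whatever the charset parameter says
  const viaText = await new Response(bytes, init).text();
  // Reading the raw bytes and decoding them ourselves respects the charset
  const buffer = await new Response(bytes, init).arrayBuffer();
  const viaDecoder = new TextDecoder("windows-1251").decode(buffer);
  return { viaText, viaDecoder };
}

compare().then(({ viaText, viaDecoder }) => {
  console.log(viaText);    // replacement characters (U+FFFD)
  console.log(viaDecoder); // "Привет"
});
```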

Source

2017-10-15 10:13:03 sideshowbarker

I missed that note in the Fetch spec. Thanks. The reason for using XMLHttpRequest was to find out what the encoding is. –
