2016-09-10 95 views

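JavaScript: How do I convert a string of multi-byte values into an array of 32-bit ints?

I have a string containing UTF-32 code points (though the upper 16 bits will probably always be 0). Each token is one of the 4 bytes of the code point of a character in the long string. Note that the bytes were interpreted as signed ints before being converted to a string, and I have no control over this string.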

// Provided:
intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars: áñá

// Wanted:
actualCodePoints = [225, 241, 225];

I need to convert intEncodedBytesString into the actualCodePoints array. So far I have come up with this:

var intEncodedBytesStringArray = intEncodedBytesString.toString().split(',');
var i, str = '';
var charAmount = intEncodedBytesStringArray.length / 4;

for (i = 0; i < charAmount; i++) {
    var codePoint = 0;

    for (var j = 0; j < 4; j++) {
        var num = parseInt(intEncodedBytesStringArray[i * 4 + j], 10);
        if (num != 0) {
            if (num < 0) {
                num = (1 << (8 * (4 - j))) + num;
            }

            codePoint += (num << (8 * (3 - j)));
        }
    }

    str += String.fromCodePoint(codePoint);
}

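Is there a better, simpler, and/or more efficient way to do this?

I have seen dozens of answers and code snippets that deal with similar things, but none that address the problem that my input bytes are in a string of signed ints :S

Edit: This code does not work for the highest code points, since 1 << 32 is 1 rather than 2^32.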
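The shift behavior mentioned in the edit can be checked directly. This is only a small illustration, and the variable names are made up for it: JavaScript's shift operators work on 32-bit integers and use only the low 5 bits of the shift count, so a shift by 32 behaves like a shift by 0.

```javascript
// Shift counts are reduced modulo 32 before shifting.
var wrapped = 1 << 32;          // same as 1 << 0
var signBit = 1 << 31;          // sets the sign bit of a 32-bit int
var viaPow = Math.pow(2, 32);   // a true 2^32, computed as a double

console.log(wrapped);  // 1
console.log(signBit);  // -2147483648
console.log(viaPow);   // 4294967296
```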


@T.J.Crowder Indeed, UTF-32. Edited to add that. – TigerShark

Answers


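Since it's straightforward UTF-32, then yes, there is a simpler way: just work in four-byte blocks. Also, a simple way to handle the possible negative values is (value + 256) % 256, which maps a signed byte in the range -128 to 127 onto 0 to 255.

So: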

var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars
var actualCodePoints = [];
var bytes = intEncodedBytesString.split(",").map(Number);
for (var i = 0; i < bytes.length; i += 4) {
    actualCodePoints.push(
        (((bytes[i]     + 256) % 256) << 24) +
        (((bytes[i + 1] + 256) % 256) << 16) +
        (((bytes[i + 2] + 256) % 256) << 8) +
        (bytes[i + 3] + 256) % 256
    );
}
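The signed-to-unsigned step can also be looked at on its own. The sketch below is only an illustration: the helper names `toUnsignedByte` and `toUnsignedByteMasked` are made up for this example and are not part of the answer. The second helper shows a bit mask that, as far as I can tell, gives the same result for values in the signed-byte range.

```javascript
// Hypothetical helper: the arithmetic used in the loop above.
function toUnsignedByte(value) {
    return (value + 256) % 256;
}

// Hypothetical alternative: keep only the low 8 bits of the
// two's-complement representation.
function toUnsignedByteMasked(value) {
    return value & 0xFF;
}

console.log(toUnsignedByte(-31));        // 225
console.log(toUnsignedByteMasked(-31));  // 225
console.log(toUnsignedByte(-128));       // 128
console.log(toUnsignedByte(127));        // 127
```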

The same example with detailed explanatory comments:

// Starting point
var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars

// Target array
var actualCodePoints = [];

// Get the bytes as numbers by splitting on commas and running the array
// through Number to convert each entry to a number.
var bytes = intEncodedBytesString.split(",").map(Number);

// Loop through the bytes building code points
var i, cp;
for (i = 0; i < bytes.length; i += 4) {
    // (x + 256) % 256 will handle turning (for instance) -31 into 225
    // We shift the value for the first byte left 24 bits, the next byte 16 bits,
    // the next 8 bits, and don't shift the last one at all. Adding them all
    // together gives us the code point, which we push into the array.
    cp = (((bytes[i]     + 256) % 256) << 24) +
         (((bytes[i + 1] + 256) % 256) << 16) +
         (((bytes[i + 2] + 256) % 256) << 8) +
         (bytes[i + 3] + 256) % 256;
    actualCodePoints.push(cp);
}

// Show the result
console.log(actualCodePoints);

// If the JavaScript engine supports it, show the string
if (String.fromCodePoint) { // ES2015+
    var str = String.fromCodePoint.apply(String, actualCodePoints);
    // The above could be
    // `let str = String.fromCodePoint(...actualCodePoints);`
    // on an ES2015+ engine
    console.log(str);
} else {
    console.log("(Your browser doesn't support String.fromCodePoint)");
}
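Where typed arrays are available, another possible approach is to let the engine do the byte handling. This is a sketch and not part of the answer above: the function name `decodeUtf32BE` is made up, and it assumes the input is big-endian (as in the question's sample) and that its length is a multiple of four. Storing the numbers in an `Int8Array` keeps the signed values as raw bytes, and `DataView.getUint32` then reads each group of four as an unsigned 32-bit integer.

```javascript
// Hypothetical helper: decode a comma-separated string of signed bytes
// (big-endian UTF-32) into an array of code points.
function decodeUtf32BE(intEncodedBytesString) {
    // Int8Array stores each signed value as its two's-complement byte.
    var bytes = new Int8Array(intEncodedBytesString.split(",").map(Number));
    if (bytes.length % 4 !== 0) {
        throw new RangeError("Expected a multiple of 4 bytes");
    }

    // Read the same buffer back as unsigned 32-bit big-endian integers.
    var view = new DataView(bytes.buffer);
    var codePoints = [];
    for (var offset = 0; offset < bytes.length; offset += 4) {
        codePoints.push(view.getUint32(offset, false)); // false = big-endian
    }
    return codePoints;
}

console.log(decodeUtf32BE("0,0,0,-31,0,0,0,-15,0,0,0,-31")); // [225, 241, 225]
```

Because `getUint32` returns an unsigned value, this version does not produce a negative number when the first byte is 128 or higher, although no valid Unicode code point has a non-zero first byte.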
