How do I convert an encoding to UTF-8 in Go?

I'm working on a project where I need to convert text from another encoding (for example Windows-1256, Arabic) to UTF-8. How can I do this in Go?

2015-09-11 Ali Bahraminezhad

Do you mean *encoding*? There is only *one* Unicode, and Arabic 1256 is not "a Unicode". – deceze

You're right, I've edited the question. Thanks. –

You can use the encoding packages, which include support for Windows-1256 via the package golang.org/x/text/encoding/charmap (in the example below, import this package and use charmap.Windows1256 instead of japanese.ShiftJIS).

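Below is a short example that encodes a Japanese UTF-8 string to ShiftJIS and then decodes the ShiftJIS string back to UTF-8. Unfortunately it doesn't run on the Playground, because the "x" packages aren't available there.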

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"

	"golang.org/x/text/encoding/japanese"
	"golang.org/x/text/transform"
)

func main() {
	// the string we want to transform
	s := "今日は"
	fmt.Println(s)

	// --- Encoding: convert s from UTF-8 to ShiftJIS
	// declare a bytes.Buffer b and an encoder which will write into this buffer
	var b bytes.Buffer
	wInUTF8 := transform.NewWriter(&b, japanese.ShiftJIS.NewEncoder())
	// encode our string
	if _, err := wInUTF8.Write([]byte(s)); err != nil {
		panic(err)
	}
	// Close flushes anything the encoder is still buffering
	if err := wInUTF8.Close(); err != nil {
		panic(err)
	}
	// print the encoded bytes in hex
	fmt.Printf("% x\n", b.Bytes())
	encS := b.String()
	// printed as-is, the ShiftJIS bytes look garbled on a UTF-8 terminal
	fmt.Println(encS)

	// --- Decoding: convert encS from ShiftJIS to UTF-8
	// declare a decoder which reads from the string we have just encoded
	rInUTF8 := transform.NewReader(strings.NewReader(encS), japanese.ShiftJIS.NewDecoder())
	// decode our string
	decBytes, err := io.ReadAll(rInUTF8)
	if err != nil {
		panic(err)
	}
	decS := string(decBytes)
	fmt.Println(decS)
}

There is a more complete example on the Japanese Stack Overflow site. The text is in Japanese, but the code should be self-explanatory: https://ja.stackoverflow.com/questions/6120


2015-09-11 08:13:19 rob74

I couldn't find an example that converts one encoding to another. In .NET this is simple, but I'm really new to Go. –

Great example. Hmm, so here we're converting from UTF-8 to Japanese ShiftJIS; is it possible to do it the other way around? –

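To decode ShiftJIS, use the second part, starting from "declare a decoder...": encS is the string you want to decode, and string(decBytes) is the decoded string. Two separate functions would probably be cleaner, but I wanted to keep the example as short as possible... – rob74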

Use the packages from golang.org/x/text. In your case it would look something like this:

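package main

import (
	"fmt"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	b := []byte{0xD3, 0xE1, 0xC7, 0xE3} // Win1256 bytes here (sample: "سلام")
	dec := charmap.Windows1256.NewDecoder()
	// Every Windows-1256 byte decodes to at most 3 bytes of UTF-8,
	// so len(b)*3 is always large enough.
	bUTF := make([]byte, len(b)*3)
	n, _, err := dec.Transform(bUTF, b, true) // true: b is the complete input
	if err != nil {
		panic(err)
	}
	fmt.Println(string(bUTF[:n]))
}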


2015-09-11 09:25:27

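I'm not great at Go, but allocating a buffer based on a rough estimate seems like a bad idea. In theory UTF-8 could be up to four times the size of the input (though in practice that may never happen). – deceze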

It's just an example. Most characters in Win1256 [take two bytes](http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1256.txt) in UTF-8, and none take more than three. Edited. –

There is a deterministic way to determine the buffer size instead of guessing. @rob74's answer seems to show one. – deceze
