Java: parsing an XML string from UTF-16LE

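I want to parse a UTF-16LE XML string that is embedded in a file. I can read the actual string into a String object, and when I view the XML in the watch window it looks fine. The problem is that an exception is thrown every time I try to parse it. I have tried specifying UTF-16 and UTF-16LE both on the getBytes line and in the InputStreamReader constructor, but it still throws.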

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); 
DocumentBuilder builder = null; 

builder = builderFactory.newDocumentBuilder();  
Document document = null; 
byte[] bytes = xmlString.getBytes(); 
ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes); 
InputSource is = new InputSource(new InputStreamReader(inputStream)); 
document = builder.parse(is); // throws SAXParseException

Edit: This is on Android. Also, here is the exception at the top of the stack trace:

12-18 13:51:12.978: W/System.err(5784): org.xml.sax.SAXParseException: name expected (position:START_TAG @1:2 in [email protected])
12-18 13:51:12.978: W/System.err(5784): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:107)


2012-12-17 rplankenhorn

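What is wrmHeaderXml? A String, an object, or what? It looks like you are converting from bytes to chars and then from chars back to bytes. Why? If you already have the bytes, just feed them to InputSource(InputStream) – leonbloy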

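I guess it's a String. If you have a String object (you state that you can view it in the console), then the internal encoding doesn't matter, because it is a Java String – Raffaele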

This is what I ended up doing:

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); 
DocumentBuilder builder = null; 

builder = builderFactory.newDocumentBuilder();  
Document document = null; 
byte[] bytes = Charset.forName("UTF-16LE").encode(xmlString).array(); 
InputStream inputStream = new ByteArrayInputStream(bytes); 
document = builder.parse(inputStream);

Source: How does one create an InputStream from a String?
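A possible explanation for why this works, offered as an assumption rather than something confirmed in the thread: when the parser is handed a raw InputStream, it can detect the encoding from the leading 0xFF 0xFE byte order mark itself, whereas a Reader passes the BOM through as an ordinary character. One caveat with the snippet above is that ByteBuffer.array() returns the whole backing array, which is not guaranteed to be exactly as long as the encoded data. The sketch below uses String.getBytes(Charset) instead, which returns only the encoded bytes. The class and method names (Utf16LeBytesParse, toUtf16Le, parse) are hypothetical, and the sample string assumes the BOM is present as the first character.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Utf16LeBytesParse {

    // Encodes the string as UTF-16LE. getBytes returns an array that holds
    // exactly the encoded bytes, with no unused trailing capacity.
    public static byte[] toUtf16Le(String s) {
        return s.getBytes(Charset.forName("UTF-16LE"));
    }

    // Hands the raw bytes to the parser so it can detect the encoding itself.
    public static Document parse(String xmlString) throws Exception {
        InputStream in = new ByteArrayInputStream(toUtf16Le(xmlString));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the string starts with U+FEFF, as described in the question.
        String xmlString = "\uFEFF<root><tag>Hello World!</tag></root>";
        Document document = parse(xmlString);
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```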


2012-12-17 19:18:36 rplankenhorn

What is the purpose of encoding the String? – Raffaele

If I just call xmlString.getBytes and pass the result into a ByteArrayInputStream, it throws a SAXParseException. – rplankenhorn

But why do you need to extract the bytes from the String at all? Just pass a [`StringReader`](http://docs.oracle.com/javase/6/docs/api/java/io/StringReader.html) to the `InputSource` ctor – Raffaele

Within the same program, there is no need to convert back and forth between Strings and bytes. It's as easy as:

String xml = "<root><tag>Hello World!</tag></root>"; 

Document dom = DocumentBuilderFactory.newInstance() 
    .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));


2012-12-17 22:37:25 Raffaele

This throws a SAXParseException on the parse line. – rplankenhorn

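No need to be rude. When I use the parse line above with the XML I am parsing, it throws a SAXParseException. I posted the top of the stack trace above. If I just call xmlString.getBytes() and look at the binary data, it is UTF-16LE encoded. The first two bytes are 0xFF 0xFE, which tells me it is little-endian UTF-16. – rplankenhorn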

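@rplankenhorn It sounds like your `xmlString` actually contains the BOM as its first character. If you strip that first character from the string and then create a StringReader from the result, it should parse straight from the String without the round trip through bytes. –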
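The suggestion in this comment could be sketched roughly as follows. This is only an illustration: the class and method names (BomStrippingParse, stripBom, parse) are hypothetical, and the sample string is an assumption standing in for the real XML, which is not shown in the thread. The helper removes a single leading U+FEFF if one is present and leaves any other string unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BomStrippingParse {

    // Removes one leading U+FEFF (byte order mark) character, if present.
    public static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    // Parses directly from the String via a StringReader, with no byte round trip.
    public static Document parse(String xmlString) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBom(xmlString))));
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the string read from the file begins with the BOM character.
        String xmlString = "\uFEFF<root><tag>Hello World!</tag></root>";
        Document document = parse(xmlString);
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```

Checking `xmlString.charAt(0) == '\uFEFF'` in the debugger is also a quick way to confirm whether the BOM really is the first character before changing any parsing code.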
