2014-02-24 51 views

SAX: UTF-8 decoding/encoding fails

I am using SAX to parse an XML file and then write it back out.

Somewhere in the parse-and-write process the UTF-8 encoding gets corrupted.

The sample XML is:

<AddressInfo> 
    <City name="Antalya" code="07"> 
     <District name="Döşemealtı"> 
     <Zip code="01680" /> 
     </District> 
    </City> 
</AddressInfo> 

Result:

<AddressInfo> 
    <City name="Antalya" code="07"> 
     <District name="Döşemealtı"> 
      <Zip code="01680"/> 
     </District> 
    </City> 
</AddressInfo> 

I tried wrapping the stream in an InputStreamReader with an explicit charset and feeding the SAXParser an InputSource, but that did not work:

SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXHandler handler = new SAXHandler();
SAXParser parser;
try {
    // dis is a DataInputStream
    parser = parserFactor.newSAXParser();
    InputStreamReader inputReader = new InputStreamReader(dis, Charset.forName("UTF-8"));
    InputSource inputSource = new InputSource();
    inputSource.setCharacterStream(inputReader);
    inputSource.setEncoding("UTF-8");
    // ignoring the inputSource and using the DataInputStream directly
    parser.parse(dis, handler);
    // also tried with inputSource, no joy
    // parser.parse(inputSource, handler);

....

What could be going wrong? Any ideas?

Cheers

Note: the input XML has no declaration such as

`<?xml version="1.0" encoding="UTF-8"?>` 
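
As a point of reference, the XML specification says a document with no encoding declaration and no byte-order mark is to be read as UTF-8, so a missing declaration on its own should not garble UTF-8 input. Below is a minimal sketch for checking that with the JDK's built-in parser. The class name `NoDeclarationCheck` and its helper method are hypothetical, and the sample document is built in memory rather than read from the DataInputStream in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original question.
final class NoDeclarationCheck {

    // Parses raw bytes (no Reader, no declared encoding) and returns the
    // "name" attribute of the first <District> element, or null if absent.
    static String firstDistrictName(byte[] xmlBytes) throws Exception {
        final String[] found = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("District".equals(qName) && found[0] == null) {
                    found[0] = attrs.getValue("name");
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            parser.parse(in, handler);
        }
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        // The district name is spelled with Unicode escapes so the sketch
        // does not depend on the encoding of this source file.
        String expected = "D\u00f6\u015femealt\u0131";
        String xml = "<AddressInfo><City name=\"Antalya\" code=\"07\">"
                + "<District name=\"" + expected + "\"><Zip code=\"01680\"/></District>"
                + "</City></AddressInfo>";
        String name = firstDistrictName(xml.getBytes(StandardCharsets.UTF_8));
        // Reports whether the attribute survived the byte-stream parse intact.
        System.out.println(expected.equals(name));
    }
}
```

If a check like this passes on the real input bytes, the parse step is probably not where the characters are damaged, and the code that produces the stream or writes the output deserves a closer look.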

What is the original encoding of your file? (And what does Utility.getEncoding() return?) – helderdarocha


Ah, edited: UTF-8 –

Answer


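Try reading the input as a character stream and setting the encoding on the InputSource. UTF-8 has to be decoded into characters as it is read; a raw InputStream only delivers bytes and carries no encoding of its own.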

Something like this should help. If the XML you are parsing comes from a CLOB, make sure you read it with getCharacterStream():

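import java.io.BufferedReader;
import java.io.Reader;
import org.xml.sax.InputSource;

// If the CLOB comes from a database, read it as a character stream.
Reader f = clob.getCharacterStream();
BufferedReader readFile = new BufferedReader(f);
InputSource encode = new InputSource(readFile);
// Per the InputSource docs, setEncoding has no effect once a character stream is supplied.
encode.setEncoding("UTF-8");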
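
A fuller sketch of the same idea, covering both directions the question mentions: decode the incoming bytes with an explicit UTF-8 Reader before handing them to SAX, and encode the output with an explicit UTF-8 Writer instead of relying on the platform default. The class `Utf8RoundTrip`, its method names, and the in-memory streams are assumptions made for illustration; the asker's SAXHandler and file-writing code are not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; none of these names come from the question.
final class Utf8RoundTrip {

    // Returns the "name" attribute of the first <District>, decoding the bytes as UTF-8.
    static String readDistrictName(InputStream in) throws Exception {
        final String[] found = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("District".equals(qName) && found[0] == null) {
                    found[0] = attrs.getValue("name");
                }
            }
        };
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            // The parser receives characters, so it does no byte decoding itself.
            InputSource source = new InputSource(reader);
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        }
        return found[0];
    }

    // Writes one <District> element as UTF-8 bytes, independent of the platform default.
    // No XML escaping is done here, which is acceptable only for this sample value.
    static void writeDistrict(String name, OutputStream out) throws Exception {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write("<District name=\"" + name + "\"/>");
        writer.flush();
    }

    public static void main(String[] args) throws Exception {
        // Spelled with Unicode escapes so the source-file encoding does not matter.
        String original = "D\u00f6\u015femealt\u0131";

        ByteArrayOutputStream first = new ByteArrayOutputStream();
        writeDistrict(original, first);

        String parsed = readDistrictName(new ByteArrayInputStream(first.toByteArray()));

        ByteArrayOutputStream second = new ByteArrayOutputStream();
        writeDistrict(parsed, second);

        // Reports whether both the string and the re-encoded bytes survived the round trip.
        System.out.println(original.equals(parsed)
                && Arrays.equals(first.toByteArray(), second.toByteArray()));
    }
}
```

One common culprit for this kind of corruption, though the question does not show enough to confirm it, is a `FileWriter` or a `new String(bytes)` call without a charset argument: before Java 18 both fall back to the platform default encoding, which is often not UTF-8.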