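Special character replacement

I have this text: "Csuklási roham gyötörheti a svédeket, annyit emlegetik mostanság ismét a svéd modellt Magyarországon."

There are no line breaks at all in the original text.

When I send this text by email (Gmail account), I get it encoded as follows: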

Content-Type: text/plain; charset=ISO-8859-2 
Content-Transfer-Encoding: quoted-printable 

Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g = 
ism=E9t a 
sv=E9d modellt Magyarorsz=E1gon.

In HTML:

Content-Type: text/html; charset=ISO-8859-2 
Content-Transfer-Encoding: quoted-printable 


<span class=3D"Apple-style-span" style=3D"font-family: Helvetica, Verdana, = sans-serif; font-size: 15px; ">Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket= , annyit emlegetik mostans=E1g ism=E9t a sv=E9d modellt Magyarorsz=E1gon.

....

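When I try to parse the email body as text/plain, I cannot get rid of the equals sign (=) between the two words in "mostans=E1g = ism=E9t". Note that the same character is missing from the HTML-encoded mail. I have no idea what that special character could be, but I need to eliminate it to get back the original text.

I tried replacing '\n', but that is not it. If I press Enter in the text, I can correctly replace that line break with any character I want. I also tried '\r' and '\t'.

So the question is: what am I missing? Where does that special character come from? Is it because of the charset and/or the transfer encoding? If so, what do I have to do to solve the problem and get back the original text?

Any help would be welcome.

Cheers, Balázs

Source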

2010-11-11 Balázs Mária Németh

You need to use MimeUtility. Here is an example.

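import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

public class Mime {
    public static void main(String[] args) throws MessagingException, IOException {
        // "mime" is a file holding the quoted-printable body shown below.
        InputStream stringStream = new FileInputStream("mime");
        // Undo the transfer encoding: =XX escapes become bytes and
        // soft line breaks are dropped.
        InputStream output = MimeUtility.decode(stringStream, "quoted-printable");
        System.out.println(convertStreamToString(output));
    }

    public static String convertStreamToString(InputStream is) throws IOException {
        /*
         * To convert the InputStream to a String we use the
         * Reader.read(char[] buffer) method. We iterate until the Reader
         * returns -1, which means there is no more data to read. We use
         * the StringWriter class to produce the string.
         */
        if (is == null) {
            return "";
        }
        Writer writer = new StringWriter();
        char[] buffer = new char[1024];
        try {
            // Use the charset declared in the mail's Content-Type header.
            Reader reader = new BufferedReader(new InputStreamReader(is, "ISO-8859-2"));
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            is.close();
        }
        return writer.toString();
    }
}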

The file 'mime' contains the encoded text:

Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g = 
ism=E9t a 
sv=E9d modellt Magyarorsz=E1gon.
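The mail headers in the question declare charset=ISO-8859-2, so that is the charset to use when turning the decoded bytes into characters. The following is only a small sketch of why this matters; the class name `CharsetDemo` and its helper are made up for illustration, and it assumes the JDK ships the ISO-8859-2 charset (common, but not formally guaranteed on every runtime). The bytes behind á, ö and é decode identically under ISO-8859-1 and ISO-8859-2, while the bytes for ő and ű do not.

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    // Hypothetical helper: decodes raw bytes with the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xE1, 0xF6, 0xE9 are the bytes behind =E1, =F6, =E9 in the mail.
        byte[] shared = {(byte) 0xE1, (byte) 0xF6, (byte) 0xE9};
        // 0xF5 and 0xFB are o and u with double acute in ISO-8859-2 only.
        byte[] differs = {(byte) 0xF5, (byte) 0xFB};

        System.out.println(decode(shared, "ISO-8859-1")
                .equals(decode(shared, "ISO-8859-2")));   // same characters
        System.out.println(decode(differs, "ISO-8859-1")
                .equals(decode(differs, "ISO-8859-2")));  // different characters
    }
}
```

For the sample sentence either charset would happen to produce the same result, but other Hungarian text would come out wrong with ISO-8859-1.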

UPDATE:

Using the Guava library:

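import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;
import com.google.common.io.CharStreams;
import com.google.common.io.InputSupplier;

// Note: InputSupplier exists only in older Guava releases; it was removed later.
InputSupplier<InputStream> supplier = new InputSupplier<InputStream>() {
    @Override
    public InputStream getInput() throws IOException {
        InputStream inStream = new FileInputStream("mime");
        try {
            // Undo the quoted-printable transfer encoding.
            return MimeUtility.decode(inStream, "quoted-printable");
        } catch (MessagingException e) {
            // Propagate instead of returning null, which would fail later.
            throw new IOException(e);
        }
    }
};
// Use the charset declared in the mail's Content-Type header.
InputSupplier<InputStreamReader> result = CharStreams
        .newReaderSupplier(supplier, Charset.forName("ISO-8859-2"));
String ans = CharStreams.toString(result);
System.out.println(ans);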

Source

2010-11-11 12:50:13 Emil

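So in "output" the unnecessary "="s have been eliminated? @Mária – 2010-11-11 13:57:49

Balázs Németh: Yes, they are eliminated, but I see an extra line break that is not in the original text. Maybe, as jarnbjo says, it is because "quoted-printable" does not allow encoded lines longer than 76 characters. @Mária – Emil 2010-11-11 14:09:20

Balázs Németh: Read [Quoted-printable](http://en.wikipedia.org/wiki/Quoted-printable). It will help you understand the encoding. – Emil 2010-11-11 14:12:58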

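The transfer encoding "quoted-printable" does not allow encoded lines to be longer than 76 characters. If the text to be encoded contains longer lines, "soft line breaks" must be inserted, each indicated by a single "=" as the last character of the encoded line. This means that the line break following the "=" was inserted only to satisfy the 76-character limit, and that the "=" and the line break following it should be removed when decoding the transfer encoding.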
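To make the rule above concrete, here is a minimal hand-rolled decoder sketch. It is not how MimeUtility works internally; the class name `QuotedPrintableSketch` and the method `decode` are invented for illustration, the sample string is a simplified version of the mail body with a single soft line break, and it assumes the ISO-8859-2 charset is available in the JDK. It drops each "=" followed by a line break and turns every =XX escape into one byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    // Decodes quoted-printable text; charsetName comes from the Content-Type header.
    static String decode(String encoded, String charsetName) {
        // Quoted-printable bodies are 7-bit, so US-ASCII yields the raw bytes.
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            if (in[i] != '=') {
                out.write(in[i]);            // ordinary character, copy as-is
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;                      // soft line break "=" CR LF: drop it
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;                      // lenient: also accept "=" + bare LF
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                // "=XX": two hex digits encode one byte
                out.write((Character.digit(in[i + 1], 16) << 4)
                        | Character.digit(in[i + 2], 16));
                i += 3;
            } else {
                out.write(in[i]);            // malformed escape: keep the "=" literally
                i++;
            }
        }
        return new String(out.toByteArray(), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String encoded = "Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, "
                + "annyit emlegetik mostans=E1g =\r\n"
                + "ism=E9t a sv=E9d modellt Magyarorsz=E1gon.";
        System.out.println(decode(encoded, "ISO-8859-2"));
    }
}
```

The sketch ignores details such as trailing whitespace added in transit, so for real mail a library decoder remains the safer choice.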

Source

2010-11-11 12:30:01 jarnbjo

To add to this, the lines may be broken with "\r\n", not just "\r" or "\n". – 2010-11-11 12:43:39

It is not only possible, but mandatory. Only CRLF (\r\n) line breaks are allowed in quoted-printable. – jarnbjo 2010-11-11 12:48:37
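This also suggests why replacing "\n" alone did not help in the question: the "=" and the "\r" of the soft line break stay behind. A short sketch under that assumption follows; the class `SoftBreakStripDemo` and the method `stripSoftBreaks` are hypothetical names, and the regular expression deliberately tolerates a bare "\n" as well, which is more lenient than the format requires.

```java
public class SoftBreakStripDemo {
    // Removes soft line breaks: "=" followed by CRLF, or (leniently) by a bare LF.
    static String stripSoftBreaks(String s) {
        return s.replaceAll("=\\r?\\n", "");
    }

    public static void main(String[] args) {
        String wire = "mostans=E1g =\r\nism=E9t";

        // Removing only "\n" leaves "=\r" in the text.
        System.out.println(wire.replace("\n", "").contains("=\r")); // true

        // Removing the whole "=" CR LF sequence joins the two lines.
        System.out.println(stripSoftBreaks(wire)); // mostans=E1g ism=E9t
    }
}
```

The =E1 and =E9 escapes are still encoded at this point; they need the hex decoding step, or a library such as MimeUtility, to become á and é.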
