Character corruption from BufferedReader to BufferedWriter in Java
In Java, I am trying to parse an HTML file that contains complex text, such as Greek symbols.
I run into a known problem when the text contains typographic (curly) quotation marks. Text such as
mutations to particular “hotspot” regions
becomes
mutations to particular “hotspot�? regions
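The exact shape of the damage (the opening quote survives, the closing one turns into "�?") is consistent with a UTF-8 file being decoded and re-encoded with windows-1252 as the platform default charset. The sketch below simulates that round trip; the windows-1252 default is an assumption about the asker's machine, and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates the copy loop: the file holds UTF-8 bytes, but FileReader and
    // FileWriter decode and re-encode them with the platform default charset.
    // ASSUMPTION: that default is windows-1252 (common on Windows at the time).
    static byte[] roundTrip(byte[] utf8Bytes, Charset platformDefault) {
        String decoded = new String(utf8Bytes, platformDefault); // what readLine() sees
        return decoded.getBytes(platformDefault);                // what write() emits
    }

    // Formats bytes as space-separated uppercase hex, e.g. "E2 80 9C".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] left = "\u201C".getBytes(StandardCharsets.UTF_8);  // E2 80 9C
        byte[] right = "\u201D".getBytes(StandardCharsets.UTF_8); // E2 80 9D
        System.out.println(hex(roundTrip(left, cp1252)));  // E2 80 9C
        System.out.println(hex(roundTrip(right, cp1252))); // E2 80 3F
    }
}
```

Byte 0x9D has no mapping in windows-1252, so it decodes to U+FFFD and is re-encoded as `?` (0x3F). A browser reading the result as UTF-8 then sees the truncated sequence E2 80 as "�", followed by "?". The opening quote's last byte 0x9C maps to "œ" and back, so it survives by luck.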
I have isolated the problem by writing a simple text-copy method:
public static int CopyFile()
{
    // myFile and outputFile are fields defined elsewhere (File or String paths).
    // Note: FileReader and FileWriter always use the platform default charset.
    String NullSpace = System.getProperty("line.separator");
    try (BufferedReader input = new BufferedReader(new FileReader(myFile));
         Writer output = new BufferedWriter(new FileWriter(outputFile)))
    {
        String line;
        while ((line = input.readLine()) != null)
        {
            StringBuilder sb = new StringBuilder();
            // Parsing would happen here
            sb.append(line);
            output.write(sb.toString() + NullSpace);
        }
        return 0; // try-with-resources flushes and closes both streams
    }
    catch (IOException e)
    {
        return 1;
    }
}
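One common way to avoid the platform default entirely is to name the charset on both the reading and the writing side. The sketch below assumes the source file really is UTF-8 and uses the Java 7 `java.nio.file.Files` factory methods; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Copy {

    // Copies a text file line by line, decoding AND encoding as UTF-8.
    // ASSUMPTION: the source file is UTF-8; pass a different Charset if not.
    static void copyFile(Path source, Path target) throws IOException {
        String newLine = System.lineSeparator();
        try (BufferedReader input = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter output = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = input.readLine()) != null) {
                // Parsing would happen here
                output.write(line);
                output.write(newLine);
            }
        } // try-with-resources flushes and closes both streams
    }

    public static void main(String[] args) throws IOException {
        copyFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Unlike `FileReader`, the reader returned by `Files.newBufferedReader` reports malformed input with a `MalformedInputException` instead of silently substituting characters, which makes a wrong charset guess visible early.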
Can anyone suggest how to fix this?
My solution:
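InputStream in = new FileInputStream(myFile);
// Decode the input explicitly as UTF-8 instead of the platform default
Reader reader = new InputStreamReader(in, "utf-8");
Reader buffer = new BufferedReader(reader); // never used below; reads go through 'reader'
// Only ASCII is written, so FileWriter's default charset is safe on ASCII-compatible platforms
Writer output = new BufferedWriter(new FileWriter(outputFile));
int r;
while ((r = reader.read()) != -1)
{
    if (r < 126)
    {
        // Plain ASCII: copy the character unchanged
        output.write(r);
    }
    else
    {
        // Anything else: emit an HTML numeric character reference, e.g. &#8221;
        output.write("&#" + Integer.toString(r) + ";");
    }
}
output.flush();
output.close();
reader.close();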
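One caveat with the loop above: `Reader.read()` returns UTF-16 code units, so a character outside the Basic Multilingual Plane (for example an emoji) would come out as two references to surrogate halves, which HTML does not accept. A minimal sketch that works on code points instead, assuming Java 8 for `String.codePoints()`; it also uses 128 rather than 126 as the cut-off so that every ASCII character passes through unchanged. The class and method names are hypothetical.

```java
public class HtmlEntities {

    // Replaces every non-ASCII character with an HTML numeric character reference.
    // Iterates over Unicode code points rather than UTF-16 chars, so a character
    // outside the BMP becomes one reference instead of two surrogate halves.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);                  // plain ASCII, copied unchanged
            } else {
                sb.append("&#").append(cp).append(';');  // e.g. U+201D becomes &#8221;
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("\u201Chotspot\u201D \u03B1"));
    }
}
```

Running `main` should print `&#8220;hotspot&#8221; &#945;`. Like the original, this leaves `&` and `<` alone, on the assumption that the input is already valid HTML.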
Is it just me, or is the "buffered" reader superfluous in the last snippet? – 2013-05-29 01:40:37
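The comment is right that `buffer` is created but never read from. A sketch of the same streaming approach that actually reads through the `BufferedReader`, and pairs surrogates so supplementary characters become a single reference, might look like this. The class and method names are hypothetical; for a file you would pass something like `new InputStreamReader(new FileInputStream(myFile), StandardCharsets.UTF_8)` as the source, and the caller remains responsible for closing both streams.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class BufferedEscape {

    // Reads through the BufferedReader wrapper itself (not the underlying reader)
    // and writes non-ASCII characters as HTML numeric character references.
    static void escape(Reader source, Writer sink) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int c;
        while ((c = in.read()) != -1) {
            int cp = c;
            if (Character.isHighSurrogate((char) c)) {
                in.mark(1);                  // so we can back up if no low surrogate follows
                int next = in.read();
                if (next != -1 && Character.isLowSurrogate((char) next)) {
                    cp = Character.toCodePoint((char) c, (char) next);
                } else if (next != -1) {
                    in.reset();              // unpaired: leave the char for the next iteration
                }
            }
            if (cp < 128) {
                sink.write(cp);              // plain ASCII
            } else {
                sink.write("&#" + cp + ";"); // numeric character reference
            }
        }
        sink.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        escape(new StringReader("\u201Chotspot\u201D"), out);
        System.out.println(out);
    }
}
```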