Java: Reading a file in two parts - partly as String, partly as byte[]

I have a file that is split into two parts by "\n\n": the first part is a not-too-long string, and the second part is a byte array, which can be quite long.

I am trying to read the file as follows:

byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {

    final InputStreamReader isr = new InputStreamReader(fis);
    final BufferedReader reader = new BufferedReader(isr);

    String line;
    // reading until \n\n
    while (!(line = reader.readLine()).trim().isEmpty()) {
        // processing the line
    }

    // copying the rest of the byte array
    result = IOUtils.toByteArray(reader);
    reader.close();
}

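Although the resulting array has the size it should have, its contents are corrupted. If I try to use toByteArray directly on fis or isr, the result is empty.

How can I read the rest of the file correctly and efficiently?

Thanks!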


2013-02-27 Vojtěch

Thanks for all the comments - I ended up implementing it this way:

try (final FileInputStream fis = new FileInputStream(file)) {

    ByteBuffer buffer = ByteBuffer.allocate(64);

    boolean wasLast = false;
    String headerValue = null, headerKey = null;
    byte[] result = null;

    while (true) {
        // keep the int so that end of file (-1) can be told apart from byte 0xFF
        int current = fis.read();
        if (current == -1) {
            // end of file reached before the \n\n separator
            break;
        }
        if (current == '\n') {
            if (wasLast) {
                // this is \n\n
                break;
            } else {
                // just a new line in header
                wasLast = true;
                headerValue = new String(buffer.array(), 0, buffer.position());
                buffer.clear();
            }
        } else if (current == '\t') {
            // headerKey\theaderValue\n
            headerKey = new String(buffer.array(), 0, buffer.position());
            buffer.clear();
        } else {
            buffer.put((byte) current);
            wasLast = false;
        }
    }
    // reading the rest
    result = IOUtils.toByteArray(fis);
}


2013-02-27 07:04:04

What if you also put 'wasLast = false;' inside the 'if (current == '\t')' block, just in case an empty key/value pair produces '...\n\t\n...'? :) – 2013-03-01 17:10:36

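The contents are corrupted because the data passes through a Reader: the InputStreamReader decodes the bytes into characters using the default character encoding, and IOUtils.toByteArray(reader) then encodes those characters back into bytes. In other words, the 8-bit binary values are converted to text characters by whatever logic the default encoding prescribes, which usually corrupts many binary values.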

Depending on how exactly the charset is implemented, there is a slight chance that this might work:

result = IOUtils.toByteArray(reader, "ISO-8859-1");

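ISO-8859-1 uses exactly one byte per character. Not all character values are defined, but many implementations pass them through unchanged. Maybe you'll get lucky. Note that the InputStreamReader would have to be created with the same charset, otherwise the bytes are still decoded with the default encoding.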
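To make the difference concrete, here is a small sketch that pushes all 256 byte values through a decode/encode round trip, once with ISO-8859-1 and once with UTF-8. The class and method names are made up for illustration, and the behaviour shown is that of the charsets shipped with the standard JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Decodes the bytes into a String and encodes that String back,
    // using the same charset in both directions.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // All 256 possible byte values, standing in for arbitrary binary data.
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        System.out.println(Arrays.equals(data, roundTrip(data, StandardCharsets.ISO_8859_1))); // true

        // In UTF-8 the bytes 0x80..0xFF are not valid on their own and get replaced.
        System.out.println(Arrays.equals(data, roundTrip(data, StandardCharsets.UTF_8))); // false
    }
}
```

Even where the round trip is lossless, it still converts every byte to a char and back, so it remains a workaround rather than a clean way to read binary data.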

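A cleaner solution, however, is to read the string part as binary data first and then convert it to text via new String(bytes), rather than reading the binary data as a string and converting it back.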

This may mean, though, that you need to implement your own version of BufferedReader for performance reasons.

An obvious Google search will lead you (for example) here, to the source code of the standard BufferedReader:

http://www.docjar.com/html/api/java/io/BufferedReader.java.html

It is a bit long, but the concept is not too hard to understand, so hopefully it is useful as a reference.
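As a rough sketch of the "read bytes first, decode later" idea described above, the following uses a plain BufferedInputStream instead of a hand-written buffered reader. The class name SplitFileReader, the Parts holder and the choice of UTF-8 for the header are assumptions made for this example, not something taken from the question.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SplitFileReader {
    /** Hypothetical holder for the two parts of the file. */
    static final class Parts {
        final String header;
        final byte[] body;

        Parts(String header, byte[] body) {
            this.header = header;
            this.body = body;
        }
    }

    static Parts read(InputStream raw) throws IOException {
        // Buffering happens at the byte level, so no bytes are decoded by accident.
        InputStream in = new BufferedInputStream(raw);

        // Collect header bytes until the "\n\n" separator (or end of stream).
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int prev = -1;
        int current;
        while ((current = in.read()) != -1) {
            if (current == '\n' && prev == '\n') {
                break; // second newline of the separator, not part of the header
            }
            header.write(current);
            prev = current;
        }

        // Everything after the separator is copied as raw bytes.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            body.write(chunk, 0, n);
        }

        // Assumption: the header is UTF-8 text. It keeps the newline that ends its last line.
        String headerText = new String(header.toByteArray(), StandardCharsets.UTF_8);
        return new Parts(headerText, body.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {'k', '\t', 'v', '\n', '\n', (byte) 0xFF, 0x00, (byte) 0x80};
        Parts parts = read(new ByteArrayInputStream(sample));
        System.out.println(parts.header.trim() + " / " + parts.body.length + " body bytes");
    }
}
```

For a real file, the stream passed to read would be a FileInputStream opened in a try-with-resources block.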


2013-02-27 05:28:37

This is exactly what I found out myself a few minutes ago :-) – 2013-02-27 07:01:04

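Alternatively, you can read the whole file into a byte array, find the position of \n\n, and split the array into the text lines and the bytes: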

// needs java.nio.file.Files and java.nio.file.Paths
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
    if (a[i] == '\n' && a[i + 1] == '\n') {
        // text before the separator
        line = new String(a, 0, i);
        // binary part starts after both '\n' bytes
        int len = a.length - i - 2;
        result = new byte[len];
        System.arraycopy(a, i + 2, result, 0, len);
        break;
    }
}


2013-02-27 05:55:31

I think the array copy would be rather expensive. – 2013-02-27 07:04:37
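If the copy is a concern, one possible variation (a sketch, not something proposed in the thread) is to keep the single array that was read and expose the binary part as a view on it via ByteBuffer.wrap. The class and method names below are hypothetical, and the code assumes the consumer of the data can work with a ByteBuffer instead of a byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SplitWithoutCopy {
    /** Returns the index of the first "\n\n" in data, or -1 if there is none. */
    static int indexOfSeparator(byte[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == '\n' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /** Read-only view of the bytes after the separator; the array is not copied. */
    static ByteBuffer payloadView(byte[] data, int separatorIndex) {
        if (separatorIndex < 0) {
            throw new IllegalArgumentException("no separator found");
        }
        int start = separatorIndex + 2; // skip both '\n' bytes
        return ByteBuffer.wrap(data, start, data.length - start).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] a = "key\tvalue\n\nBINARY".getBytes(StandardCharsets.ISO_8859_1);
        int sep = indexOfSeparator(a);
        String header = new String(a, 0, sep, StandardCharsets.ISO_8859_1);
        ByteBuffer payload = payloadView(a, sep);
        System.out.println(header + " -> " + payload.remaining() + " payload bytes");
    }
}
```

The trade-off is that the whole file still has to fit in memory once, and the view keeps the header bytes alive for as long as the payload is referenced.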
