Reading strings from a binary file in Java

I have read every page I could find online, but none of them worked for me.

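I have a binary file that was created by C code. I also have a C reader for this binary file. I need to write a Java reader for the same file.

In the C code, the following call reads a string of size 'b * max_w' and one character: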

fscanf(f, "%s%c", &vocab[b * max_w], &ch);

In Java I open the binary file like this:

FileInputStream fis = new FileInputStream(filename); 
BufferedInputStream bin = new BufferedInputStream(fis);

and then read bytes and convert them to a string:

for(int j = 0; j < 200; j++) { 
    int size = 2; // char is 2 bytes 
    byte[] tempId3 = new byte[size]; 
    bin.read(tempId3, 0, size); 
    String id3 = new String (tempId3); 
    System.out.println(" id = " + id3);     
}

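But the output is a bunch of nonsense. Am I doing something wrong? Is there a better way to do this?

Edit: the C snippet that runs before this (taken from here) is: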

fscanf(f, "%lld", &words);
fscanf(f, "%lld", &size);
vocab = (char *)malloc((long long)words * max_w * sizeof(char));
for (a = 0; a < N; a++) bestw[a] = (char *)malloc(max_size * sizeof(char));

Here is what I have:

FileInputStream fis = new FileInputStream(filename); 
BufferedInputStream bin = new BufferedInputStream(fis); 

int length = 1; 

System.out.println("1st: "); 
byte[] tempId = new byte[8]; 
bin.read(tempId, 0, 8); 
String id = new String (tempId, "US-ASCII"); 
System.out.println(" out = " + id); 

System.out.println("2nd: "); 
int size1 = 8; 
byte[] tempId2 = new byte[size1]; 
bin.read(tempId2, 0, size1); 
String id2 = new String (tempId2, "US-ASCII"); 
System.out.println(" out = " + id2); 



for(int j = 0; j < 20; j++) { 
    int size = 2; 
    byte[] tempId3 = new byte[size]; 
    bin.read(tempId3, 0, size); 
    String id3 = new String (tempId3, "US-ASCII"); 
    System.out.println(" out = " + id3);     
}

and what I see is the output below. Apart from the first two 'long' numbers, the rest is nonsense (I expected characters).

output

P.S. The C code is here (lines 44-60 are the part that reads the binary file).


2014-01-27 Daniel

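The 'new String(byte[])' constructor decodes using the system's default charset. That is probably some flavor of UTF-8, but it may not be. Try 'System.out.println(System.getProperty("file.encoding"));' to find out what it is set to. I'm fairly sure C uses ASCII for chars (which would be compatible with UTF-8), but I'm not a C programmer. Also, post some of the nonsense. It may not be nonsense to a trained eye. ;) – Radiodef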
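To illustrate the point about the default charset, here is a minimal sketch. The class name CharsetDemo and the sample bytes are made up for illustration; the sketch prints the platform default and decodes the same bytes with an explicitly named charset, so the result does not depend on the machine it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode bytes as US-ASCII regardless of the platform default charset.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: the ASCII bytes for "word".
        byte[] sample = {0x77, 0x6f, 0x72, 0x64};
        System.out.println("default charset: " + Charset.defaultCharset());
        // Uses whatever the platform default happens to be.
        System.out.println("default decode:  " + new String(sample));
        // Always decodes the same way.
        System.out.println("ASCII decode:    " + decodeAscii(sample));
    }
}
```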

Maybe you can get what you need by using a Reader? Use an InputStream to read binary data; Readers are for character data.
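A minimal sketch of the Reader approach, assuming the part of the file being read is plain US-ASCII text. ReaderDemo, readChars and the sample input are hypothetical names and data, not taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReaderDemo {
    // Read up to maxChars characters from a byte stream, decoding as US-ASCII.
    static String readChars(InputStream in, int maxChars) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder();
        int c;
        // Reader.read() returns one decoded character, or -1 at end of stream.
        while (sb.length() < maxChars && (c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input standing in for the real file.
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readChars(new ByteArrayInputStream(data), 5));
    }
}
```

Note that an InputStreamReader may read ahead and buffer more bytes than it returns, so this suits input that is text throughout. If raw binary data follows the text, reading the bytes yourself is probably safer.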


2014-01-27 12:57:56

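You can try using a constructor like this one and experiment with different charsets. A Java string is encoded in UTF-16, so one character takes 2 bytes, which may be why it isn't working. Try US-ASCII.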
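A short sketch of that distinction, using made-up sample text: how many bytes a character takes in a file depends on the charset used to encode or decode it, not on how Java stores a String in memory. Under US-ASCII, two bytes decode to two characters rather than one. BytesPerCharDemo and encodedLength are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesPerCharDemo {
    // Number of bytes needed to store s in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "abc"; // hypothetical sample text
        // One byte per character in US-ASCII.
        System.out.println("US-ASCII: " + encodedLength(s, StandardCharsets.US_ASCII));
        // Two bytes per character in UTF-16BE (no byte order mark).
        System.out.println("UTF-16BE: " + encodedLength(s, StandardCharsets.UTF_16BE));
        // Two ASCII bytes decode to a two-character string.
        byte[] two = {0x61, 0x62};
        System.out.println(new String(two, StandardCharsets.US_ASCII).length());
    }
}
```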


2014-01-27 12:59:55 NitroG42

Strings in Java are Unicode. You have to take that into account. What encoding does your binary file use?


2014-01-27 13:01:42

I don't know! :( So if the encoding in the binary file is not Unicode on the C side, I won't be able to read it? What is C's default encoding? – Daniel

String id3 = new String(tempId3, "US-ASCII");


2014-01-27 13:04:02 MariuszS

That didn't help; I have added some output above. – Daniel

As the other comments say, try the String constructor that takes a character encoding. That is:

String id3 = new String(tempId3, Charsets.US_ASCII);

or:

String id3 = new String(tempId3, "US-ASCII");

and the other lines can stay the same.

The C code you posted does not actually read any characters. It only allocates memory for the scanning that happens later.
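For the reading part, the fscanf format strings quoted in the question suggest that the counts and the words are stored as text: '%lld' parses a decimal integer, and '%s%c' reads a whitespace-delimited token plus the single character after it. Below is a minimal Java sketch that mirrors those calls, assuming the tokens are US-ASCII. TokenReader, readToken and the sample bytes are hypothetical, and whatever follows each word in the real file is not shown in this thread, so the sketch stops at the tokens.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class TokenReader {
    // Whitespace bytes as C's isspace() sees them in the "C" locale.
    static boolean isSpace(int b) {
        return b == ' ' || b == '\n' || b == '\t' || b == '\r' || b == '\f' || b == 0x0B;
    }

    // Roughly what fscanf("%s%c") does: skip leading whitespace, collect bytes
    // up to the next whitespace byte, and consume that one delimiter byte.
    // Returns an empty string at end of stream.
    static String readToken(InputStream in) throws IOException {
        int b = in.read();
        while (b != -1 && isSpace(b)) {
            b = in.read();
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && !isSpace(b)) {
            buf.write(b);
            b = in.read();
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: two decimal counts, then two words.
        byte[] data = "2 3\nhello world ".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(data);
        long words = Long.parseLong(readToken(in)); // like fscanf(f, "%lld", &words)
        long size = Long.parseLong(readToken(in));  // like fscanf(f, "%lld", &size)
        System.out.println("words = " + words + ", size = " + size);
        for (long i = 0; i < words; i++) {
            System.out.println("token = " + readToken(in));
        }
    }
}
```

Reading a fixed 8 bytes for each number, as in the question, only lines up when the decimal text happens to be that long; splitting on whitespace avoids that assumption.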


2015-03-04 13:37:30 Dmitry
