Reading a text file

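I'd like to know the best way to read a large text file (at least 5 MB) in C++, considering speed and efficiency. Is there a preferred class or function to use, and why?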

By the way, I'm running specifically in a UNIX environment.

2010-01-18 jasonline

I think you should specify the OS, since how to read a file quickly is OS-specific. For example, Windows allows memory-mapped files. – 2010-01-18 02:41:17

The answer also depends on what you intend to do with the text. Unix has memory-mapped files too. – Omnifarious 2010-01-18 02:54:19

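If you're not doing homework or a project that requires C++, don't reinvent the wheel on Linux: there are plenty of tools (written in C/C++) that read files, such as grep and awk. If you still want to do it in C/C++, you can check their source to see how it's done. – ghostdog74 2010-01-18 02:56:44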

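The stream classes (ifstream) actually do a good job; assuming you're not otherwise constrained, make sure to turn off sync_with_stdio (in ios_base::). You can use getline() to read directly into std::strings, though from a performance perspective a fixed buffer used as a char* (either a vector of chars or an old-school char[]) may be faster, at the cost of higher risk and complexity.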
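
As a rough sketch of the two options described above (this is illustrative code, not from the original answer), the block below counts lines once with getline() into a std::string and once with fixed-size chunks read into a vector of chars. The function names, the 64 KiB default chunk size, and the choice of "count lines" as the workload are all assumptions made for the example. Note that sync_with_stdio only affects the standard streams (std::cin, std::cout and so on), so it matters only if the program also uses those.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: counts lines by reading each one into a std::string.
// Returns -1 if the file cannot be opened.
long count_lines_getline(const std::string& path) {
    std::ifstream in(path);
    if (!in) return -1;

    long lines = 0;
    std::string line;  // reused across iterations, so its storage is recycled
    while (std::getline(in, line)) {
        ++lines;       // real per-line processing would go here
    }
    return lines;
}

// Hypothetical helper: counts '\n' bytes by reading fixed-size chunks.
// Returns -1 if the file cannot be opened or chunk is 0.
long count_lines_chunked(const std::string& path, std::size_t chunk = 1 << 16) {
    if (chunk == 0) return -1;
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;

    std::vector<char> buf(chunk);
    long lines = 0;
    // read() fails on the final short chunk, but gcount() still reports
    // how many bytes were actually delivered.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        lines += static_cast<long>(
            std::count(buf.data(), buf.data() + n, '\n'));
    }
    return lines;
}
```

The two functions differ on a file whose last line has no trailing newline: the getline() version counts that line, the chunked version does not. If the program also prints through std::cout, a single call to std::ios_base::sync_with_stdio(false) at startup, before any other I/O, is where that setting would go.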

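If you're willing to play games with page-size calculations and the like, you can go the mmap route. I'd probably build it first with the stream classes and see whether that's good enough.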
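
For comparison, here is a minimal sketch of the mmap route on a POSIX system, again counting newlines as a stand-in workload. The function name is hypothetical. Mapping the whole file from offset 0, as done here, avoids the page-size arithmetic; that only becomes necessary when mapping a window of the file, because the offset passed to mmap must be a multiple of the page size.

```cpp
#include <algorithm>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps a file read-only and counts '\n' bytes.
// Returns -1 on any error.
long count_lines_mmap(const char* path) {
    const int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  // mmap rejects a zero-length mapping
        close(fd);
        return 0;
    }

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    const char* data = static_cast<const char*>(p);
    const long lines = static_cast<long>(std::count(data, data + len, '\n'));

    munmap(p, len);
    return lines;
}
```

The mapped bytes are not NUL-terminated, so any parsing has to work with explicit lengths (as std::count does here) rather than C string functions.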

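Depending on what you do with each line of data, you may find that your processing routines, rather than the I/O, are the place to optimize.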


2010-01-18 02:43:10 Joe

What advantage does ifstream have over fread()? – jasonline 2010-01-18 02:55:28

Performance-wise, I'd expect them to be about the same. In terms of code maintenance, I'd rather deal with the stream classes. – Joe 2010-01-18 03:37:22

Use old-style file I/O.

fopen the file for binary read 
fseek to the end of the file 
ftell to find out how many bytes are in the file. 
malloc a chunk of memory to hold all of the bytes + 1 
set the extra byte at the end of the buffer to NUL. 
fread the entire file into memory. 
create a vector of const char * 
push_back the address of the first byte into the vector. 
repeatedly 
    strstr - search the memory block for the carriage control character(s). 
    put a NUL at the found position 
    move past the carriage control characters 
    push_back that address into the vector 
until all of the text in the buffer has been processed. 

---------------- 
use the vector to find the strings, 
and process as needed. 
when done, free the memory block 
and the vector should self-destruct.
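
The steps above could be sketched roughly as follows. This is an illustration, not the answerer's code: it holds the bytes in a std::vector<char> instead of a malloc'd block (so nothing needs freeing by hand), searches with memchr for '\n' instead of strstr, and also strips a preceding '\r'. The SlurpedFile struct and slurp_lines function are hypothetical names.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical container: the raw bytes plus pointers to each line start.
// The pointers refer into `buffer`, so copying this struct would leave the
// copy's pointers aimed at the original; pass it by reference instead.
struct SlurpedFile {
    std::vector<char> buffer;        // file contents plus a trailing NUL
    std::vector<const char*> lines;  // NUL-terminated lines inside buffer
};

// Reads the whole file in one fread and splits it in place.
bool slurp_lines(const char* path, SlurpedFile& out) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    // fseek + ftell to learn the size, then go back to the start.
    if (std::fseek(f, 0, SEEK_END) != 0) { std::fclose(f); return false; }
    const long size = std::ftell(f);
    if (size < 0) { std::fclose(f); return false; }
    std::rewind(f);

    const std::size_t n = static_cast<std::size_t>(size);
    out.buffer.assign(n + 1, '\0');  // the extra byte stays NUL
    const std::size_t got = std::fread(out.buffer.data(), 1, n, f);
    std::fclose(f);
    if (got != n) return false;

    out.lines.clear();
    char* p = out.buffer.data();
    char* const end = p + n;
    while (p < end) {
        out.lines.push_back(p);
        char* nl = static_cast<char*>(
            std::memchr(p, '\n', static_cast<std::size_t>(end - p)));
        if (!nl) break;                              // last line, no newline
        if (nl > p && nl[-1] == '\r') nl[-1] = '\0'; // handle "\r\n"
        *nl = '\0';
        p = nl + 1;
    }
    return true;
}
```

After a successful call, each entry of lines can be used as an ordinary C string for as long as the SlurpedFile object is alive and unmodified.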


2010-01-18 03:19:42 EvilTeach

Is it better than the stream classes? – jasonline 2010-01-18 03:22:24

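Old-style file I/O is isomorphic with streams. You can do it either way. The point is to slurp in the entire file in one read and then parse out the important strings. – EvilTeach 2010-01-18 04:08:14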

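If the text file stores integers, floats, and small strings, my experience is that FILE, fopen, and fscanf are fast enough, and you also get the numbers directly. I think memory mapping is the fastest, but it requires you to write the code that parses the file, which is extra work.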
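
A minimal sketch of the fscanf approach, assuming a file that contains only whitespace-separated floating-point numbers (the function name and the file layout are assumptions for the example):

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: appends every number in the file to `out`.
// Returns false if the file cannot be opened or if parsing stopped
// on something that is not a number.
bool read_doubles(const char* path, std::vector<double>& out) {
    std::FILE* f = std::fopen(path, "r");
    if (!f) return false;

    double v = 0.0;
    // "%lf" skips leading whitespace (including newlines) by itself.
    while (std::fscanf(f, "%lf", &v) == 1) {
        out.push_back(v);
    }

    // If the loop ended at end-of-file, everything was parsed.
    const bool reached_eof = std::feof(f) != 0;
    std::fclose(f);
    return reached_eof;
}
```

The numbers arrive already converted, which is the convenience the answer refers to; with mmap the same conversion would have to be done by hand, for example with strtod over the mapped bytes.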


2010-01-18 03:34:50
