Need help reading a file with a book format

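I have been struggling to read a file that is formatted like a book. The file is split into pages by a separator string like this: "------------------------------------ ---". What I am trying to do is read all the words and keep track of the page number and, for each word, its position on that page. The file looks like this: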

my file

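For example, if the word "hello" is the first word on the first page, the output should look like "hello 1,1"; if the word is the first word on the second page, the output should be "hello 2,1". This is the code I have so far: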

ifstream inFile;
inFile.open("GreatExpectations.txt");
if(!inFile.is_open()) {
    cout << "Error, can't open the file....." << endl;
    return 1;
}
string word;
string separator;
separator = "----------------------------------------";
int pageNum = 0, wordNum = 0;
IndexMap myMap(200000);
string title;
for(int i = 0; i < 2; i++) {
    getline(inFile, title);
    cout << title << endl;
}
while(!inFile.eof())
{
    inFile >> word;
    //cout << word << " ";
    wordNum++;
    if(word == separator)
        pageNum++;
}

Source

2017-06-12 Lolo

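You haven't explained what you are struggling with. Does the program crash when it runs? Does it produce unexpected results? Add that information to the post and make it a [mcve]. –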

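It reads the file, but I don't know how to keep track of the page and word number of each word as it appears. For example, the word "Biddy" appears on the first page, so the output should be (Biddy 1,1); for the word "for" on the second page it should output (for 2,6). – Lolo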

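Add that information to the post. Also add the missing code so that you have a [mcve], and add the output you observe so that others can work on the problem. –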

If I understand your problem correctly, here is my approach to it:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// position of one word in the book
struct WordInfo {
    string word;
    int pageNum; // 1-based page number
    int wordNum; // 1-based position of the word on its page
};

int main() {
    ifstream inFile;
    inFile.open("GreatExpectations.txt");

    if(!inFile.is_open()) {
        cout << "Error, can't open the file....." << endl;
        return 1;
    }

    int pageNum = 1, wordNum = 0;
    vector<WordInfo> words; // container for the words and their positions

    // read the file line-by-line
    for(string line; getline(inFile, line);) {
        // detect the page separator: a non-empty line made of hyphens only
        if(!line.empty() && line.find_first_not_of("-") == string::npos) {
            pageNum++;
            wordNum = 0; // word numbering restarts on each page
            continue;
        }

        // process the line word-by-word; operator>> skips repeated whitespace
        stringstream ss(line);
        for(string word; ss >> word;) {
            wordNum++;
            words.push_back({ word, pageNum, wordNum });
        }
    }

    return 0;
}

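The WordInfo struct holds the information you want for each word. This is not the most efficient approach, but reading the file line by line is simpler, hence the two loops: the first reads a line, the second reads the words in that line. Each word that is read is pushed into the words vector for later use. That's it.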
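As an illustration of the "later use", here is a minimal sketch of how the collected entries might be turned into the "word page,wordNum" form the question asks for. The helpers formatEntry and occurrencesOf are hypothetical names introduced here, not part of the answer's code, and the sketch assumes the WordInfo layout shown above.

```cpp
#include <string>
#include <vector>

// Same layout as the WordInfo struct in the answer.
struct WordInfo {
    std::string word;
    int pageNum;
    int wordNum;
};

// Hypothetical helper: formats one entry as "word page,wordNum".
std::string formatEntry(const WordInfo& w) {
    return w.word + " " + std::to_string(w.pageNum) + "," +
           std::to_string(w.wordNum);
}

// Hypothetical helper: collects the formatted entries for every
// occurrence of `target`, in the order they were read.
std::vector<std::string> occurrencesOf(const std::vector<WordInfo>& words,
                                       const std::string& target) {
    std::vector<std::string> out;
    for (const WordInfo& w : words) {
        if (w.word == target) {
            out.push_back(formatEntry(w));
        }
    }
    return out;
}
```

A linear scan is enough for a one-off lookup; if many words need to be looked up, a map from word to its list of positions would likely be a better fit.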

Source

2017-06-12 19:36:26 Akira

It reads the words perfectly, but for some reason it never moves on to the second page; pageNum doesn't increase and stays at one. – Lolo

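In that case the separator line contains something besides hyphens, perhaps some spaces, the remainder of a platform-specific line ending, tabs, and so on. You can list those characters in the 'find_first_not_of' call, for example: 'find_first_not_of("- \r\n\t")'. – Akira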
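The whitespace-tolerant check from this comment could be wrapped in a small helper along these lines. This is only a sketch: isPageSeparator is a hypothetical name, and the extra requirement of at least one hyphen is an assumption added here so that blank or whitespace-only lines are not counted as page breaks.

```cpp
#include <string>

// Hypothetical helper: true when the line looks like a page separator,
// i.e. it has at least one hyphen and otherwise only hyphens, spaces,
// tabs, or leftover line-ending characters.
bool isPageSeparator(const std::string& line) {
    // Assumption: a separator must contain at least one hyphen,
    // so empty and whitespace-only lines are rejected.
    if (line.find('-') == std::string::npos) {
        return false;
    }
    // Any character outside the allowed set means this is ordinary text.
    return line.find_first_not_of("- \r\n\t") == std::string::npos;
}
```

Note that a line holding a single stray hyphen would also be treated as a separator; requiring a minimum number of hyphens would be one way to tighten the check if the book's text needs it.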

Ohh man, thank you so much, that solved my problem. – Lolo
