2017-08-17 37 views
4

Skipping the header of a text file in C

I am reading data from the file orderedfile.txt. Sometimes the file has a header of this form:

BEGIN header 

     Real Lattice(A)    Lattice parameters(A) Cell Angles 
    2.4675850 0.0000000 0.0000000  a = 2.467585 alpha = 90.000000 
    0.0000000 30.0000000 0.0000000  b = 30.000000 beta = 90.000000 
    0.0000000 0.0000000 30.0000000  c = 30.000000 gamma = 90.000000 

1       ! nspins 
25 300 300    ! fine FFT grid along <a,b,c> 
END header: data is "<a b c> pot" in units of Hartrees 

1  1  1   0.042580 
1  1  2   0.049331 
1  1  3   0.038605 
1  1  4   0.049181 

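Sometimes no header is present and the data starts on the first line. My code for reading the data is shown below. It works when the data starts on the first line, but not when the header is present. Is there a way to fix this?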

int readinputfile() { 
    FILE *potential = fopen("orderedfile.txt", "r"); 
    for (i=0; i<size; i++) { 
     fscanf(potential, "%lf %lf %*f %lf", &x[i], &y[i], &V[i]); 
    } 
    fclose(potential); 
} 
+3

Switch to reading whole lines. That lets you detect the header and then keep reading until the data starts. – Yunnosch

Answers

2

The code below uses fgets() to read each line. Each line is then scanned with sscanf(), which stores the values in double variables.
See a running example (with stdin) at ideone.

#include <stdio.h> 

int main() 
{ 
    /* maybe the buffer must be greater */ 
    char lineBuffer[256]; 
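    FILE *potential = fopen("orderedfile.txt", "r"); 
    /* fopen() returns NULL if the file cannot be opened */ 
    if (potential == NULL) 
    { 
     perror("orderedfile.txt"); 
     return 1; 
    } 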

    /* loop through every line */ 
    while (fgets(lineBuffer, sizeof(lineBuffer), potential) != NULL) 
    { 
     double a, b, c; 
     /* if there are 3 items matched print them */ 
     if (3 == sscanf(lineBuffer, "%lf %lf %*f %lf", &a, &b, &c)) 
     { 
     printf("%f %f %f\n", a, b, c); 
     } 
    } 
    fclose(potential); 

    return 0; 
} 

It works with the header you provided, but if the header contained a line such as:

1  1  2   0.049331 

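then that line would be read as data too. Another possibility is to search for the words END header whenever BEGIN header is present, as in your example, or to skip a fixed number of lines if the header length is known.
To search for a substring, you can use the function strstr().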
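The strstr() idea can be combined with the fgets() loop above roughly as follows. This is only a sketch: the helper name read_potential, its parameters, and the 256-byte line buffer are assumptions made for illustration, not part of the original code. Lines between BEGIN header and END header are skipped unconditionally, so a header line that happens to look like data is no longer misread, and a file with no header at all is parsed from its first line.

#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to max data rows from fp, skipping an
   optional "BEGIN header" ... "END header" block.
   Returns the number of rows stored in x, y and V. */
size_t read_potential(FILE *fp, double *x, double *y, double *V, size_t max)
{
    char line[256];    /* assumption: no line is longer than 255 characters */
    int in_header = 0;
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL)
    {
        /* a line containing "BEGIN header" starts the header block */
        if (!in_header && strstr(line, "BEGIN header") != NULL)
        {
            in_header = 1;
            continue;
        }
        /* inside the header: skip every line, including the END line */
        if (in_header)
        {
            if (strstr(line, "END header") != NULL)
                in_header = 0;
            continue;
        }
        /* outside the header: keep the row only if all three values parse */
        if (sscanf(line, "%lf %lf %*f %lf", &x[n], &y[n], &V[n]) == 3)
            n++;
    }
    return n;
}

A caller would open orderedfile.txt with fopen(), check the result for NULL, pass the stream together with the x, y and V arrays and their capacity, and use the returned row count in place of the fixed size.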

2

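Check the return value of fscanf. If it returns three, your input is correct; otherwise you are still in the header, so you have to skip the rest of the line: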

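int readinputfile() { 
    FILE *potential = fopen("orderedfile.txt", "r"); 
    int res; 
    int i = 0; 
    if (potential == NULL) 
     return -1; 
    /* Compare against EOF explicitly: a bare while(res = fscanf(...)) 
       stops at the first 0 and never terminates once EOF (-1) is returned */ 
    while (i < size && (res = fscanf(potential, "%lf %lf %*f %lf", &x[i], &y[i], &V[i])) != EOF) { 
     if (res != 3) { 
      /* still in the header: discard the rest of this line */ 
      fscanf(potential, "%*[^\n]"); 
      continue; 
     } 
     i++; 
     ... // Optionally, do anything else with the data that you read 
    } 
    fclose(potential); 
    return i; /* number of data rows read */ 
} 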

Demo.

+1

@chqrlie Sure, the function isn't that big, so I added the rest of it. Thanks! – dasblinkenlight

+0

The final fix was to add i = i - 1, so that the loop counter does not advance when a line is skipped! –

2

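I think it is a lot more reliable to look explicitly for the start and end of the header than to rely on the header never containing a line that matches a scanf()-style format string: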

FILE *fp = fopen(...); 

int inHeader = 0; 

size_t lineLen = 128; 
char *linePtr = malloc(lineLen); 

// skip header lines 
while (getline(&linePtr, &lineLen, fp) >= (ssize_t) 0) 
{ 
    // check for the start of the header (need to do this first to 
    // catch the first line) 
    if (!inHeader) 
    { 
     inHeader = !strncmp(linePtr, "BEGIN header", strlen("BEGIN header")); 
    } 
    else 
    { 
     // if we were in the header, check for the end line and go to next line 
     inHeader = strncmp(linePtr, "END header", strlen("END header")); 

     // need to skip this line no matter what because it's in the header 
     continue; 
    } 

    // if we're not in the header, either break this loop 
    // which leaves the file at the first non-header line, 
    // or process the line in this loop 
    if (!inHeader) 
    { 
     ... 
    } 
} 
... 

You may prefer to use strstr() instead of strncmp(). That way the header start/end strings do not have to be at the beginning of the line.

+1

Why the 'malloc()' call? 'size_t lineLen = 0; char *linePtr = NULL;' is perfectly fine for POSIX.1 ['getline()'](http://man7.org/linux/man-pages/man3/getline.3.html). –

+0

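@NominalAnimal *Why the 'malloc()' call?* Just to avoid getline() growing the buffer slowly over multiple calls as line lengths increase. Nothing that really matters. –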
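For reference, the malloc()-free initialization mentioned in the comment above might look like the sketch below. The helper name count_lines is hypothetical and exists only to give the loop something to do. getline() is a POSIX function rather than standard C, so this assumes a POSIX system; the feature-test macro is there for strict compiler modes.

#define _POSIX_C_SOURCE 200809L  /* exposes getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in fp and lets getline()
   allocate and grow the line buffer itself. */
size_t count_lines(FILE *fp)
{
    char *linePtr = NULL;  /* NULL and 0 tell getline() to allocate */
    size_t lineLen = 0;
    size_t count = 0;

    while (getline(&linePtr, &lineLen, fp) != -1)
        count++;

    free(linePtr);         /* the caller owns the buffer getline() allocated */
    return count;
}

Either way the buffer has to be released with free() once the loop is done; preallocating with malloc() only changes how many times getline() may need to enlarge it.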