Parsing data out of a long text file in C


<td>&nbsp;<a href="wtzresult.php?CiID=41832&forma=12h"> Asmara </a>&nbsp;</td><td width="100">Sun, 09:08 PM</td></tr><tr> 
<td>&nbsp;<a href="wtzresult.php?CiID=42107&forma=12h"> Astana </a>&nbsp;</td><td width="100">Mon, 12:08 AM</td></tr><tr bgcolor="#E0E0E0"> 
<td>&nbsp;<a href="wtzresult.php?CiID=4698&forma=12h"> Asuncion </a>&nbsp;</td><td width="100">Sun, 03:08 PM<sup>dst</sup></td></tr><tr> 
<td>&nbsp;<a href="wtzresult.php?CiID=3963&forma=12h"> Athens </a>&nbsp;</td><td width="100">Sun, 08:08 PM</td></tr><tr bgcolor="#E0E0E0">

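I want to parse out "Athens Sun, 08:08 PM". I tested it and got the line, then used strtok to parse out the day and the clock time, but I get a segmentation fault. Thanks.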

while(fscanf(fp,"%s",word) != EOF){ 
    if (strstr(word,"Athens") != NULL) 
     strcpy(p,word); 
    }


2017-02-20 mohd faiez

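What are 'p' and 'word'? What is your actual problem right now? And how do you obtain 'fd'? – LPs

And the text in the file –

Edit your question: please don't use comments to post additional content. – LPs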

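You can use strstr() to get a pointer to the start of "Athens" in your string. Then loop over the characters, dropping every character between '<' and '>' (including the brackets themselves), and build a new string from what remains. That gives you the output you want.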


2017-02-20 10:22:00 Gnqz

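Instead of fscanf(), read the file one line at a time with fgets(3). To find "Athens Sun, 08:08 PM" in the file, use strstr(3) to match "Athens". Then use strtok(3) to split the rest of the line, with the HTML tag characters < and > as delimiters.

You can then use strcpy(3) and strcat(3) to copy the pieces into a dynamically allocated char* buffer. Make sure that buffer can hold both "Athens" and "Sun, 08:08 PM", plus one space and the '\0' terminator. Use strcmp(3) to check whether a token is one of the strings you are looking for.

Here is an example of how you could do it: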

#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <ctype.h> 

#define LINESIZE 1024 

int main(void) { 
    FILE *fp; 
    char *ret, *token, *result; 
    char line[LINESIZE] = {0}; 
    size_t numbytes, slen; 

    const char *city = "Athens"; 
    const char *datetime = "Sun, 08:08 PM"; 
    const char *delim = "<>\n"; 
    const char *space = " "; 

    fp = fopen("html.txt", "r"); 
    if (fp == NULL) { 
     fprintf(stderr, "Cannot open file\n"); 
     exit(EXIT_FAILURE); 
    } 

    numbytes = strlen(city) + strlen(datetime) + 1; 

    result = malloc(numbytes+1); 
    if (!result) { 
     fprintf(stderr, "Cannot allocate string\n"); 
     exit(EXIT_FAILURE); 
    } 
    result[0] = '\0'; /* start empty so strcat() and printf() are safe even if nothing matches */

    while (fgets(line, LINESIZE, fp) != NULL) { 
     ret = strstr(line, city); 
     if (ret != NULL) { 
      token = strtok(ret, delim); 
      while (token != NULL) { 
       slen = strlen(token); 
       for (int i = (int)slen-1; i >= 0; i--) { 
        if (!isspace((unsigned char)token[i])) { /* cast: isspace() needs a value representable as unsigned char */
         token[i+1] = '\0'; 
         break; 
        } 
       } 

       if (strcmp(token, city) == 0) { 
        strcpy(result, token); 
        strcat(result, space); 
       } 
       if (strcmp(token, datetime) == 0) { 
        strcat(result, token); 
       } 
       token = strtok(NULL, delim); 
      } 
     } 
    } 

    fclose(fp); 

    printf("Extracted string: %s\n", result); 

    free(result); 
    result = NULL; 

    return 0; 
}


2017-02-20 11:33:24 RoadRunner

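Maybe this will give you some ideas. Your segmentation fault probably comes from overrunning a buffer or dereferencing a null or uninitialized pointer (my guess is the p variable). Of course, if the input format deviates from your snippet, this code is useless. Once you are a bit further along in C, you may also want to look at the expat library, although that would require turning these lines into small XML documents. I'm sure HTML parsing libraries exist for C, but I haven't tried any of them.

At least, the output of this program is: Athens Sun, 08:08 PM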

#include <stdio.h> 
#include <ctype.h> 
#include <string.h> 

int main(void)
{
    char buf[1024];
    FILE *fp = fopen("the-data-file.txt", "r");
    if (!fp) {
        perror("the-data-file.txt");   /* bail out: fgets() on a NULL FILE* would crash */
        return 1;
    }

    while (fgets(buf, sizeof(buf), fp)) {
        /* Each lookup runs only if the previous one succeeded. */
        char *city = strstr(buf, "Athens");
        char *td = city ? strstr(city, "<td") : NULL;                      /* cell holding the time */
        char *greater_than = td ? strstr(td, ">") : NULL;                 /* end of the <td ...> tag */
        char *less_than = greater_than ? strstr(greater_than, "<") : NULL; /* start of </td> */
        if (less_than) {
            while (*city && isalpha((unsigned char)*city)) {
                printf("%c", *city++);
            }
            printf(" ");
            while (++greater_than < less_than) {
                printf("%c", *greater_than);
            }
            printf("\n");
        }
    }

    fclose(fp);
    return 0;
}


2017-02-20 15:53:27

Oh, I forgot an obvious tip: compile with -g and run the program in a debugger to find out where the crash happens. Good luck. –
