How to access topic names in PDFs using poppler?

I am using poppler and I want to get the topic or heading for a particular page number. Please tell me how to do this with poppler.

2012-12-05 nirmitkansal

Which frontend (API) are you using ([glib](http://people.freedesktop.org/~ajohnson/docs/poppler-glib/), [qt](http://people.freedesktop.org/~aacid/docs/qt4/))? I think you have to use the PDF's index/TOC. See the [related question](http://stackoverflow.com/q/7131906/1381638).

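This uses the glib API, since I don't know which API you want.

I'm pretty sure there is no topic/title stored with a particular page. You have to walk the index, if there is one.

Walk the index with backtracking. If you are lucky, each index node contains a PopplerActionGotoDest (check the type!). You can get the title from the PopplerAction object (gchar *title) and the page number from the contained PopplerDest (int page_num). page_num should be the first page of that section.

This assumes your PDF has an index containing PopplerActionGotoDest objects. You then simply walk it, checking page_num. If page_num > searching_num, go back one step. Once you are at the correct parent, walk its children. This should give you the best match. I just wrote some code for it: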
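Why "check the type" matters: PopplerAction is a union, so the goto_dest member only means something when the type tag says so. Here is a minimal sketch of that pattern using stand-in types. Everything in it (Act, ActGoto, ACT_GOTO_DEST, title_and_page) is hypothetical and only shaped like the poppler types; it is not the poppler API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the poppler types: a tagged union shaped
   like PopplerAction, where the tag says which member is valid. */
typedef enum { ACT_GOTO_DEST, ACT_URI } ActType;

typedef struct { ActType type; const char *title; } ActAny;
typedef struct { ActType type; const char *title; int page_num; } ActGoto;
typedef struct { ActType type; const char *title; const char *uri; } ActUri;

typedef union {
    ActType type;      /* every member starts with the same tag */
    ActAny  any;       /* fields shared by all action kinds */
    ActGoto goto_dest; /* valid only when type == ACT_GOTO_DEST */
    ActUri  uri;       /* valid only when type == ACT_URI */
} Act;

/* Returns the title and stores the target page in *page, or returns NULL
   (leaving *page untouched) when the node is not a go-to-destination. */
const char *title_and_page(const Act *a, int *page)
{
    assert(a != NULL && page != NULL);
    if (a->type != ACT_GOTO_DEST)
        return NULL;            /* goto_dest is not the active member */
    *page = a->goto_dest.page_num;
    return a->any.title;
}
```

With the real library the same check would be made on action->type before touching action->goto_dest, as the answer's code does.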

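#include <stdio.h> 
#include <poppler.h> 

/* Returns a newly allocated title (free it with g_free) or NULL */ 
gchar* getTitle(PopplerIndexIter *iter, int num, PopplerIndexIter *last, PopplerDocument *doc) 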
{ 
    int cur_num = 0; 
    int next; 
    PopplerAction * action; 
    PopplerDest * dest; 
    gchar * title = NULL; 
    PopplerIndexIter * last_tmp; 

    do 
    { 
      action = poppler_index_iter_get_action(iter); 
      if (action->type != POPPLER_ACTION_GOTO_DEST) { 
       printf("No GOTO_DEST!\n"); 
       poppler_action_free(action); //don't leak the action on early return 
       return NULL; 
      } 

      //get page number of current node 
      if (action->goto_dest.dest->type == POPPLER_DEST_NAMED) { 
       dest = poppler_document_find_dest (doc, action->goto_dest.dest->named_dest); 
       //find_dest returns NULL if the named destination cannot be resolved 
       cur_num = dest ? dest->page_num : 0; 
       if (dest) { 
        poppler_dest_free(dest); 
       } 
      } else { 
       cur_num = action->goto_dest.dest->page_num; 
      } 
      //printf("cur_num: %d, %d\n",cur_num,num); 

      //free action, as we don't need it anymore 
      poppler_action_free(action); 

      //are there nodes following this one? 
      last_tmp = poppler_index_iter_copy(iter); 
      next = poppler_index_iter_next (iter); 

      //descend 
      if (!next || cur_num > num) { 
       if ((!next && cur_num < num) || cur_num == num) { 
        //descend current node 
        if (last) { 
         poppler_index_iter_free(last); 
        } 
        last = last_tmp; 
       } 
       //descend last node (backtracking) 
       if (last) { 
        /* Descend into the children of last, or take its title if it is a leaf */ 
        PopplerIndexIter *child = poppler_index_iter_get_child (last); 
        gchar * tmp = NULL; 
        if (child) { 
         tmp = getTitle(child,num,last,doc); 
         poppler_index_iter_free (child); 
        } else { 
         action = poppler_index_iter_get_action(last); 
         if (action->type != POPPLER_ACTION_GOTO_DEST) { 
          tmp = NULL; 
         } else { 
          tmp = g_strdup (action->any.title); 
         } 
         poppler_action_free(action); 
         poppler_index_iter_free (last); 
        } 

        return tmp; 
       } else { 
        return NULL; 
       } 
      } 

      if (cur_num > num || (next && cur_num != 0)) { 
       // free last index_iter 
       if (last) { 
        poppler_index_iter_free(last); 
       } 
       last = last_tmp; 
      } 
    } 
    while (next); 

    return NULL; 
}

getTitle gets called like this:

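/* PopplerDest.page_num is 1-based, so look up pages 1..num_pages */ 
for (i = 1; i <= num_pages; i++) { 
      /* poppler_index_iter_new returns NULL if the document has no index */ 
      iter = poppler_index_iter_new (document); 
      title = iter ? getTitle(iter,i,NULL,document) : NULL; 
      if (iter) { 
       poppler_index_iter_free (iter); 
      } 

      if (title) { 
       printf("title of %d: %s\n",i, title); 
       g_free(title); 
      } else { 
       printf("%d: no title\n",i); 
      } 
    }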
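If the backtracking is hard to follow, the underlying idea can be sketched on a plain tree without poppler: at each level take the last entry that starts on or before the wanted page, then try to refine it among that entry's children. This is a simplified model, not an exact equivalent of getTitle. OutlineNode and title_for_page are made-up names, and it assumes 1-based pages and siblings sorted by first page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outline node (not a poppler type). first_page is 1-based
   and siblings are assumed to be sorted by first_page. */
typedef struct OutlineNode {
    const char *title;
    int first_page;
    const struct OutlineNode *children;
    size_t n_children;
} OutlineNode;

/* Return the title of the deepest entry whose section contains `page`,
   or NULL if `page` comes before every entry at this level. */
const char *title_for_page(const OutlineNode *nodes, size_t n, int page)
{
    const OutlineNode *best = NULL;
    assert(nodes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].first_page > page)
            break;              /* went too far: keep the previous sibling */
        best = &nodes[i];
    }
    if (best == NULL)
        return NULL;

    /* Try to refine the match among the children; fall back to the parent. */
    const char *deeper = title_for_page(best->children, best->n_children, page);
    return deeper ? deeper : best->title;
}
```

To use this with a real document you would first copy the index into such a tree (title plus resolved page_num per node), which also avoids re-walking the PopplerIndexIter for every page.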

2012-12-05 14:19:41
