2012-12-05

I'm using poppler and I want to get the topic or title for a specific page number. How can I do this with poppler? How do I access the topic names in PDFs using poppler?


Which front end (API) are you using ([glib](http://people.freedesktop.org/~ajohnson/docs/poppler-glib/), [qt](http://people.freedesktop.org/~aacid/docs/qt4/))? I think you have to use the PDF's index/TOC. See [this related question](http://stackoverflow.com/q/7131906/1381638).

Answer


This uses the glib API; I don't know which API you want.

I'm pretty sure there is no topic/title stored with a specific page. You have to walk the index, if the PDF has one.
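
"Walking the index" here means a depth-first traversal: visit an entry, then its children, then its next sibling, which yields the entries in document order. Below is a minimal sketch on a hypothetical in-memory tree; OutlineNode and walk_index are illustrative names, not poppler API. In poppler-glib the child and sibling links come from poppler_index_iter_get_child() and poppler_index_iter_next() instead of struct pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one PDF index (outline) entry. */
typedef struct OutlineNode {
    const char *title;            /* entry title */
    int page;                     /* first page of the section, 1-based */
    struct OutlineNode *child;    /* first child entry, or NULL */
    struct OutlineNode *next;     /* next sibling entry, or NULL */
} OutlineNode;

/* Visit node, then its children, then its next sibling (depth-first).
 * Writes up to max titles to out and returns the running count. */
static size_t walk(const OutlineNode *node, const char **out, size_t max, size_t n)
{
    for (; node != NULL; node = node->next) {
        if (n < max)
            out[n] = node->title;
        n++;
        n = walk(node->child, out, max, n);
    }
    return n;
}

/* Collect entry titles in document order; returns the total entry count,
 * which may exceed max (only the first max titles are stored). */
size_t walk_index(const OutlineNode *first, const char **out, size_t max)
{
    assert(out != NULL || max == 0);
    return walk(first, out, max, 0);
}
```

For an outline "Chapter 1" (children "1.1", "1.2") followed by "Chapter 2", this visits Chapter 1, 1.1, 1.2, Chapter 2.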

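Walk the index with backtracking. If you're lucky, every index node contains a PopplerActionGotoDest (check the type!). You can get the title (gchar *title) from the PopplerAction object and the page number (int page_num) from the PopplerDest it contains. page_num should be the first page of that section.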
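
Because each entry's page_num is the first page of its section, the section containing a given page on one level of the index is the last sibling that starts on or before that page. A minimal sketch of that single-level step, assuming the siblings are in page order and using a plain array of start pages in place of the PopplerIndexIter siblings (section_for_page is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

/* start_pages holds each sibling entry's first page, in document order.
 * Returns the index of the entry whose section contains page, i.e. the
 * last entry starting on or before it, or -1 if page precedes them all. */
int section_for_page(const int *start_pages, size_t count, int page)
{
    int best = -1;
    size_t i;

    assert(start_pages != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (start_pages[i] > page)
            break;          /* sorted: every later entry starts later */
        best = (int)i;      /* candidate: starts on or before page */
    }
    return best;
}
```

With start pages {1, 5, 9}, page 4 falls in entry 0, page 5 in entry 1, and page 12 in entry 2.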

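Suppose your PDF has an index made of PopplerActionGotoDest objects. Then you just walk it and check page_num. If page_num > searching_num, go back one step. Once you are at the right parent, walk its children. This should give you the best match. I just wrote some code for it: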

#include <stdio.h>
#include <poppler.h>

/* Page number (1-based) an index action points to, or -1 if the action
 * is not a usable GOTO_DEST. Named destinations are resolved via doc. */
static int get_page_num(PopplerAction *action, PopplerDocument *doc)
{
    PopplerDest *dest;
    int page = -1;

    if (action == NULL || action->type != POPPLER_ACTION_GOTO_DEST ||
        action->goto_dest.dest == NULL)
        return -1;

    if (action->goto_dest.dest->type == POPPLER_DEST_NAMED) {
        /* named destination: look up the real page in the document */
        dest = poppler_document_find_dest(doc, action->goto_dest.dest->named_dest);
        if (dest != NULL) {
            page = dest->page_num;
            poppler_dest_free(dest);
        }
    } else {
        page = action->goto_dest.dest->page_num;
    }
    return page;
}

/* Returns a newly allocated title (free it with g_free()) of the deepest
 * index entry whose section starts on or before page num, or NULL.
 * last is the best match from the parent level (NULL at the top level);
 * getTitle() takes ownership of it and frees it. */
gchar* getTitle(PopplerIndexIter *iter, int num, PopplerIndexIter *last, PopplerDocument *doc)
{
    PopplerAction *action;
    gchar *title = NULL;
    gboolean found = FALSE;
    int cur_num;

    /* walk the siblings on this level, remembering the last entry
     * that starts on or before page num */
    do
    {
        action = poppler_index_iter_get_action(iter);
        cur_num = get_page_num(action, doc);
        if (action)
            poppler_action_free(action);

        if (cur_num > num)
            break;      /* past the page: backtrack to the previous entry */

        /* skip entries that are not GOTO_DEST or could not be resolved */
        if (cur_num > 0) {
            if (last)
                poppler_index_iter_free(last);
            last = poppler_index_iter_copy(iter);
            found = TRUE;
        }
    }
    while (poppler_index_iter_next(iter));

    if (last == NULL)
        return NULL;

    /* descend into the match: a deeper entry is a better match. Only
     * descend if this level found a new match; otherwise we would recurse
     * into the parent's children again and never terminate. */
    if (found) {
        PopplerIndexIter *child = poppler_index_iter_get_child(last);
        if (child) {
            title = getTitle(child, num, last, doc); /* takes ownership of last */
            poppler_index_iter_free(child);
            return title;
        }
    }

    /* no deeper match: last is the answer */
    action = poppler_index_iter_get_action(last);
    if (action && action->type == POPPLER_ACTION_GOTO_DEST)
        title = g_strdup(action->any.title);
    if (action)
        poppler_action_free(action);
    poppler_index_iter_free(last);

    return title;
}

getTitle is called like this:

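int i;
int num_pages = poppler_document_get_n_pages(document);
PopplerIndexIter *iter;
gchar *title;

/* PopplerDest page numbers are 1-based, so query pages 1..num_pages */
for (i = 1; i <= num_pages; i++) {
    iter = poppler_index_iter_new(document);
    if (iter == NULL) {          /* the PDF has no index/outline */
        printf("no index\n");
        break;
    }
    title = getTitle(iter, i, NULL, document);
    poppler_index_iter_free(iter);

    if (title) {
        printf("title of %d: %s\n", i, title);
        g_free(title);
    } else {
        printf("%d: no title\n", i);
    }
}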
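
The same backtracking search without the poppler iterator bookkeeping, as a minimal sketch on a hypothetical in-memory outline (OutlineNode and find_title are illustrative names, not poppler API). It assumes siblings are stored in page order and that pages are 1-based like PopplerDest.page_num: on each level it keeps the last entry starting on or before the page, then tries that entry's children and falls back to the entry itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one PDF index (outline) entry. */
typedef struct OutlineNode {
    const char *title;            /* entry title */
    int page;                     /* first page of the section, 1-based */
    struct OutlineNode *child;    /* first child entry, or NULL */
    struct OutlineNode *next;     /* next sibling entry, or NULL */
} OutlineNode;

/* Title of the deepest entry whose section starts on or before page,
 * or NULL if no entry does. */
const char *find_title(const OutlineNode *level, int page)
{
    const OutlineNode *best = NULL;
    const OutlineNode *node;
    const char *deeper;

    assert(page >= 1);            /* pages are 1-based */
    for (node = level; node != NULL; node = node->next) {
        if (node->page > page)
            break;                /* overshot: step back to best */
        best = node;
    }
    if (best == NULL)
        return NULL;

    /* a match among best's children is more specific than best itself */
    deeper = find_title(best->child, page);
    return deeper != NULL ? deeper : best->title;
}
```

Because the recursion only descends into the entry just matched on the current level, each call goes one level deeper and the search always terminates.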