2012-08-17 52 views
2

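Unmapped fields are included in the search results returned by ElasticSearch

I want to index PDF attachments using the Tire gem as the ElasticSearch client. In my mapping, I exclude the attachment field from _source, so that the attachment is not stored in the index and is not returned in search results: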

mapping :_source => { :excludes => ['attachment_original'] } do 
    indexes :id, :type => 'integer' 
    indexes :folder_id, :type => 'integer' 
    indexes :attachment_file_name 
    indexes :attachment_updated_at, :type => 'date' 
    indexes :attachment_original, :type => 'attachment' 
end 

However, I can still see the attachment content in the search results when I run the following curl command:

curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
    "query": {
        "query_string": {
            "query": "rspec"
        }
    }
}'

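I have already posted my question in this thread.

But I have just noticed that it is not only the attachment that is included in the search results. All the other fields, including those that are not mapped, are included as well, as you can see here: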

{
    "took": 20,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.025427073,
        "hits": [
            {
                "_index": "user_files",
                "_type": "user_file",
                "_id": "5",
                "_score": 0.025427073,
                "_source": {
                    "user_file": {
                        "id": 5,
                        "folder_id": 1,
                        "updated_at": "2012-08-16T11:32:41Z",
                        "attachment_file_size": 179895,
                        "attachment_updated_at": "2012-08-16T11:32:41Z",
                        "attachment_file_name": "hw4.pdf",
                        "attachment_content_type": "application/pdf",
                        "created_at": "2012-08-16T11:32:41Z",
                        "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
                    }
                }
            }
        ]
    }
}

attachment_file_size and attachment_content_type are not defined in the mapping, but they are returned in the search results:

{
    "id": 5,
    "folder_id": 1,
    "updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_size": 179895, <---------------------
    "attachment_updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_name": "hw4.pdf",
    "attachment_content_type": "application/pdf", <------------------
    "created_at": "2012-08-16T11:32:41Z",
    "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
}

Here is my full implementation:

include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    #filter :term, folder_id: folder.id
    #highlight :attachment_original, :options => {:tag => "<em>"}
    raise to_curl
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

def to_indexed_json
  to_json(:methods => [:attachment_original])
end

def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    Base64.encode64(open(path_to_original) { |f| f.read })
  end
end

Can someone help me figure out why all the fields are included in _source?

Edit: this is the output of running localhost:9200/user_files/_mapping:

{
    "user_files": {
        "user_file": {
            "_source": {
                "excludes": [
                    "attachment_original"
                ]
            },
            "properties": {
                "attachment_content_type": {
                    "type": "string"
                },
                "attachment_file_name": {
                    "type": "string"
                },
                "attachment_file_size": {
                    "type": "long"
                },
                "attachment_original": {
                    "type": "attachment",
                    "path": "full",
                    "fields": {
                        "attachment_original": {
                            "type": "string"
                        },
                        "author": {
                            "type": "string"
                        },
                        "title": {
                            "type": "string"
                        },
                        "name": {
                            "type": "string"
                        },
                        "date": {
                            "type": "date",
                            "format": "dateOptionalTime"
                        },
                        "keywords": {
                            "type": "string"
                        },
                        "content_type": {
                            "type": "string"
                        }
                    }
                },
                "attachment_updated_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "created_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "folder_id": {
                    "type": "integer"
                },
                "id": {
                    "type": "integer"
                },
                "updated_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                }
            }
        }
    }
}

As you can see, for some reason all the fields are included in the mapping!

+0

In this thread, http://stackoverflow.com/questions/11251851/how-do-you-index-attachment-in-elasticsearch-with-tire?rq=1, it looks like undefined fields are also included in the mapping. – 2012-08-17 08:50:09

Answers

1

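In your to_indexed_json, you include the attachment_original method, so it is sent to elasticsearch. Because to_json serializes every attribute of the model and elasticsearch maps unknown fields dynamically, this is also the reason why all your other properties are included in the mapping, and thus in the source.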
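To illustrate the point, here is a minimal sketch of a to_indexed_json that serializes only the fields declared in the mapping, plus the attachment. It is not Tire's API: the UserFile class below is a hypothetical plain-Ruby stand-in for the ActiveRecord model, and INDEXED_FIELDS and the raw_content key are names invented for this example. It also uses Base64.strict_encode64 (no line breaks) where the question uses Base64.encode64.

```ruby
require 'json'
require 'base64'

# Hypothetical stand-in for the model in the question (no Rails, no Tire).
class UserFile
  # Only the plain fields declared in the mapping block.
  INDEXED_FIELDS = %w[id folder_id attachment_file_name attachment_updated_at].freeze

  def initialize(attributes)
    @attributes = attributes
  end

  # Base64-encoded file content for the attachment field.
  # 'raw_content' is an assumption standing in for reading attachment.path.
  def attachment_original
    Base64.strict_encode64(@attributes.fetch('raw_content'))
  end

  # Build the document from the whitelisted fields instead of calling
  # to_json on the whole record, so unmapped attributes are never sent.
  def to_indexed_json
    doc = @attributes.select { |key, _value| INDEXED_FIELDS.include?(key) }
    doc['attachment_original'] = attachment_original
    doc.to_json
  end
end

file = UserFile.new(
  'id' => 5,
  'folder_id' => 1,
  'attachment_file_name' => 'hw4.pdf',
  'attachment_updated_at' => '2012-08-16T11:32:41Z',
  'attachment_file_size' => 179895,
  'attachment_content_type' => 'application/pdf',
  'raw_content' => '%PDF-1.4'
)
puts file.to_indexed_json
```

With a document built this way, attachment_file_size and attachment_content_type never reach elasticsearch, so they should not show up in the mapping or in _source. Whether attachment_original is then stripped from _source still depends on the excludes setting being applied to the index.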

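For more information on the topic, see the question "ElasticSearch & Tire: Using Mapping and to_indexed_json".

It seems that Tire does send the proper mapping JSON to elasticsearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is happening, and try to pinpoint the problem at the raw level.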
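One way to check this at the raw level is to compare the fields declared in the mapping block with the properties that elasticsearch reports from _mapping. The sketch below assumes the response shape shown in the question (index name, then type name, then "properties"); dynamically_mapped_fields is a hypothetical helper, not part of Tire or elasticsearch, and the sample JSON is a trimmed copy of the question's output.

```ruby
require 'json'

# Fields declared in the mapping block of the question.
DECLARED_FIELDS = %w[
  id folder_id attachment_file_name attachment_updated_at attachment_original
].freeze

# Hypothetical helper: returns the property names that appear in the
# _mapping response but were not declared, i.e. fields that were most
# likely added by dynamic mapping when full documents were indexed.
def dynamically_mapped_fields(mapping_json, index, type, declared = DECLARED_FIELDS)
  properties = JSON.parse(mapping_json).fetch(index).fetch(type).fetch('properties')
  properties.keys - declared
end

# Trimmed copy of the _mapping output from the question.
sample_mapping = <<~MAPPING
  {
    "user_files": {
      "user_file": {
        "properties": {
          "attachment_content_type": { "type": "string" },
          "attachment_file_name": { "type": "string" },
          "attachment_file_size": { "type": "long" },
          "folder_id": { "type": "integer" },
          "id": { "type": "integer" }
        }
      }
    }
  }
MAPPING

puts dynamically_mapped_fields(sample_mapping, 'user_files', 'user_file').inspect
```

Any name this returns was sent in the indexed document without being declared, which points back at what to_indexed_json serializes.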

+0

I misunderstood how to_indexed_json works. Thanks again for the link, it helped a lot. – 2012-08-18 03:00:40
