2015-07-12 41 views

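Uploading data from a Node.js stream to an ElasticSearch database

My current Node.js code creates a stream from a very large USPTO patent XML file (about 100 MB) and builds a patentGrant object for each patent as the XML stream is parsed. The patentGrant object contains the publication number, publication country, publication date, and patent kind. I am trying to use ElasticSearch to build a database containing all of the patentGrant objects. I have successfully added code that connects to a local ElasticSearch database, but I am having trouble understanding the ElasticSearch-js API, and I don't know how I should upload the patentGrant objects to the database. From the following tutorial and a previous Stack Overflow question I asked here, it seems I should use the bulk API.

Here is my ParseXml.js code: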

var CreateParsableXml = require('./CreateParsableXml.js'); 
var XmlParserStream = require('xml-stream'); 
// var Upload2ES = require('./Upload2ES.js'); 
var parseXml; 


var es = require('elasticsearch'); 
var client = new es.Client({ 
    host: 'localhost:9200' 
}); 


// create xml parser using xml-stream node.js module 
parseXml = new XmlParserStream(CreateParsableXml.concatXmlStream('ipg140107.xml')); 

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) { 
    var patentGrant; 
    patentGrant = { 
     pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'], 
     pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'], 
     kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'], 
     pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date'] 
    }; 
    console.log(patentGrant); 
}); 

parseXml.on('end', function() { 
    console.log('all done'); 
}); 
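
As an aside, the handler above repeats the same long property path four times, and a missing element anywhere along that path would throw a TypeError. A minimal sketch of a helper that walks the path once and tolerates missing elements might look like this. `toPatentGrant` is a hypothetical name, not part of xml-stream, and the sketch assumes (as the code above does) that each leaf element arrives as a plain string.

```javascript
// Hypothetical helper: build a patentGrant object from one parsed
// <us-patent-grant> element. Missing elements yield undefined fields
// instead of throwing.
function toPatentGrant(patentGrantElement) {
  var biblio = (patentGrantElement || {})['us-bibliographic-data-grant'] || {};
  var pubRef = biblio['publication-reference'] || {};
  var docId = pubRef['document-id'] || {};

  return {
    pubNo: docId['doc-number'],
    pubCountry: docId['country'],
    kind: docId['kind'],
    pubDate: docId['date']
  };
}
```

Inside the 'endElement: us-patent-grant' handler this would be called as `var patentGrant = toPatentGrant(patentGrantElement);`.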

Answers

As the documentation you linked says, the bulk API is meant for "index" and "delete" operations.

Use create instead: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-create

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    var patentGrant;
    patentGrant = {
        pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'],
        pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'],
        kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'],
        pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date']
    };
    // No id is passed, so Elasticsearch is expected to generate one.
    client.create({
        index: 'myindex',
        type: 'mytype',
        body: patentGrant
    }, function(err) {
        // Report failures instead of silently discarding them.
        if (err) {
            console.error(err);
        }
    });
    console.log(patentGrant);
});

Without an ID, it should generate one automatically, as described at https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-index_.html#_automatic_id_generation

This is great, thank you. A follow-up question: when I go to localhost:9200/mytype/myindex/ I get the following error message: {"error":"ElasticsearchIllegalArgumentException[No feature for name [patentGrants]]","status":400}

Have the index and the mapping been created? https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-create-index.html#mappings – jperelli
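
For reference, an index with an explicit mapping could be created roughly as follows. This is only a sketch for the Elasticsearch 1.x documentation linked above: the field types, the `not_analyzed` setting, and the `basic_date` (yyyyMMdd) date format are assumptions about the data, and `buildCreateIndexParams` and `createIndex` are hypothetical helper names. The request itself goes through the client's `indices.create` call.

```javascript
// Hypothetical helper: build the parameters for client.indices.create.
// Field types and the date format are assumptions, not taken from the question.
function buildCreateIndexParams(index, type) {
  var mappings = {};
  mappings[type] = {
    properties: {
      pubNo: { type: 'string', index: 'not_analyzed' },
      pubCountry: { type: 'string', index: 'not_analyzed' },
      kind: { type: 'string', index: 'not_analyzed' },
      // Assumes dates look like 20140107 (yyyyMMdd).
      pubDate: { type: 'date', format: 'basic_date' }
    }
  };
  return { index: index, body: { mappings: mappings } };
}

// Sends the create-index request; `client` is an elasticsearch-js client.
function createIndex(client, index, type, callback) {
  client.indices.create(buildCreateIndexParams(index, type), callback);
}
```

It would be called once before parsing starts, for example `createIndex(client, 'myindex', 'mytype', function(err) { ... })`.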

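No, I didn't create a mapping; isn't there a default mapping that takes care of this for me? Also, I have been doing more research, and in this video, https://www.youtube.com/watch?v=7FLXjgB0PQI, I heard that the bulk API can save a lot of network overhead. Would it still be better for me to use create, since otherwise I would have to store all the data in one JavaScript object and then go through the bulk process, which would have a high memory cost?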
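
On the memory concern: the bulk API does not require holding the whole file in memory, because documents can be sent in fixed-size batches as they are parsed. The following is a minimal sketch of that idea, not a tested solution for this data set. `createBulkBuffer` and `batchSize` are hypothetical names, and the sketch does not pause the XML stream while a request is in flight.

```javascript
// Hypothetical helper: collect documents and send them with client.bulk
// whenever `batchSize` documents have accumulated.
function createBulkBuffer(client, index, type, batchSize) {
  var body = [];

  function flush() {
    if (body.length === 0) {
      return;
    }
    var payload = body;
    body = []; // start a fresh buffer so memory use stays bounded
    client.bulk({ body: payload }, function(err, resp) {
      if (err) {
        console.error('bulk request failed:', err);
      } else if (resp && resp.errors) {
        console.error('some items in the bulk request failed');
      }
    });
  }

  function add(doc) {
    // Each document is preceded by an action line; no _id is given,
    // so Elasticsearch is expected to generate one.
    body.push({ index: { _index: index, _type: type } });
    body.push(doc);
    if (body.length >= batchSize * 2) {
      flush();
    }
  }

  return { add: add, flush: flush };
}
```

In the question's code this would mean calling `buffer.add(patentGrant)` in the 'endElement: us-patent-grant' handler and `buffer.flush()` in the 'end' handler, so the final partial batch is also sent.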