2017-07-17 119 views
1

Extracting text from an array of fields

One of the fields, named "resources", contains the following 2 inner documents.

{ 
    "type": "AWS::S3::Object", 
    "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema" 
}, 
{ 
    "accountId": "934331768510612", 
    "type": "AWS::S3::Bucket", 
    "ARN": "arn:aws:s3:::sms_vild" 
} 

I need to split the ARN field and get its last part, i.e. "reports_201706.schema", preferably using a scripted field.
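To make the goal concrete, here is a minimal Python sketch of the string operation being asked for, run outside Elasticsearch. The helper name `arn_last_segment` is hypothetical and is not part of any Elasticsearch or Kibana API; it only illustrates "take everything after the last slash".

```python
def arn_last_segment(arn: str) -> str:
    """Return the text after the final '/' in an ARN.

    Hypothetical helper for illustration only. If the ARN contains
    no '/', the whole ARN is returned unchanged.
    """
    return arn.rsplit("/", 1)[-1]


# The two inner documents from the question.
resources = [
    {
        "type": "AWS::S3::Object",
        "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema",
    },
    {
        "accountId": "934331768510612",
        "type": "AWS::S3::Bucket",
        "ARN": "arn:aws:s3:::sms_vild",
    },
]

# Only the S3 object entry carries a file name.
object_arns = [r["ARN"] for r in resources if r.get("type") == "AWS::S3::Object"]
print(arn_last_segment(object_arns[0]))  # prints: reports_201706.schema
```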


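What I have tried:

1) I checked the fields list and found only 2 entries: resources.accountId and resources.type

2) I tried a datetime field, and it works correctly in the scripted field option (expression).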

doc['eventTime'].value 

3) But for other text fields, for example

doc['eventType'].value 

I get this error:

"caused_by":{"type":"script_exception","reason":"link error","script_stack":["doc['eventType'].value","^---- HERE"],"script":"doc['eventType'].value","lang":"expression","caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [eventType] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."}}},"status":500} 

This means I need to change the mapping. Is there any other way to extract text from a nested array inside an object?


Update:

Please visit this sample Kibana instance ...

https://search-accountact-phhofxr23bjev4uscghwda4y7m.us-east-1.es.amazonaws.com/_plugin/kibana/

Search for "ebs_attach.png" and then check the resources field. You will see 2 nested entries like this ...

{ 
    "type": "AWS::S3::Object", 
    "ARN": "arn:aws:s3:::datameetgeo/ebs_attach.png" 
}, 
{ 
    "accountId": "513469704633", 
    "type": "AWS::S3::Bucket", 
    "ARN": "arn:aws:s3:::datameetgeo" 
} 

I need to split the ARN field and extract the last part, which in this case is "ebs_attach.png".

If I can somehow display it as a scripted field, then I can view the bucket name and the file name side by side on the Discover tab.
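As a rough illustration of the "bucket name and file name side by side" idea, the following Python sketch splits an S3 ARN into both parts. It assumes the usual `arn:partition:service:region:account-id:resource` layout, in which S3 leaves region and account empty; `S3Location` and `split_s3_arn` are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    """Hypothetical container for the two values wanted on the Discover tab."""
    bucket: str
    filename: str


def split_s3_arn(arn: str) -> S3Location:
    """Split an S3 ARN into bucket name and file name (last key segment).

    Assumes the layout arn:partition:s3:::bucket[/key]. A bucket-only
    ARN yields an empty file name.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # The resource part is "bucket" or "bucket/some/key".
    bucket, _, key = parts[5].partition("/")
    return S3Location(bucket=bucket, filename=key.rsplit("/", 1)[-1])


location = split_s3_arn("arn:aws:s3:::datameetgeo/ebs_attach.png")
print(location.bucket, location.filename)  # prints: datameetgeo ebs_attach.png
```

Raising on a malformed ARN is a design choice for the sketch; a scripted field would more likely fall back to a default value instead.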


Update 2

In other words, I am trying to extract the text shown in this image as a new field on the Discover tab.

[Image: enter image description here]

Answers

2

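Although you can do this with scripting, I strongly recommend extracting this information at index time. I have provided two examples here. They are far from failsafe (you need to test with different paths, and with the field missing entirely), but they should give you a base to start from.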

PUT foo/bar/1
{
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
        },
        {
            "accountId": "934331768510612",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::sms_vild"
        }
    ]
}

# this is slow!!!
GET foo/_search
{
    "script_fields": {
        "document": {
            "script": {
                "inline": "return params._source.resources.stream().filter(r -> 'AWS::S3::Object'.equals(r.type)).map(r -> r.ARN.substring(r.ARN.lastIndexOf('/') + 1)).findFirst().orElse('NONE')"
            }
        }
    }
}

# Do this at index time, by adding a pipeline
PUT _ingest/pipeline/my-pipeline-id
{
    "description": "describe pipeline",
    "processors": [
        {
            "script": {
                "inline": "ctx.filename = ctx.resources.stream().filter(r -> 'AWS::S3::Object'.equals(r.type)).map(r -> r.ARN.substring(r.ARN.lastIndexOf('/') + 1)).findFirst().orElse('NONE')"
            }
        }
    ]
}

# Store the document, specify the pipeline
PUT foo/bar/1?pipeline=my-pipeline-id
{
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
        },
        {
            "accountId": "934331768510612",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::sms_vild"
        }
    ]
}

# let's check the filename field of the indexed document by getting it
GET foo/bar/1

# We can even search for this file now
GET foo/_search
{
    "query": {
        "match": {
            "filename": "reports_201706.schema"
        }
    }
}
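Since the examples above are described as not failsafe, the following Python sketch shows what a more defensive version of the same extraction logic might look like. It is only an illustration of the cases worth testing (no "resources" field, a single object instead of an array, entries without "ARN" or "type"), not a Painless script; `extract_filename` is a hypothetical name, and the 'NONE' default simply mirrors the `orElse('NONE')` used above.

```python
def extract_filename(doc: dict, default: str = "NONE") -> str:
    """Return the file name of the first AWS::S3::Object resource in doc.

    Hypothetical helper mirroring the stream/filter/map/findFirst chain,
    with extra guards for missing or oddly shaped input.
    """
    resources = doc.get("resources") or []
    # Tolerate a single inner document that was not wrapped in a list.
    if isinstance(resources, dict):
        resources = [resources]
    for resource in resources:
        if not isinstance(resource, dict):
            continue
        arn = resource.get("ARN")
        if resource.get("type") == "AWS::S3::Object" and isinstance(arn, str):
            # Same as substring(lastIndexOf('/') + 1) in the scripts above.
            return arn.rsplit("/", 1)[-1]
    return default


doc = {
    "resources": [
        {"type": "AWS::S3::Object",
         "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"},
        {"accountId": "934331768510612", "type": "AWS::S3::Bucket",
         "ARN": "arn:aws:s3:::sms_vild"},
    ]
}
print(extract_filename(doc))  # prints: reports_201706.schema
print(extract_filename({}))   # prints: NONE
```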
0

Note: this assumes that "resources" is an array.

NSArray *array_ARN_Values = [resources valueForKey:@"ARN"]; 

Hope this works for you!

+0

This does not work. Please see the updated question. – shantanuo

+0

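How do I know whether resources is an array? I do not see "resources" in the fields list. However, the type, ARN and accountId parameters from resources are indexed. – shantanuo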