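Pig Latin (filtering a second data source inside a FOREACH loop)
I have 2 data sources. One contains a list of API calls and the other contains all related authentication events. Each API call can have multiple auth events, and I want to find the auth event that:
a) has the same "identifier" as the API call
b) occurs within one second after the API call
c) is the closest to the API call once the filters above have been applied.
My plan was to loop over each ApiCall event in a FOREACH and reuse a FILTER statement on authEvents to find the right one. However, that does not appear to be possible (see "USING Filter in a Nested FOREACH in PIG").
Would anyone be able to suggest another way to achieve this? If it helps, here is the Pig script I tried to use: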
apiRequests = LOAD '/Documents/ApiRequests.txt' AS (api_fileName:chararray, api_requestTime:long, api_timeFromLog:chararray, api_call:chararray, api_leadString:chararray, api_xmlPayload:chararray, api_sourceIp:chararray, api_username:chararray, api_identifier:chararray);
authEvents = LOAD '/Documents/AuthEvents.txt' AS (auth_fileName:chararray, auth_requestTime:long, auth_timeFromLog:chararray, auth_call:chararray, auth_leadString:chararray, auth_xmlPayload:chararray, auth_sourceIp:chararray, auth_username:chararray, auth_identifier:chararray);
specificApiCall = FILTER apiRequests BY api_call == 'CSGetUser'; -- Get all events for this specific call
match = foreach specificApiCall { -- Now try to get the closest matching auth event
    filtered1 = filter authEvents by auth_identifier == api_identifier; -- Only use auth events that have the same identifier (this will return several)
    filtered2 = filter filtered1 by (auth_requestTime-api_requestTime)<1000; -- Further refine by using auth events within a second of the api call's time
    sorted = order filtered2 by auth_requestTime; -- Get the auth event that's closest to the api call
    limited = limit sorted 1;
    generate limited;
};
dump match;
Thanks 小熊, I used COGROUP and it worked a treat. You're the best! – Hinchy
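For readers who want to see what the COGROUP approach mentioned in the comment might look like, here is a minimal sketch. It is not the script Hinchy actually ran. It reuses the relations `specificApiCall` and `authEvents` from the question, and it makes several assumptions: that the request times are in milliseconds, that "within one second after" means a delta of 0 to 999, and that `(api_identifier, api_requestTime)` is enough to tell one API call from another. The aliases `grouped`, `pairs`, `withDelta`, `inWindow`, `perCall` and `closest` are names chosen for illustration only.

```pig
-- Continues from the question's script: specificApiCall and authEvents are already defined.

-- Collect, per identifier, the API calls and the auth events that share it.
grouped = COGROUP specificApiCall BY api_identifier, authEvents BY auth_identifier;

-- Flattening both bags pairs every API call with every auth event for that identifier.
-- Identifiers that appear on only one side produce no rows.
pairs = FOREACH grouped GENERATE FLATTEN(specificApiCall), FLATTEN(authEvents);

-- Keep a few fields (add more as needed) and compute the time difference.
-- Assumption: both timestamps are in milliseconds.
withDelta = FOREACH pairs GENERATE
    api_identifier, api_requestTime, auth_requestTime, auth_call,
    (auth_requestTime - api_requestTime) AS delta;

-- Assumption: only auth events at or after the API call, and less than one second later.
inWindow = FILTER withDelta BY delta >= 0L AND delta < 1000L;

-- Assumption: (api_identifier, api_requestTime) identifies a single API call.
perCall = GROUP inWindow BY (api_identifier, api_requestTime);

-- ORDER and LIMIT are allowed inside a nested FOREACH because they act on the group's own bag.
closest = FOREACH perCall {
    sorted = ORDER inWindow BY delta;
    nearest = LIMIT sorted 1;
    GENERATE FLATTEN(nearest);
};

DUMP closest;
```

The key difference from the attempt in the question is that the nested FOREACH only touches the bag that belongs to the current group, rather than a second top-level relation. The combination with `authEvents` happens up front in the COGROUP step.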