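I'm writing a script that parses the request parameters from an Apache log file into a pandas table. How can I join several rows (a MultiIndex?) into one row in pandas, or, alternatively, how can I extract the parameters from the Apache log strings?
Example requests look like this: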
GET /v1/board?id=8504178&limit=1&to=8504177 HTTP/1.1
GET /v1/connections?from=850417&to=8504177 HTTP/1.1
GET /v1/location?query=850417
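There are many parameters and they appear in no fixed order, so I don't think the pandas method extract() will work. That's why I tried extractall() instead. My first regex and extraction attempt look like this: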
# raw strings avoid Python's invalid-escape warnings for \d, \w and \s
req_patt = (r"(?P<request>GET) /v1/(?P<resource>connections|stationboard|locations)|"
            r"from=(?P<from>.*?)&|"
            r"to=(?P<to>\d*|\w*)(?P<to_del>&|\s)")
df_temp = df['fullrequest'].str.extractall(req_patt)
This is the DataFrame I get back:
Actual output:
index   requests  resources     from     to
(0, 0)  GET       connections   nan      nan
(0, 1)  nan       nan           8504178  nan
(0, 2)  nan       nan           nan      8504177
(1, 0)  GET       stationboard  nan      nan
(1, 1)  nan       nan           nan      8504177
But in the end I'd like to have something like this:
Expected output:
index  requests  resources     from     to
0      GET       connections   8504178  8504177
1      GET       stationboard  nan      8504177
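If the goal is simply one row per request, a different technique sidesteps extractall() entirely: split each request line and parse its target with Python's standard urllib.parse module, which returns the query parameters keyed by name, so their order does not matter. This is a minimal sketch, assuming each fullrequest value has the form METHOD TARGET [PROTOCOL] (not a full Apache log line) and that the resource is the last path segment; the helper name parse_request is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

import pandas as pd

# The three example request lines from the question.
lines = [
    "GET /v1/board?id=8504178&limit=1&to=8504177 HTTP/1.1",
    "GET /v1/connections?from=850417&to=8504177 HTTP/1.1",
    "GET /v1/location?query=850417",
]


def parse_request(line):
    """Hypothetical helper: split 'METHOD TARGET [PROTOCOL]' into a flat dict."""
    parts = line.split()
    method, target = parts[0], parts[1]  # a trailing protocol, if present, is ignored
    url = urlsplit(target)
    # Assumes the /v1/<resource> layout: keep only the last path segment.
    resource = url.path.rsplit("/", 1)[-1]
    # parse_qsl yields (name, value) pairs; a dict keys them by name, so order is irrelevant.
    params = dict(parse_qsl(url.query))
    return {"request": method, "resource": resource, **params}


df = pd.DataFrame([parse_request(line) for line in lines])
print(df)
```

Every parameter name seen in any request becomes its own column (id, limit, to, from, query here), and requests that lack a parameter get NaN in that column.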
So my final question is: how do I join these individual rows into one row?
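One common way to do this: str.extractall() returns a MultiIndex whose first level is the original row label and whose second level (named match) numbers the matches within that row. Grouping on level 0 and calling first() keeps the first non-null value of each column per original row, which collapses the matches back into one row. The sketch below assumes a small hypothetical DataFrame built from the example requests, using resource names that the question's pattern recognises.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's fullrequest column.
df = pd.DataFrame({"fullrequest": [
    "GET /v1/connections?from=8504178&to=8504177 HTTP/1.1",
    "GET /v1/stationboard?id=8504178&limit=1&to=8504177 HTTP/1.1",
]})

# The question's pattern, written as raw strings.
req_patt = (r"(?P<request>GET) /v1/(?P<resource>connections|stationboard|locations)|"
            r"from=(?P<from>.*?)&|"
            r"to=(?P<to>\d*|\w*)(?P<to_del>&|\s)")

# One row per regex match, indexed by (original row, match number).
df_temp = df["fullrequest"].str.extractall(req_patt)

# Collapse the "match" level: for each original row, take the first
# non-null value found in every column.
result = df_temp.groupby(level=0).first()
print(result[["request", "resource", "from", "to"]])
```

With this sample, row 0 should end up with both from and to filled in, while row 1 has no from match and keeps NaN there, mirroring the expected output. If a parameter could appear more than once in a request, first() silently keeps only its first occurrence.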
'(?P<request>GET(?=\s))|\/v1\/(?P<resource>[^?\s]*)|from=(?P<from>[^&\s]*)|to=(?P<to>[^&\s]*)'? –
ctwheels
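The pattern from the comment can be combined with the same groupby(level=0).first() collapse. Below is a sketch applied to the three example requests; it assumes the group names request, resource, from and to, matching the question's own pattern.

```python
import pandas as pd

# The three example requests from the question.
df = pd.DataFrame({"fullrequest": [
    "GET /v1/board?id=8504178&limit=1&to=8504177 HTTP/1.1",
    "GET /v1/connections?from=850417&to=8504177 HTTP/1.1",
    "GET /v1/location?query=850417",
]})

# Pattern from the comment: the resource is whatever follows /v1/ up to '?'
# or whitespace; parameter values run up to the next '&' or whitespace.
patt = (r"(?P<request>GET(?=\s))|/v1/(?P<resource>[^?\s]*)|"
        r"from=(?P<from>[^&\s]*)|to=(?P<to>[^&\s]*)")

# extractall gives one row per match; first() per original row merges them.
result = (df["fullrequest"]
          .str.extractall(patt)
          .groupby(level=0)
          .first())
print(result)
```

Because the resource group matches any path segment, it picks up board and location as well, without a hard-coded list of resource names.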