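Find outliers in data
I am trying to find outliers in the Seconds column using the standard deviation. I have two data frames, shown below. The outliers I am looking for are values more than 1.5 standard deviations above or below the weekly (per day-of-week) mean. My current code is below the data frames.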
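The criterion itself can be sketched on a single pandas Series. This is a minimal sketch that assumes "1.5 standard deviations from the mean" means on either side, and it uses the population standard deviation (ddof=0) to match the question's np.std call; `outlier_mask` is a hypothetical helper name and the sample values are made up.

```python
import pandas as pd


def outlier_mask(s: pd.Series, k: float = 1.5, ddof: int = 0) -> pd.Series:
    """True where a value lies more than k standard deviations from the mean, on either side."""
    mean = s.mean()
    std = s.std(ddof=ddof)  # ddof=0: population std, like np.std
    return (s - mean).abs() > k * std


# Made-up example: only 30.0 is far enough from the mean (14.0) to be flagged.
s = pd.Series([10.0, 11.0, 9.0, 10.0, 30.0])
print(s[outlier_mask(s)])
```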
DF1:
name dateTime Seconds
joe 2015-02-04 12:12:12 54321.0202
john 2015-01-02 13:13:13 12345.0101
joe 2015-02-04 12:12:12 54321.0202
john 2015-01-02 13:13:13 12345.0101
joe 2015-02-04 12:12:12 54321.0202
john 2015-01-02 13:13:13 12345.0101
joe 2015-02-04 12:12:12 54321.0202
john 2015-01-02 13:13:13 12345.0101
joe 2015-02-04 12:12:12 54321.0202
john 2015-01-02 13:13:13 12345.0101
joe 2015-02-04 12:12:12 54321.0202
joe 2015-01-02 13:13:13 12345.0101
Current output: DF2
name day standardDev mean count
Joe mon 22326.502700 40900.730647 1886
tue 9687.486726 51166.213836 159
john mon 10072.707891 41380.035108 883
tue 5499.475345 26985.938776 196
Expected output:
DF2
name day standardDev mean count events
Joe mon 22326.502700 40900.730647 1886 [2015-02-04 12:12:12, 2015-02-04 12:12:13]
tue 9687.486726 51166.213836 159 [2015-02-04 12:12:12, 2015-02-04 12:12:14]
john mon 10072.707891 41380.035108 883 [2015-01-02 13:13:13, 2015-01-02 13:13:15]
tue 5499.475345 26985.938776 196 [2015-01-02 13:13:13, 2015-01-02 13:13:18]
CODE:
import glob

import numpy as np
import pandas as pd

allFiles = glob.glob(folderPath + "/*.csv")  # folderPath is defined elsewhere
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, names=['EventTime', "IpAddress", "Hostname", "TargetUserName", "AuthenticationPackageName", "TargetDomainName", "EventReceivedTime"])
    df = df.iloc[1:]  # drop the header row read as data (.ix was removed in pandas 1.0)
    list_.append(df)
df = pd.concat(list_)
df['DateTime'] = pd.to_datetime(df['EventTime'])
df['day_of_week'] = df.DateTime.dt.strftime('%a')
# Time of day as seconds since midnight
df['seconds'] = pd.to_timedelta(df.DateTime.dt.time.astype(str)).dt.seconds
# Named aggregation; the nested-dict renaming form raises an error since pandas 1.0
print(df.groupby(['TargetUserName', 'day_of_week'])['seconds'].agg(
    mean='mean', std=lambda x: np.std(x), count='count'))
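One way to get from the current output to the expected one is to compute each row's group mean and standard deviation with `transform`, flag the rows that fall outside mean ± 1.5·std, and collect the flagged timestamps into a list per group. This is a sketch under assumptions: it uses DF1's column names (name, dateTime, Seconds), which in the code above correspond to TargetUserName, DateTime and seconds grouped with day_of_week; the sample values are made up, because DF1 as posted repeats identical values per person, which gives a zero standard deviation and no outliers; `K` and the helper column `is_outlier` are hypothetical names.

```python
import pandas as pd

K = 1.5  # number of standard deviations that counts as an outlier

# Made-up sample data shaped like DF1 (name, dateTime, Seconds).
df1 = pd.DataFrame({
    "name": ["joe"] * 5 + ["john"] * 5,
    "dateTime": pd.to_datetime([
        "2015-01-05 12:00:00", "2015-01-12 12:00:00", "2015-01-19 12:00:00",
        "2015-01-26 12:00:00", "2015-02-02 12:00:00",   # Mondays
        "2015-01-06 13:00:00", "2015-01-13 13:00:00", "2015-01-20 13:00:00",
        "2015-01-27 13:00:00", "2015-02-03 13:00:00",   # Tuesdays
    ]),
    "Seconds": [100.0, 110.0, 90.0, 100.0, 300.0,
                500.0, 510.0, 490.0, 500.0, 200.0],
})
df1["day"] = df1["dateTime"].dt.strftime("%a")

grp = df1.groupby(["name", "day"])["Seconds"]
# Per-row group statistics, aligned with df1's index.
mean = grp.transform("mean")
std = grp.transform(lambda x: x.std(ddof=0))  # population std, like np.std in the question
df1["is_outlier"] = (df1["Seconds"] - mean).abs() > K * std

# One row per (name, day), with the same statistics as DF2.
df2 = grp.agg(standardDev=lambda x: x.std(ddof=0), mean="mean", count="count")
# Timestamps of the outlying rows, as a list per group.
events = df1[df1["is_outlier"]].groupby(["name", "day"])["dateTime"].agg(list)
df2["events"] = events.reindex(df2.index)
# Groups without outliers get an empty list instead of NaN.
df2["events"] = df2["events"].apply(lambda v: v if isinstance(v, list) else [])
print(df2)
```

Using `transform` rather than `apply` keeps the statistics aligned row by row with df1, so the boolean mask can be used directly for filtering.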
Maybe df1[df1.groupby(pd.DatetimeIndex(df1.dateTime).dayofweek)['Seconds'].apply(lambda x: x > (1.5 * x.std() + x.mean()))]? – Abdou
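The expression in that comment keeps only values above mean + 1.5·std and groups by weekday alone. A two-sided variant could look like the sketch below, assuming the grouping should be per person and weekday; it uses `transform` so the group statistics line up with DF1's rows. It keeps pandas' default sample standard deviation (ddof=1), as `x.std()` in the comment does. `outlier_rows` is a hypothetical name and the sample values are made up.

```python
import pandas as pd


def outlier_rows(df1: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Rows whose Seconds lie more than k sample std devs from their (name, weekday) mean."""
    weekday = pd.to_datetime(df1["dateTime"]).dt.dayofweek  # Monday=0 ... Sunday=6
    grp = df1.groupby([df1["name"], weekday])["Seconds"]
    mean = grp.transform("mean")  # one value per row, aligned with df1's index
    std = grp.transform("std")    # sample std (ddof=1), as x.std() in the comment
    return df1[(df1["Seconds"] - mean).abs() > k * std]


# Made-up data: joe on Mondays, john on Tuesdays.
df1 = pd.DataFrame({
    "name": ["joe"] * 5 + ["john"] * 5,
    "dateTime": ["2015-01-05 12:00:00", "2015-01-12 12:00:00", "2015-01-19 12:00:00",
                 "2015-01-26 12:00:00", "2015-02-02 12:00:00",
                 "2015-01-06 13:00:00", "2015-01-13 13:00:00", "2015-01-20 13:00:00",
                 "2015-01-27 13:00:00", "2015-02-03 13:00:00"],
    "Seconds": [100.0, 110.0, 90.0, 100.0, 300.0,
                500.0, 510.0, 490.0, 500.0, 200.0],
})
out = outlier_rows(df1)
print(out)
```

The low value for john (200.0) is caught here, which a one-sided "greater than mean + 1.5·std" test would miss.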
What exactly do you mean by "I'm not sure how to get the expected output"? – Amjad
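I'm trying to figure out how to add an events column that tracks every event more than 1.5 standard deviations above or below the mean. Ideally, I'd like to put every row that falls outside that range, with its full data, into the events column as a list of events. – johnnyb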
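To keep the full data of each outlying row rather than only its timestamp, the flagged rows can be grouped and turned into a list of dicts per group. This sketch assumes a boolean column (hypothetically named `is_outlier`) has already been computed, for example by comparing each Seconds value with its group's mean and standard deviation; `collect_events` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def collect_events(df: pd.DataFrame, keys: list, flag: str = "is_outlier") -> dict:
    """Map each group key to the list of its flagged rows, each row as a dict of its columns."""
    flagged = df[df[flag]].drop(columns=[flag])
    return {key: rows.to_dict("records") for key, rows in flagged.groupby(keys)}


# Made-up rows that have already been flagged.
df = pd.DataFrame({
    "name": ["joe", "joe", "john"],
    "day": ["Mon", "Mon", "Tue"],
    "dateTime": pd.to_datetime(["2015-01-05 12:00:00", "2015-02-02 12:00:00",
                                "2015-02-03 13:00:00"]),
    "Seconds": [100.0, 300.0, 200.0],
    "is_outlier": [False, True, True],
})
df2 = df.groupby(["name", "day"])["Seconds"].agg(mean="mean", count="count")
events = collect_events(df, ["name", "day"])
# Groups with no flagged rows default to an empty list.
df2["events"] = pd.Series([events.get(idx, []) for idx in df2.index], index=df2.index)
print(df2)
```

Keying the dict by the group tuple makes the lookup against DF2's (name, day) index direct, and groups without outliers fall back to an empty list.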