Time series analysis - unevenly spaced measurements - pandas + statsmodels

I have two numpy arrays, light_points and time_points, and would like to apply some time series analysis methods to this data.

I then tried this:

import statsmodels.api as sm 
import pandas as pd 
tdf = pd.DataFrame({'time':time_points[:]}) 
rdf = pd.DataFrame({'light':light_points[:]}) 
rdf.index = pd.DatetimeIndex(freq='w',start=0,periods=len(rdf.light)) 
#rdf.index = pd.DatetimeIndex(tdf['time'])

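This runs, but it does not do the right thing. The measurements are in fact not evenly spaced in time, and if I simply declare the time_points pandas DataFrame as the index of my frame, I get an error: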

rdf.index = pd.DatetimeIndex(tdf['time']) 

decomp = sm.tsa.seasonal_decompose(rdf) 

elif freq is None: 
raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index") 

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

I don't know how to correct this. Also, it seems that pandas' TimeSeries is deprecated.

I tried this:

rdf = pd.Series({'light':light_points[:]}) 
rdf.index = pd.DatetimeIndex(tdf['time'])

But it gives me a length mismatch:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements

However, I don't understand where it comes from, since rdf['light'] and tdf['time'] have the same length...
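The mismatch can be reproduced with stand-in data. The sketch below assumes 122 random measurements and a made-up daily time axis (neither is the real data). It suggests that passing a dict to pd.Series creates one element per key, so the whole array ends up as a single object-valued entry:

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 122 measurements, matching the count in the error message
light_points = np.random.rand(122)
time_points = pd.date_range("2015-01-01", periods=122, freq="D")

# A dict maps index labels to values, so this Series has ONE element:
# the label 'light' pointing at the whole array
wrapped = pd.Series({'light': light_points})
print(len(wrapped))

# Passing the array directly gives one element per measurement,
# so an index of the same length can be attached
flat = pd.Series(light_points, index=pd.DatetimeIndex(time_points))
print(len(flat))
```

With the array passed directly, the Series and the index have the same length and the assignment no longer fails.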

Finally, I tried defining my rdf as a pandas Series:

rdf = pd.Series(light_points[:],index=pd.DatetimeIndex(time_points[:]))

And I get this:

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

So I tried replacing the index with

pd.TimeSeries(time_points[:])

instead, which gives me an error on the line calling seasonal_decompose:

AttributeError: 'Float64Index' object has no attribute 'inferred_freq'

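How do I work with unevenly spaced data? I was thinking of creating an approximately evenly spaced time array by adding many unknown values between the existing ones and using interpolation to "evaluate" those points, but I think there may be a cleaner and easier solution.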

Source

2015-12-28 Robin

You would increase your chances of getting a good answer if you posted a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve)

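seasonal_decompose() requires a freq that is either provided as part of the DatetimeIndex meta information, can be inferred by pandas.Index.inferred_freq, or is given by the user as an integer stating the number of periods per cycle, e.g. 12 for monthly data (from the docstring for seasonal_mean):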

def seasonal_decompose(x, model="additive", filt=None, freq=None): 
    """ 
    Parameters 
    ---------- 
    x : array-like 
     Time series 
    model : str {"additive", "multiplicative"} 
     Type of seasonal component. Abbreviations are accepted. 
    filt : array-like 
     The filter coefficients for filtering out the seasonal component. 
     The default is a symmetric moving average. 
    freq : int, optional 
     Frequency of the series. Must be used if x is not a pandas 
     object with a timeseries index.

To illustrate - using random sample data:

from datetime import datetime 
import numpy as np 
length = 400 
x = np.sin(np.arange(length)) * 10 + np.random.randn(length) 
df = pd.DataFrame(data=x, index=pd.date_range(start=datetime(2015, 1, 1), periods=length, freq='w'), columns=['value']) 

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 400 entries, 2015-01-04 to 2022-08-28 
Freq: W-SUN 

decomp = sm.tsa.seasonal_decompose(df) 
data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1) 
data.columns = ['series', 'trend', 'seasonal', 'resid'] 

Data columns (total 4 columns): 
series  400 non-null float64 
trend  348 non-null float64 
seasonal 400 non-null float64 
resid  348 non-null float64 
dtypes: float64(4) 
memory usage: 15.6 KB

So far, so good - now randomly drop elements from the DatetimeIndex to create unevenly spaced data:

df = df.iloc[np.unique(np.random.randint(low=0, high=length, size=int(length * .8)))] 

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 222 entries, 2015-01-11 to 2022-08-21 
Data columns (total 1 columns): 
value 222 non-null float64 
dtypes: float64(1) 
memory usage: 3.5 KB 

df.index.freq 

None 

df.index.inferred_freq 

None

Running seasonal_decompose on this data "works":

decomp = sm.tsa.seasonal_decompose(df, freq=52) 

data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1) 
data.columns = ['series', 'trend', 'seasonal', 'resid'] 

DatetimeIndex: 224 entries, 2015-01-04 to 2022-08-07 
Data columns (total 4 columns): 
series  224 non-null float64 
trend  172 non-null float64 
seasonal 224 non-null float64 
resid  172 non-null float64 
dtypes: float64(4) 
memory usage: 8.8 KB

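The question is how useful the result is. Even without gaps in the data that complicate the inference of seasonal patterns (see the example use of .interpolate() in the release notes), statsmodels qualifies this procedure as follows: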

Notes 
----- 
This is a naive decomposition. More sophisticated methods should 
be preferred. 

The additive model is Y[t] = T[t] + S[t] + e[t] 

The multiplicative model is Y[t] = T[t] * S[t] * e[t] 

The seasonal component is first removed by applying a convolution 
filter to the data. The average of this smoothed series for each 
period is the returned seasonal component.
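As a rough sketch of the interpolation route mentioned above, the gaps can be filled on a regular grid before decomposing. It reuses the weekly sample data from this answer with a seeded generator. The choices of asfreq and time-weighted linear interpolation are assumptions made for illustration, not a recommendation from statsmodels:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
length = 400

# Regular weekly sample data, as in the example above
full_index = pd.date_range(start="2015-01-01", periods=length, freq="W")
values = np.sin(np.arange(length)) * 10 + rng.standard_normal(length)
df = pd.DataFrame({"value": values}, index=full_index)

# Drop random rows to mimic unevenly spaced measurements
keep = np.unique(rng.integers(low=0, high=length, size=int(length * 0.8)))
irregular = df.iloc[keep]

# Put the data back on a regular weekly grid; missing weeks become NaN
regular = irregular.asfreq("W")

# Fill the gaps by linear interpolation weighted by elapsed time
filled = regular.interpolate(method="time")

print(irregular.index.freq)   # no frequency on the gappy index
print(filled.index.freqstr)   # frequency restored on the regular grid
print(int(regular["value"].isna().sum()), "weeks were interpolated")
```

The filled frame carries a weekly frequency again, so it can be passed to seasonal_decompose without an explicit integer. As far as I know, newer statsmodels releases name that integer argument period rather than freq. The interpolated weeks are estimates, not measurements, so the caveat in the notes above applies even more strongly.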

Source

2015-12-28 16:49:56 Stefan

Why did you use 'freq = 52'? Why 52 and not another number? – Rocketq

It's been a while, but I believe it is because my example uses weekly random data - see above. – Stefan
