2014-02-26

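Filling in missing values of a rolling correlation matrix

This question is partly related to this question.

My data file can be found here. I use a sample period from 1 January 2008 to 31 December 2013. The data file has no missing values.

The following code generates a rolling correlation matrix for each day from 1 January 2008 to 31 December 2013, using a rolling window of the previous year's values. For example, the correlation between AUT and BEL on 1 January 2008 is calculated from the series of values from 1 January 2007 to 1 January 2008, and likewise for all other pairs.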
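
To make the window boundaries concrete, here is a minimal sketch of the date arithmetic the macro below relies on. The example date is chosen purely for illustration; the INTNX call is the same one used in the macro.

```sas
/* Sketch: show the window boundaries for one illustrative date */
data _null_;
    toDate   = '01APR2009'd;
    /* 's' (same) alignment keeps the day and month, stepping back exactly one year */
    fromDate = intnx('year', toDate, -1, 's');
    /* write both boundaries to the log in DATE9. format */
    put fromDate= date9. toDate= date9.;
run;
```

This should write fromDate=01APR2008 toDate=01APR2009 to the log. Because the macro filters with BETWEEN, both boundary dates are included in the window.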

data work.rolling; 
set mm.rolling; 
run; 

%macro rollingCorrelations(inputDataset=, refDate=); 
/*first get a list of unique dates on or after the reference date*/ 
proc freq data = &inputDataset. noprint; 
where date >="&refDate."d; 
table date/out = dates(keep = date); 
run; 


/*for each date calculate what the window range is, here using a year's length*/ 
data dateRanges(drop = date); 
set dates end = endOfFile 
       nobs= numDates; 
format toDate fromDate date9.; 

toDate=date; 
fromDate = intnx('year', toDate, -1, 's'); 

call symputx(compress("toDate"!!_n_), put(toDate,date9.)); 
call symputx(compress("fromDate"!!_n_), put(fromDate, date9.)); 

/*find how many times(numberOfWindows) we need to iterate through*/ 
if endOfFile then do; 
call symputx("numberOfWindows", numDates); 
end; 

run; 
%do i = 1 %to &numberOfWindows.; 
/*create a temporary view which has the filtered data that is passed to PROC CORR*/ 
data windowedDataview/view = windowedDataview; 
set &inputDataset.; 
where date between "&&fromDate&i."d and "&&toDate&i."d; 
drop date; 
run; 
    /*the output dataset from each PROC CORR run will be named
correlations_DDMMMYYYY<from date>_DDMMMYYYY<to date>*/ 
proc corr data = windowedDataview 
outp = correlations_&&fromDate&i.._&&toDate&i. (where=(_type_ = 'CORR')) 

     noprint; 
run; 

%end; 

/*append all datasets into a single table*/ 
data all_correlations; 
format from to date9.; 
set correlations_: 
    indsname = datasetname 
; 
from = input(substr(datasetname,19,9),date9.); 
to = input(substr(datasetname,29,9), date9.); 
run; 


%mend rollingCorrelations; 
%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008) 

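An excerpt of the output can be found here.

As can be seen, rows 2 to 53 show the correlation matrix for 1 April 2008. However, there is a problem with the correlation matrix for 1 April 2009: the correlation coefficients between ALPHA and every series it is paired with are missing. Looking at the data file, this is because the values of ALPHA from 1 April 2008 to 1 April 2009 are all zero, which results in a division by zero. The same thing happens for some other series; for example, HSBC also has all values equal to 0 from 1 April 2008 to 1 April 2009.

To solve this, I would like to know how the code above can be modified so that when this happens (i.e. all values between two specific dates are 0), the correlation between the two series is simply calculated using the whole sample period. For example, the correlation between ALPHA and AUT is missing on 1 April 2009, so that correlation should be calculated using the values from 1 January 2008 to 31 December 2013 rather than the values from 1 April 2008 to 1 April 2009.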
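
The effect can be reproduced on a toy dataset. The sketch below is not part of the original data: the dataset work.demo and the variables x, y and z are made up for illustration. The constant variable z has zero variance within the sample, so PROC CORR is expected to report its correlations as missing.

```sas
/* Sketch: a constant column has zero variance, so its correlations are undefined */
data work.demo;
    do i = 1 to 10;
        x = i;          /* varies */
        y = 2*i + 1;    /* varies, perfectly correlated with x */
        z = 0;          /* constant, like ALPHA within the problem window */
        output;
    end;
    drop i;
run;

proc corr data = work.demo
          outp = work.demo_corr (where=(_type_ = 'CORR'))
          noprint;
run;

/* inspect the result: the z row and z column should be missing (.) */
proc print data = work.demo_corr;
run;
```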

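Do you have ETS licensed? – Joe

@Joe I'm not sure actually, how do I check? – user3184733

@user3184733 To check which products you have licensed, you can run the following procedure, which checks the license file and writes the list of products to the log. Then simply do a 'CTRL + F' search for 'SAS/ETS'. 'PROC SETINIT; RUN;' – 2014-02-27 10:23:31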

Answer

Once you have run the macro above and have the all_correlations dataset, you need to run another PROC CORR, this time using all of the data, i.e.:

/*first filter the data to be between "01JAN2008"d and "31DEC2013"d*/ 
data work.all_data_01JAN2008_31DEC2013; 
set mm.rolling; 
where date between "01JAN2008"d and "31DEC2013"d; 
drop date ; 
run; 

Next, pass the dataset above to PROC CORR:

proc corr data = work.all_data_01JAN2008_31DEC2013 
outp = correlations_01JAN2008_31DEC2013 
(where=(_type_ = 'CORR')) 

     noprint; 
run; 
data correlations_01JAN2008_31DEC2013; 
length id 8; 
set correlations_01JAN2008_31DEC2013; 
/*add a column identifier to make sure the order of the correlation matrix is preserved when joined with other tables*/ 
id = _n_; 
run; 

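You will get a dataset that is unique by the _name_ column. You then have to join correlations_01JAN2008_31DEC2013 to all_correlations, so that wherever a value is missing in all_correlations, the corresponding value from correlations_01JAN2008_31DEC2013 is inserted in its place. For this we can use PROC SQL and the COALESCE function.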

PROC SQL; 
CREATE TABLE MISSING_VALUES_IMPUTED AS 
SELECT 
A.FROM 
,A.TO 
,b.id 
,a._name_ 
,coalesce(a.AUT,b.AUT) as AUT 
,coalesce(a.BEL,b.BEL) as BEL 
,coalesce(a.DEN,b.DEN) as DEN 
,coalesce(a.FRA,b.FRA) as FRA 
,coalesce(a.GER,b.GER) as GER 
,coalesce(a.GRE,b.GRE) as GRE 
,coalesce(a.IRE,b.IRE) as IRE 
,coalesce(a.ITA,b.ITA) as ITA 
,coalesce(a.NOR,b.NOR) as NOR 
,coalesce(a.POR,b.POR) as POR 
,coalesce(a.SPA,b.SPA) as SPA 
,coalesce(a.SWE,b.SWE) as SWE 
,coalesce(a.NL,b.NL) as NL 
,coalesce(a.ERS,b.ERS) as ERS 
,coalesce(a.RZB,b.RZB) as RZB 
,coalesce(a.DEX,b.DEX) as DEX 
,coalesce(a.KBD,b.KBD) as KBD 
,coalesce(a.DAB,b.DAB) as DAB 
,coalesce(a.BNP,b.BNP) as BNP 
,coalesce(a.CRDA,b.CRDA) as CRDA 
,coalesce(a.KN,b.KN) as KN 
,coalesce(a.SGE,b.SGE) as SGE 
,coalesce(a.CBK,b.CBK) as CBK 
,coalesce(a.DBK,b.DBK) as DBK 
,coalesce(a.IKB,b.IKB) as IKB 
,coalesce(a.ALPHA,b.ALPHA) as ALPHA 
,coalesce(a.ALBK,b.ALBK) as ALBK 
,coalesce(a.IPM,b.IPM) as IPM 
,coalesce(a.BKIR,b.BKIR) as BKIR 
,coalesce(a.BMPS,b.BMPS) as BMPS 
,coalesce(a.PMI,b.PMI) as PMI 
,coalesce(a.PLO,b.PLO) as PLO 
,coalesce(a.BINS,b.BINS) as BINS 
,coalesce(a.MB,b.MB) as MB 
,coalesce(a.UC,b.UC) as UC 
,coalesce(a.BCP,b.BCP) as BCP 
,coalesce(a.BES,b.BES) as BES 
,coalesce(a.BBV,b.BBV) as BBV 
,coalesce(a.SCHSPS,b.SCHSPS) as SCHSPS 
,coalesce(a.NDA,b.NDA) as NDA 
,coalesce(a.SEA,b.SEA) as SEA 
,coalesce(a.SVK,b.SVK) as SVK 
,coalesce(a.SPAR,b.SPAR) as SPAR 
,coalesce(a.CSGN,b.CSGN) as CSGN 
,coalesce(a.UBSN,b.UBSN) as UBSN 
,coalesce(a.ING,b.ING) as ING 
,coalesce(a.SNS,b.SNS) as SNS 
,coalesce(a.BARC,b.BARC) as BARC 
,coalesce(a.HBOS,b.HBOS) as HBOS 
,coalesce(a.HSBC,b.HSBC) as HSBC 
,coalesce(a.LLOY,b.LLOY) as LLOY 
,coalesce(a.STANBS,b.STANBS) as STANBS 
from all_correlations as a 
inner join correlations_01JAN2008_31DEC2013 as b 
on a._name_ = b._name_ 
order by 
A.FROM 
,A.TO 
,b.id 
; 
quit; 
/*verify that no missing values are left. NMISS column should be 0 from all variables*/ 
proc means data = MISSING_VALUES_IMPUTED n nmiss; 
run; 
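
As an optional variation on the join above, the long COALESCE list could be generated from the column metadata rather than typed by hand. This is only a sketch: the macro variable name coalesceList is made up for illustration, and it assumes the full-sample table lives in WORK and that its numeric columns are exactly the series plus id.

```sas
/* Sketch: build "coalesce(a.X, b.X) as X" for every numeric series column */
proc sql noprint;
    select cat('coalesce(a.', strip(name), ', b.', strip(name), ') as ', strip(name))
        into :coalesceList separated by ', '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'CORRELATIONS_01JAN2008_31DEC2013'
      and type = 'num'               /* skips the character columns _TYPE_ and _NAME_ */
      and upcase(name) ne 'ID';      /* id is carried through separately */
quit;

/* same join as above, with the generated list substituted in */
proc sql;
    create table missing_values_imputed as
    select a.from, a.to, b.id, a._name_, &coalesceList.
    from all_correlations as a
    inner join correlations_01JAN2008_31DEC2013 as b
        on a._name_ = b._name_
    order by a.from, a.to, b.id;
quit;
```

CAT is used instead of CATS because CATS would strip the trailing blank in ') as ' and fuse the keyword with the column name.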

Thank you for the tremendous help. A concise, easy-to-follow answer! – user3184733