2015-10-07 72 views

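Correlation of a Series with each column of a DataFrame in pandas, vectorized

Is it possible to compute the correlation of a Series with each column of a DataFrame in a vectorized way? This works for the rolling and EWM correlations, but not for the plain (vanilla) one.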

For example:

In [3]: series = pd.Series(pd.np.random.rand(12)) 

In [4]: frame = pd.DataFrame(pd.np.random.rand(12,4)) 

In [7]: pd.ewmcorr(series, frame, span=3) 
Out[7]: 
      0   1   2   3 
0  NaN  NaN  NaN  NaN 
1 -1.000000 -1.000000 1.000000 1.000000 
2 0.644915 -0.980088 -0.802944 -0.922638 
3 0.499564 -0.919574 -0.240631 -0.256109 
4 -0.172139 -0.913296 0.482402 -0.282733 
5 -0.394725 -0.693024 0.168029 0.177241 
6 -0.219131 -0.475347 0.192552 0.149787 
7 -0.461821 0.353778 0.538289 -0.005628 
8 0.573406 0.681704 -0.491689 0.194916 
9 0.655414 -0.079153 -0.464814 -0.331571 
10 0.735604 -0.389858 -0.647369 0.220238 
11 0.205766 -0.249702 -0.463639 -0.106032 

In [8]: pd.rolling_corr(series, frame, window=3) 
Out[8]: 
      0   1   2   3 
0  NaN  NaN  NaN  NaN 
1  NaN  NaN  NaN  NaN 
2 0.496697 -0.957551 -0.-0.849874 
3 0.886848 -0.937174 -0.479519 -0.505008 
4 -0.180454 -0.950213 0.331308 0.987414 
5 -0.998852 -0.770988 0.582625 0.821079 
6 -0.849263 -0.142453 -0.690959 0.805143 
7 -0.617343 0.768797 0.299155 0.415997 
8 0.930545 0.883782 -0.287360 -0.073551 
9 0.917790 -0.171220 -0.993951 -0.207630 
10 0.916901 -0.246603 -0.990313 0.862856 
11 0.426314 -0.876191 -0.643768 -0.225983 

In [10]: series.corr(frame) 
--------------------------------------------------------------------------- 
AttributeError       Traceback (most recent call last) 
<ipython-input-10-599dbd7f0707> in <module>() 
----> 1 series.corr(frame) 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/series.py in corr(self, other, method, min_periods) 
    1280   correlation : float 
    1281   """ 
-> 1282   this, other = self.align(other, join='inner', copy=False) 
    1283   if len(this) == 0: 
    1284    return np.nan 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis) 
    3372          copy=copy, fill_value=fill_value, 
    3373          method=method, limit=limit, 
-> 3374          fill_axis=fill_axis) 
    3375   elif isinstance(other, Series): 
    3376    return self._align_series(other, join=join, axis=axis, level=level, 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in _align_frame(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis) 
    3396 
    3397   if axis is None or axis == 1: 
-> 3398    if not self.columns.equals(other.columns): 
    3399     join_columns, clidx, cridx = \ 
    3400      self.columns.join(other.columns, how=join, level=level, 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in __getattr__(self, name) 
    2143     or name in self._metadata 
    2144     or name in self._accessors): 
-> 2145    return object.__getattribute__(self, name) 
    2146   else: 
    2147    if name in self._info_axis: 

AttributeError: 'Series' object has no attribute 'columns' 

I can get the result like this, but it is neither vectorized nor elegant:

In [11]: pd.Series({col:series.corr(frame[col]) for col in frame}) 
Out[11]: 
0 0.286678 
1 -0.438003 
2 -0.011778 
3 -0.387740 
dtype: float64 

Answer


You can use corrwith:

>>> frame.corrwith(series) 
0 0.399534 
1 0.321166 
2 -0.101875 
3 0.604326 
dtype: float64 
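
As a quick sanity check, the sketch below compares corrwith against the dictionary-comprehension loop from the question. It builds the random data with NumPy directly, since the pd.np alias is not available in recent pandas releases; the seed and the variable names vectorized, looped, rolling and ewm are illustrative assumptions, not part of the original post. It also shows the method-style spelling that newer pandas versions use in place of pd.rolling_corr and pd.ewmcorr.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same shapes as in the question (seed is arbitrary).
rng = np.random.default_rng(0)
series = pd.Series(rng.random(12))
frame = pd.DataFrame(rng.random((12, 4)))

# One call: correlation of `series` with every column of `frame`.
vectorized = frame.corrwith(series)

# The per-column loop from the question, kept only for comparison.
looped = pd.Series({col: series.corr(frame[col]) for col in frame})

# Method-style equivalents of pd.rolling_corr / pd.ewmcorr in newer pandas.
rolling = frame.rolling(window=3).corr(series)
ewm = frame.ewm(span=3).corr(series)

print(vectorized)
```

Both approaches should agree up to floating-point error, so corrwith can replace the loop without changing the result.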

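From the pandas documentation: a related method, corrwith, is implemented on DataFrame to compute the correlation between like-labeled Series contained in different DataFrame objects.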
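
To illustrate what "like-labeled" means here, the following minimal sketch calls corrwith on two DataFrames whose columns only partly overlap. The frames left and right, their column labels and the seed are hypothetical, chosen only for the example.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames sharing only the column labels "b" and "c".
rng = np.random.default_rng(1)
left = pd.DataFrame(rng.random((12, 3)), columns=["a", "b", "c"])
right = pd.DataFrame(rng.random((12, 3)), columns=["b", "c", "d"])

# Columns are paired by label; labels present in only one frame give NaN.
by_label = left.corrwith(right)

# drop=True keeps only the labels found in both frames.
shared_only = left.corrwith(right, drop=True)

print(by_label)
print(shared_only)
```

When a Series is passed instead, as in the answer above, it is correlated against every column of the DataFrame, which is the case the question asks about.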
