Is It Possible To Do Running Correlation With One Fixed Series In Python?
I'm wondering if there is a fast way to do running correlation in Python with one fixed series? I've tried to use Pandas and for example: df1.rolling(4).corr(df2). However, it req
Solution 1:
You can use pandas.DataFrame.rolling, which returns a pandas.core.window.Rolling object that has an apply method. You can then pass to apply() any function that calculates the correlation you want.
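As a rough illustration of that mechanism, here is a minimal sketch. The sample values and the spread function are made up for this illustration. The function passed to apply() is called once per full window and must return a single number:

```python
import pandas as pd

# Hypothetical sample data, chosen only to make the windows easy to follow
s = pd.Series([1, 4, 2, 7, 3])

def spread(window):
    # With raw=True, `window` arrives as a NumPy array of length 3
    return window.max() - window.min()

# The first two positions have no full window, so they come back as NaN
result = s.rolling(window=3).apply(spread, raw=True)
print(result.tolist())
```

Passing raw=True hands the function a plain NumPy array instead of a Series, which is usually faster when the function does not need the index.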
Example
- Let's say you are interested in the Pearson correlation coefficient. That can be calculated using scipy.stats.pearsonr.
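import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Long series to scan, and the fixed series each window is correlated against
df1 = pd.DataFrame([1,3,2,4,5,6,3,4,1,2,3,2,2,3,2,5,1,2,1,2,8,8,8,8,8,8,8])
df2 = pd.DataFrame([1,2,3,2])
CORR_VALS = df2[0].values

def get_correlation(vals):
    # pearsonr returns (coefficient, p-value); keep only the coefficient
    return pearsonr(vals, CORR_VALS)[0]

# Roll over column 0 only, so the result is a Series that can be stored as a new column
df1['correlation'] = df1[0].rolling(window=len(CORR_VALS)).apply(get_correlation)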
- Note that the window argument of rolling() should have the same length as the array you are calculating the correlation against.
This outputs:
In [5]: df1['correlation'].values
Out[5]:
array([ nan, nan, nan, 0.31622777, 0.31622777,
0.71713717, 0.63245553, -0.63245553, -0.39223227, -0.63245553,
-0.63245553, 1. , 0. , -0.70710678, 0.81649658,
0. , 0.47809144, -0.23570226, -0.64699664, 0. ,
0. , 0.7570333 , 0.76509206, 0.11043153, -0.77302068,
-0.11043153, 0.86164044])
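Since the question asks for a fast approach, the same rolling Pearson correlation can also be computed without calling a Python function per window. The following is only a sketch under a few assumptions: the helper name rolling_corr_fixed is hypothetical, the sample series is a shortened version of the data above, and it relies on numpy.lib.stride_tricks.sliding_window_view, which needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_corr_fixed(series, pattern):
    """Pearson correlation of each window of `series` with the fixed `pattern`.

    Hypothetical helper: returns an array as long as `series`, with NaN in
    the first len(pattern) - 1 positions, mirroring the rolling() output.
    """
    x = np.asarray(series, dtype=float)
    y = np.asarray(pattern, dtype=float)
    windows = sliding_window_view(x, y.size)            # shape (n - k + 1, k)
    xc = windows - windows.mean(axis=1, keepdims=True)  # center each window
    yc = y - y.mean()                                   # center the fixed series once
    num = xc @ yc
    den = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())
    # A constant window has zero variance; 0/0 is left as NaN without a warning
    with np.errstate(invalid="ignore", divide="ignore"):
        r = num / den
    return np.concatenate([np.full(y.size - 1, np.nan), r])

series = [1, 3, 2, 4, 5, 6, 3, 4, 1, 2, 3, 2]
pattern = [1, 2, 3, 2]
corr = rolling_corr_fixed(series, pattern)
print(corr)
```

The fixed series is centered and its sum of squares computed once, and all windows are handled in a single matrix-vector product, so the per-window cost of calling pearsonr from Python is avoided.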