fairlens.metrics.correlation_matrix
correlation_matrix(df, num_num_metric=<function pearson>, cat_num_metric=<function kruskal_wallis>, cat_cat_metric=<function cramers_v>, columns_x=None, columns_y=None)
Creates a correlation matrix from a dataframe, applying a separate correlation metric to each type of series pair (numerical-numerical, categorical-numerical, categorical-categorical).
- Parameters
df (pd.DataFrame) – The dataframe that will be analyzed to produce correlation coefficients.
num_num_metric (Callable[[pd.Series, pd.Series], float], optional) – The correlation metric used for numerical-numerical series pairs. Defaults to Pearson’s correlation coefficient.
cat_num_metric (Callable[[pd.Series, pd.Series], float], optional) – The correlation metric used for categorical-numerical series pairs. Defaults to the Kruskal-Wallis H test.
cat_cat_metric (Callable[[pd.Series, pd.Series], float], optional) – The correlation metric used for categorical-categorical series pairs. Defaults to the corrected Cramér’s V statistic.
columns_x (Optional[List[str]]) – The column names that determine the rows of the matrix. Defaults to None.
columns_y (Optional[List[str]]) – The column names that determine the columns of the matrix. Defaults to None.
- Returns
The correlation matrix, with rows given by columns_x and columns given by columns_y, to be used in heatmap generation.
- Return type
pd.DataFrame
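The per-pair-type dispatch described above can be illustrated with a minimal, dependency-free sketch. This is not the library's implementation: plain lists stand in for pd.Series and a nested dict stands in for the returned pd.DataFrame. The names correlation_matrix_sketch, is_numeric and unsupported are hypothetical. Two behaviours are assumptions made for the sketch only: treating columns_x=None or columns_y=None as "use every column", and passing the categorical series first for mixed pairs.

```python
from math import sqrt
from typing import Callable, Dict, List, Optional, Sequence

Metric = Callable[[Sequence, Sequence], float]


def is_numeric(col: Sequence) -> bool:
    """Hypothetical helper: a column is numerical if every value is an int or float."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in col)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient, written out by hand."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def unsupported(x: Sequence, y: Sequence) -> float:
    """Placeholder metric for pair types the sketch does not implement."""
    return float("nan")


def correlation_matrix_sketch(
    data: Dict[str, Sequence],
    num_num_metric: Metric = pearson,
    cat_num_metric: Metric = unsupported,
    cat_cat_metric: Metric = unsupported,
    columns_x: Optional[List[str]] = None,
    columns_y: Optional[List[str]] = None,
) -> Dict[str, Dict[str, float]]:
    # Assumption: None means "use every column".
    rows = list(data) if columns_x is None else columns_x
    cols = list(data) if columns_y is None else columns_y

    matrix: Dict[str, Dict[str, float]] = {}
    for r in rows:
        matrix[r] = {}
        for c in cols:
            a, b = data[r], data[c]
            num_a, num_b = is_numeric(a), is_numeric(b)
            if num_a and num_b:
                value = num_num_metric(a, b)
            elif not num_a and not num_b:
                value = cat_cat_metric(a, b)
            elif num_a:
                # Mixed pair: pass the categorical series first (assumption).
                value = cat_num_metric(b, a)
            else:
                value = cat_num_metric(a, b)
            matrix[r][c] = value
    return matrix


# Toy data with two numerical columns and one categorical column.
data = {
    "age": [1, 2, 3, 4],
    "income": [2, 4, 6, 8],
    "group": ["a", "b", "a", "b"],
}
full = correlation_matrix_sketch(data)
subset = correlation_matrix_sketch(data, columns_x=["age"], columns_y=["income", "group"])
```

Here subset has a single row, "age", and two columns, "income" and "group", mirroring how columns_x and columns_y select the rows and columns of the matrix. With the library itself, the equivalent call takes a pd.DataFrame as df and returns a pd.DataFrame.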