fairlens.metrics.correlation_matrix

correlation_matrix(df, num_num_metric=pearson, cat_num_metric=kruskal_wallis, cat_cat_metric=cramers_v, columns_x=None, columns_y=None)

Creates a correlation matrix from a dataframe, applying a separate correlation metric to each type of series pair (numerical-numerical, categorical-numerical, categorical-categorical).

Parameters
  • df (pd.DataFrame) – The dataframe that will be analyzed to produce correlation coefficients.

  • num_num_metric (Callable[[pd.Series, pd.Series], float], optional) – The correlation metric used for numerical-numerical series pairs. Defaults to Pearson’s correlation coefficient.

  • cat_num_metric (Callable[[pd.Series, pd.Series], float], optional) – The correlation metric used for categorical-numerical series pairs. Defaults to the Kruskal-Wallis H test.

  • cat_cat_metric (Callable[[pd.Series, pd.Series], float], optional) – The correlation metric used for categorical-categorical series pairs. Defaults to the corrected Cramér’s V statistic.

  • columns_x (Optional[List[str]]) – The names of the columns that form the rows of the matrix. Defaults to None.

  • columns_y (Optional[List[str]]) – The names of the columns that form the columns of the matrix. Defaults to None.
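
Each of the three metric parameters accepts any callable that takes two pd.Series and returns a float. As a minimal sketch of a function with that shape, the example below computes Spearman's rank correlation as Pearson's correlation on ranks. The name spearman_metric is hypothetical and is not part of fairlens.

```python
import pandas as pd


def spearman_metric(sr_a: pd.Series, sr_b: pd.Series) -> float:
    """Hypothetical custom metric with the Callable[[pd.Series, pd.Series], float] shape."""
    # Spearman's rho equals Pearson's r computed on the ranks of the two series.
    return float(sr_a.rank().corr(sr_b.rank()))


a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
b = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])  # monotonic in a, but not linear
rho = spearman_metric(a, b)
rho_reversed = spearman_metric(a, -b)
print(rho, rho_reversed)
```

A callable of this kind could presumably be supplied as num_num_metric=spearman_metric in place of the default.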

Returns

The correlation matrix to be used in heatmap generation.

Return type

pd.DataFrame
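
The per-pair dispatch described above can be sketched with pandas and NumPy alone. This is an illustration of the idea, not fairlens's implementation: sketch_correlation_matrix, correlation_ratio and cramers_v_uncorrected are hypothetical names, the correlation ratio and the uncorrected Cramér's V are stand-ins for the library's default metrics, and the sketch assumes that leaving columns_x or columns_y as None means "use every column".

```python
import numpy as np
import pandas as pd


def pearson_metric(x: pd.Series, y: pd.Series) -> float:
    # Pearson's r between two numerical series.
    return float(x.corr(y))


def correlation_ratio(cat: pd.Series, num: pd.Series) -> float:
    # Correlation ratio (eta): square root of the share of variance in `num`
    # explained by the groups in `cat`. Stand-in metric, not Kruskal-Wallis.
    grand_mean = num.mean()
    groups = num.groupby(cat)
    between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    total = ((num - grand_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


def cramers_v_uncorrected(x: pd.Series, y: pd.Series) -> float:
    # Uncorrected Cramér's V from the chi-squared statistic of a contingency table.
    table = pd.crosstab(x, y).to_numpy().astype(float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


def sketch_correlation_matrix(df, num_num, cat_num, cat_cat,
                              columns_x=None, columns_y=None):
    # Assumption: None means "use all columns of df".
    columns_x = list(df.columns) if columns_x is None else columns_x
    columns_y = list(df.columns) if columns_y is None else columns_y
    is_num = {c: pd.api.types.is_numeric_dtype(df[c]) for c in df.columns}
    out = pd.DataFrame(index=columns_x, columns=columns_y, dtype=float)
    for cx in columns_x:
        for cy in columns_y:
            if is_num[cx] and is_num[cy]:
                value = num_num(df[cx], df[cy])
            elif is_num[cx]:
                value = cat_num(df[cy], df[cx])  # categorical series goes first
            elif is_num[cy]:
                value = cat_num(df[cx], df[cy])
            else:
                value = cat_cat(df[cx], df[cy])
            out.loc[cx, cy] = value
    return out


# Made-up illustrative data.
df = pd.DataFrame({
    "age": [23, 35, 47, 52, 29, 41],
    "income": [21000, 34000, 52000, 58000, 30000, 45000],
    "sex": ["F", "M", "F", "M", "F", "M"],
    "region": ["N", "N", "S", "S", "N", "S"],
})
matrix = sketch_correlation_matrix(
    df, pearson_metric, correlation_ratio, cramers_v_uncorrected
)
print(matrix.round(2))
```

The result is a pd.DataFrame indexed by columns_x with columns_y as its columns, which is the shape a heatmap routine would typically expect.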