fairlens.metrics.ContinuousDistanceMetric

class ContinuousDistanceMetric(p_value_test='bootstrap')

Bases: fairlens.metrics.distance.DistanceMetric

Base class for distance metrics on continuous data.

Subclasses must implement a distance method.

Methods

__init__

Initialize continuous distance metric.

check_input

Check whether the input is valid.

distance

Distance between the distribution of numerical data in x and y.

p_value

Returns a p-value for the test that x and y are sampled from the same distribution.

__call__(x, y)

Calculate the distance between two distributions.

Parameters
  • x (pd.Series) – The data in the column representing the first group.

  • y (pd.Series) – The data in the column representing the second group.

Returns

The computed distance, or None if check_input rejects the input.

Return type

Optional[float]
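The Optional[float] return type suggests that a call may yield no distance at all. The sketch below shows one way that could work, assuming that __call__ runs check_input first and returns None when validation fails. SketchMetric is a hypothetical stand-in, not the fairlens class, and plain lists replace pd.Series so the example has no dependencies.

```python
from typing import Optional, Sequence


class SketchMetric:
    """Hypothetical stand-in that mirrors the documented method names."""

    def check_input(self, x: Sequence, y: Sequence) -> bool:
        # Assumption: "same dtype" is approximated by "same element types".
        return {type(v) for v in x} == {type(v) for v in y}

    def distance(self, x: Sequence[float], y: Sequence[float]) -> float:
        # Placeholder distance: absolute difference of the two means.
        return abs(sum(x) / len(x) - sum(y) / len(y))

    def __call__(self, x: Sequence, y: Sequence) -> Optional[float]:
        # Assumed behaviour: validate first, return None for invalid input.
        if not self.check_input(x, y):
            return None
        return self.distance(x, y)


metric = SketchMetric()
valid = metric([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])   # floats on both sides
invalid = metric([1.0, 2.0], ["a", "b"])           # mismatched element types
```

Callers that follow this pattern should test the result against None before using it in arithmetic.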

__init__(p_value_test='bootstrap')

Initialize continuous distance metric.

Parameters

p_value_test (str, optional) – The resampling method used to compute the p-value. Overridden by metrics such as the Kolmogorov-Smirnov distance. Defaults to “bootstrap”.

check_input(x, y)

Check whether the input is valid. Returns False if x and y have different dtypes by default.

Parameters
  • x (pd.Series) – The data in the column representing the first group.

  • y (pd.Series) – The data in the column representing the second group.

Returns

Whether or not the input is valid.

Return type

bool

abstract distance(x, y)

Distance between the distribution of numerical data in x and y. Derived classes must implement this.

Parameters
  • x (pd.Series) – Numerical data in a column.

  • y (pd.Series) – Numerical data in a column.

Returns

The computed distance.

Return type

float

abstract property id: str

A string identifier for the method. Used by fairlens.metrics.stat_distance(). Derived classes must implement this.
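Because both distance and id are abstract, a concrete metric has to supply each of them. The following is a minimal sketch of that contract using Python's abc module. ContinuousMetricSketch and MeanDifference are hypothetical names chosen for illustration, the base class is a stand-in rather than the fairlens class, and plain lists are used in place of pd.Series.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ContinuousMetricSketch(ABC):
    """Hypothetical stand-in for the documented abstract interface."""

    @property
    @abstractmethod
    def id(self) -> str:
        """String identifier for the metric."""

    @abstractmethod
    def distance(self, x: Sequence[float], y: Sequence[float]) -> float:
        """Distance between the distributions of x and y."""


class MeanDifference(ContinuousMetricSketch):
    """Illustrative metric: absolute difference of the sample means."""

    @property
    def id(self) -> str:
        return "mean_difference"  # illustrative identifier, not a fairlens one

    def distance(self, x: Sequence[float], y: Sequence[float]) -> float:
        return abs(sum(x) / len(x) - sum(y) / len(y))


metric = MeanDifference()
d = metric.distance([1.0, 3.0], [4.0, 6.0])

# The abstract base cannot be instantiated until both members are defined.
try:
    ContinuousMetricSketch()
    abstract_enforced = False
except TypeError:
    abstract_enforced = True
```

With the real library, the subclass would presumably derive from ContinuousDistanceMetric instead of the stand-in.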

p_value(x, y)

Returns a p-value for the test that x and y are sampled from the same distribution.

Parameters
  • x (pd.Series) – Numerical data in a column.

  • y (pd.Series) – Numerical data in a column.

Returns

The computed p-value.

Return type

float
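The text above does not describe how the resampling is carried out. To illustrate the general idea, here is a minimal sketch of a permutation test, one common resampling approach. It is not the fairlens implementation: permutation_p_value, mean_difference, n_perm and seed are hypothetical names, and plain lists stand in for pd.Series.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    x: Sequence[float],
    y: Sequence[float],
    distance: Callable[[Sequence[float], Sequence[float]], float],
    n_perm: int = 200,
    seed: int = 0,
) -> float:
    """Estimate how often a random relabelling gives a distance at least
    as large as the observed one."""
    rng = random.Random(seed)
    observed = distance(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        if distance(pooled[: len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    return abs(sum(a) / len(a) - sum(b) / len(b))


low = [float(i) for i in range(8)]
high = [float(i) + 100.0 for i in range(8)]

same = permutation_p_value(low, low, mean_difference)      # identical samples
shifted = permutation_p_value(low, high, mean_difference)  # clearly separated
```

Identical samples give an observed distance of zero, so every relabelling matches or exceeds it and the estimate is 1.0. Well-separated samples give a small p-value.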