fairlens.metrics.ContinuousDistanceMetric

class ContinuousDistanceMetric(p_value_test='bootstrap')

Bases: fairlens.metrics.distance.DistanceMetric

Base class for distance metrics on continuous data.

Subclasses must implement a distance method.

Methods

__init__

Initialize continuous distance metric.

check_input

Check whether the input is valid.

distance

Distance between the distributions of numerical data in x and y.

p_value

Returns a p-value for the test that x and y are sampled from the same distribution.

__call__(x, y)

Calculate the distance between two distributions.

Parameters
  • x (pd.Series) – The data in the column representing the first group.

  • y (pd.Series) – The data in the column representing the second group.

Returns

The computed distance.

Return type

Optional[float]

__init__(p_value_test='bootstrap')

Initialize continuous distance metric.

Parameters

p_value_test (str, optional) – The resampling method used to compute the p-value. Overridden by metrics such as the Kolmogorov-Smirnov distance. Defaults to “bootstrap”.

check_input(x, y)

Check whether the input is valid. By default, returns False if x and y have different dtypes.

Parameters
  • x (pd.Series) – The data in the column representing the first group.

  • y (pd.Series) – The data in the column representing the second group.

Returns

Whether or not the input is valid.

Return type

bool

abstract distance(x, y)

Distance between the distributions of numerical data in x and y. Derived classes must implement this.

Parameters
  • x (pd.Series) – Numerical data in a column.

  • y (pd.Series) – Numerical data in a column.

Returns

The computed distance.

Return type

float

abstract property id

A string identifier for the method. Used by fairlens.metrics.stat_distance(). Derived classes must implement this.
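
As a rough illustration of the contract described above (a concrete metric supplies distance and id, while __call__ and check_input come from the base class), the following sketch uses a stand-in base class built on the standard library's abc module instead of importing fairlens. The names SketchContinuousMetric and MeanDifference are hypothetical, the validity rule in check_input is simplified, and returning None on invalid input is an assumption suggested by the Optional[float] return type, not a description of the library's code.

```python
from abc import ABC, abstractmethod
from statistics import fmean
from typing import Optional, Sequence


class SketchContinuousMetric(ABC):
    """Hypothetical stand-in for the documented base class (not fairlens code)."""

    def check_input(self, x: Sequence[float], y: Sequence[float]) -> bool:
        # Assumption: input is valid when both groups are non-empty and numeric.
        return bool(len(x)) and bool(len(y)) and all(
            isinstance(v, (int, float)) for v in (*x, *y)
        )

    def __call__(self, x: Sequence[float], y: Sequence[float]) -> Optional[float]:
        # Assumption: invalid input yields None rather than raising.
        if not self.check_input(x, y):
            return None
        return self.distance(x, y)

    @abstractmethod
    def distance(self, x: Sequence[float], y: Sequence[float]) -> float:
        """Derived classes must implement this."""

    @property
    @abstractmethod
    def id(self) -> str:
        """Derived classes must implement this."""


class MeanDifference(SketchContinuousMetric):
    """Hypothetical metric: absolute difference between the group means."""

    def distance(self, x: Sequence[float], y: Sequence[float]) -> float:
        return abs(fmean(x) - fmean(y))

    @property
    def id(self) -> str:
        return "mean_difference"


metric = MeanDifference()
result = metric([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])  # valid numeric input
invalid = metric([1.0, 2.0], ["a", "b"])           # fails check_input
```

With the real library, the same two members would presumably be defined on a subclass of fairlens.metrics.ContinuousDistanceMetric, with x and y passed as pd.Series.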

p_value(x, y)[source]

Returns a p-value for the test that x and y are sampled from the same distribution.

Parameters
  • x (pd.Series) – Numerical data in a column.

  • y (pd.Series) – Numerical data in a column.

Returns

The computed p-value.

Return type

float
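
To make the idea of a resampling-based p-value more concrete, here is a minimal sketch of a permutation test written with the standard library only. It is not the library's implementation: the function permutation_p_value, its parameters n_resamples and seed, and the mean-difference distance are all assumptions chosen for illustration, and the documented default resampling method is “bootstrap” rather than permutation.

```python
import random
from statistics import fmean
from typing import Callable, Sequence


def permutation_p_value(
    x: Sequence[float],
    y: Sequence[float],
    distance: Callable[[Sequence[float], Sequence[float]], float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate a p-value for the hypothesis that x and y share a distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = distance(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)

    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        if distance(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


def mean_difference(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical distance: absolute difference between the group means."""
    return abs(fmean(x) - fmean(y))


base = [float(i) for i in range(20)]
shifted = [v + 100.0 for v in base]

p_same = permutation_p_value(base, base, mean_difference)
p_far = permutation_p_value(base, shifted, mean_difference)
```

Identical samples give an observed distance of zero, so every resample counts as at least as extreme and the estimate is large. Well-separated samples are almost never matched by a random relabelling, so the estimate is small.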