fairlens.metrics.Norm

class Norm(bin_edges=None, ord=2)

Bases: fairlens.metrics.distance.CategoricalDistanceMetric

Lp norm between two probability distributions.

Methods

__init__

Initialize the metric with optional bin edges and a norm order.

check_input

Check whether the input is valid.

distance

Distance between the distribution of numerical data in x and y.

distance_pdf

Distance between 2 aligned normalized histograms.

p_value

Returns a p-value for the test that x and y are sampled from the same distribution.

Parameters
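  • bin_edges (Optional[numpy.ndarray]) – Bin edges used to bin continuous data, or the edges of the bins when the data is already binned. Defaults to None.

  • ord (Union[str, int]) – The order of the norm, as accepted by numpy.linalg.norm. Defaults to 2.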

__call__(x, y)

Calculate the distance between two distributions.

Parameters
  • x (pd.Series) – The data in the column representing the first group.

  • y (pd.Series) – The data in the column representing the second group.

Returns

The computed distance.

Return type

Optional[float]

__init__(bin_edges=None, ord=2)
Parameters
  • bin_edges (Optional[np.ndarray], optional) – Bin edges used to bin continuous data, or the edges of the bins when the data is already binned. Defaults to None.

  • ord (Union[str, int], optional) – The order of the norm. Possible values include positive numbers, ‘fro’, and ‘nuc’. See numpy.linalg.norm for details. Defaults to 2.
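To make the effect of ord concrete, the sketch below applies numpy.linalg.norm directly to the difference of two small normalized histograms. It assumes the metric amounts to a norm of p - q; the helper lp_distance and the sample histograms are hypothetical and are not part of fairlens.

```python
import numpy as np

# Two aligned, normalized histograms (each sums to 1). Illustrative values only.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])


def lp_distance(p, q, ord=2):
    """Hypothetical helper: norm of the bin-wise difference between p and q."""
    return float(np.linalg.norm(p - q, ord=ord))


d1 = lp_distance(p, q, ord=1)         # sum of absolute bin differences
d2 = lp_distance(p, q, ord=2)         # Euclidean distance between the histograms
dinf = lp_distance(p, q, ord=np.inf)  # largest single-bin difference
```

Note that numpy.linalg.norm defines ‘fro’ and ‘nuc’ for 2-D inputs only, so a sketch like this one, which works on 1-D histograms, would raise an error for those orders.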

check_input(x, y)

Check whether the input is valid. By default, returns False if x and y have different dtypes.

Parameters
  • x (pd.Series) – The data in the column representing the first group.

  • y (pd.Series) – The data in the column representing the second group.

Returns

Whether or not the input is valid.

Return type

bool

distance(x, y)

Distance between the distribution of numerical data in x and y. Derived classes must implement this.

Parameters
  • x (pd.Series) – Numerical data in a column.

  • y (pd.Series) – Numerical data in a column.

Returns

The computed distance.

Return type

float

distance_pdf(p, q, bin_edges)

Distance between 2 aligned normalized histograms. Derived classes must implement this.

Parameters
  • p (pd.Series) – A normalized histogram.

  • q (pd.Series) – A normalized histogram.

  • bin_edges (Optional[np.ndarray]) – Bin edges for binned continuous data. Metrics such as Earth Mover’s Distance use them to compute distances in the underlying metric space.

Returns

The computed distance.

Return type

float
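As an illustration of what "aligned normalized histograms" means here, the following sketch bins two continuous samples with the same bin edges, normalizes the counts, and takes the L2 norm of the difference. It uses only NumPy; the helper normalized_hist, the sample data, and the chosen edges are assumptions for the example, not fairlens internals.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)  # first group (synthetic)
y = rng.normal(0.5, 1.0, size=1000)  # second group (synthetic, shifted mean)

# Shared bin edges, so bin i covers the same interval in both histograms.
bin_edges = np.linspace(-5.0, 5.0, 21)


def normalized_hist(values, bin_edges):
    """Hypothetical helper: histogram counts scaled to sum to 1."""
    counts, _ = np.histogram(values, bins=bin_edges)
    return counts / counts.sum()


p = normalized_hist(x, bin_edges)
q = normalized_hist(y, bin_edges)

# L2 norm between the two aligned histograms.
d = float(np.linalg.norm(p - q, ord=2))
```

Because both histograms share bin_edges, the subtraction compares like with like. Binning each sample separately with its own edges would make the bin-wise difference meaningless.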

property id: str

A string identifier for the method. Used by fairlens.metrics.stat_distance(). Derived classes must implement this.

p_value(x, y)

Returns a p-value for the test that x and y are sampled from the same distribution.

Parameters
  • x (pd.Series) – Numerical data in a column.

  • y (pd.Series) – Numerical data in a column.

Returns

The computed p-value.

Return type

float
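This page does not say how the p-value is computed. One common way to test whether two samples come from the same distribution with a distance statistic is a permutation test, sketched below under that assumption. The functions l2_hist_distance and permutation_p_value, the bin edges, and the synthetic data are all hypothetical and are not the fairlens implementation.

```python
import numpy as np

# Assumed shared bin edges for the example data.
BIN_EDGES = np.linspace(-6.0, 8.0, 29)


def l2_hist_distance(a, b, bin_edges=BIN_EDGES):
    """L2 norm between the normalized histograms of a and b."""
    p, _ = np.histogram(a, bins=bin_edges)
    q, _ = np.histogram(b, bins=bin_edges)
    return float(np.linalg.norm(p / p.sum() - q / q.sum(), ord=2))


def permutation_p_value(x, y, statistic, n_perm=200, seed=0):
    """Fraction of label shufflings whose statistic is at least the observed one."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        # Split the shuffled pool back into groups of the original sizes.
        if statistic(shuffled[: len(x)], shuffled[len(x):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=300)
y = rng.normal(2.0, 1.0, size=300)

p_shifted = permutation_p_value(x, y, l2_hist_distance)    # clearly different groups
p_identical = permutation_p_value(x, x, l2_hist_distance)  # identical groups
```

A small p-value indicates that the observed distance is unlikely if both groups were drawn from the same distribution. The resolution of the estimate is limited by n_perm, so more permutations are needed to resolve very small p-values.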