fairlens.metrics.JensenShannonDivergence
- class JensenShannonDivergence(bin_edges=None)
Bases: fairlens.metrics.distance.CategoricalDistanceMetric
Jensen-Shannon Divergence between two probability distributions.
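For reference, the Jensen-Shannon divergence between two distributions P and Q is conventionally defined through the Kullback-Leibler divergence of each distribution to their mixture M. This is the standard textbook definition, not taken from this page; the logarithm base, and whether this class returns the divergence itself or its square root (the Jensen-Shannon distance), are not stated here.

```latex
\[
\mathrm{JSD}(P \parallel Q)
  = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \parallel M)
  + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \parallel M),
\qquad M = \tfrac{1}{2}(P + Q)
\]
```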
Methods
- __init__(bin_edges=None) – Initialize categorical distance metric.
- check_input(x, y) – Check whether the input is valid.
- distance(x, y) – Distance between the distribution of numerical data in x and y.
- distance_pdf(p, q, bin_edges) – Distance between 2 aligned normalized histograms.
- p_value(x, y) – Returns a p-value for the test that x and y are sampled from the same distribution.
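- Parameters
bin_edges (Optional[numpy.ndarray]) – Bin edges used to bin continuous data, or describing the bins of pre-binned data. Defaults to None.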
- __call__(x, y)
Calculate the distance between two distributions.
- Parameters
x (pd.Series) – The data in the column representing the first group.
y (pd.Series) – The data in the column representing the second group.
- Returns
The computed distance.
- Return type
Optional[float]
- __init__(bin_edges=None)
Initialize categorical distance metric.
- Parameters
bin_edges (Optional[np.ndarray], optional) – A numpy array of bin edges, used either to bin continuous data or to describe the bins of pre-binned data for metrics that take the distance space into account. For example, for bins [0-5, 5-10, 10-15, 15-20], bin_edges would be [0, 5, 10, 15, 20]. See numpy.histogram_bin_edges() for more information.
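To illustrate how a list of bin edges maps values to bins, here is a minimal pure-Python sketch. The helper name normalized_histogram is hypothetical and is not part of fairlens; it assumes the numpy.histogram convention in which every bin is half-open except the last, which also includes its right edge.

```python
from bisect import bisect_right


def normalized_histogram(values, bin_edges):
    """Count values per bin and normalize the counts so they sum to 1.

    Hypothetical helper for illustration only. Bins are [left, right),
    except the last bin, which is [left, right]. Values outside the
    outermost edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        if v < bin_edges[0] or v > bin_edges[-1]:
            continue  # outside the binned range
        # Index of the last edge that is <= v
        i = bisect_right(bin_edges, v) - 1
        # A value equal to the final edge belongs to the last bin
        i = min(i, len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Bins 0-5, 5-10, 10-15, 15-20 are described by five edges
bin_edges = [0, 5, 10, 15, 20]
hist = normalized_histogram([1, 4, 5, 12, 20, 25], bin_edges)
```

Here 1 and 4 fall in the first bin, 5 in the second, 12 in the third, 20 in the last, and 25 is dropped because it lies beyond the final edge.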
- check_input(x, y)
Check whether the input is valid. By default, returns False if x and y have different dtypes.
- Parameters
x (pd.Series) – The data in the column representing the first group.
y (pd.Series) – The data in the column representing the second group.
- Returns
Whether or not the input is valid.
- Return type
bool
- distance(x, y)
Distance between the distribution of numerical data in x and y. Derived classes must implement this.
- Parameters
x (pd.Series) – Numerical data in a column.
y (pd.Series) – Numerical data in a column.
- Returns
The computed distance.
- Return type
float
- distance_pdf(p, q, bin_edges)
Distance between 2 aligned normalized histograms. Derived classes must implement this.
- Parameters
p (pd.Series) – A normalized histogram.
q (pd.Series) – A normalized histogram.
bin_edges (Optional[np.ndarray]) – bin_edges for binned continuous data. Used by metrics such as Earth Mover’s Distance to compute the distance metric space.
- Returns
The computed distance.
- Return type
float
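As a rough illustration of a distance between two aligned normalized histograms, the sketch below computes the Jensen-Shannon divergence in pure Python. It is not the library's implementation: the function names are hypothetical, base-2 logarithms are an assumption (they bound the result to [0, 1]), and the library may instead return the square root of this quantity or use a different base.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two aligned normalized histograms.

    Hypothetical helper for illustration. p and q must have the same
    length and each must sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must be aligned (same number of bins)")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


same = js_divergence([0.5, 0.5], [0.5, 0.5])      # identical histograms
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # no overlapping mass
```

Unlike the Earth Mover's Distance mentioned above, this computation compares bins position by position and does not use bin_edges.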
- property id: str
A string identifier for the method. Used by fairlens.metrics.stat_distance(). Derived classes must implement this.
- p_value(x, y)
Returns a p-value for the test that x and y are sampled from the same distribution.
- Parameters
x (pd.Series) – Numerical data in a column.
y (pd.Series) – Numerical data in a column.
- Returns
The computed p-value.
- Return type
float
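This page does not say how the p-value is obtained. One common approach for distance-based statistics is a permutation test, sketched below under that assumption. The names permutation_p_value and abs_mean_diff are hypothetical, and the absolute difference in means stands in for whatever distance is being tested.

```python
import random


def permutation_p_value(x, y, statistic, n_permutations=1000, seed=0):
    """Estimate a p-value for the null hypothesis that x and y share a distribution.

    Hypothetical helper for illustration. The pooled data is reshuffled
    repeatedly, and the p-value is the share of shuffles whose statistic
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_permutations + 1)


def abs_mean_diff(a, b):
    """Placeholder statistic: absolute difference between sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


p_same = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4], abs_mean_diff)
p_far = permutation_p_value(list(range(20)), list(range(100, 120)), abs_mean_diff)
```

A small p-value, as for the well-separated samples in p_far, suggests the two groups are unlikely to come from the same distribution; identical samples, as in p_same, give a p-value of 1.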