fairlens.metrics.BinomialDistance
-
class BinomialDistance(p_value_test='bootstrap')
Bases: fairlens.metrics.distance.ContinuousDistanceMetric
Difference distance between two binary data samples, i.e. p_x - p_y, where p_x and p_y are the probabilities of success in x and y, respectively. The p-value is computed for the null hypothesis that the probability of success is p_y. The data are assumed to be a series of 1s and 0s (success, failure) drawn as Bernoulli random variates.
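The description above can be illustrated with a small standard-library sketch. It works on plain lists rather than pd.Series, and the helper names binomial_distance and binomial_p_value are hypothetical, not part of fairlens. The p-value uses a two-sided exact binomial test against p_y, which is one reasonable reading of the stated null hypothesis; the library may compute it differently.

```python
from math import comb
from statistics import mean


def binomial_distance(x, y):
    """Difference in success probability between two 0/1 samples: p_x - p_y."""
    return mean(x) - mean(y)


def binomial_p_value(x, y):
    """Two-sided exact binomial test of the successes in x against H0: p = p_y.

    Sums the probability of every outcome that is no more likely under H0
    than the observed number of successes (an assumed choice of test).
    """
    n, k, p0 = len(x), sum(x), mean(y)
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties with the observed outcome are counted.
    return min(1.0, sum(p for p in pmf if p <= pmf[k] * (1 + 1e-9)))


x = [1, 1, 1, 0]  # p_x = 0.75
y = [1, 0, 0, 0]  # p_y = 0.25
d = binomial_distance(x, y)  # 0.5
p = binomial_p_value(x, y)   # roughly 0.051
```

Here three successes out of four are unlikely if the true success probability is 0.25, so the p-value is small even though the samples are tiny.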
Methods
__init__(p_value_test='bootstrap') – Initialize continuous distance metric.
check_input(x, y) – Check whether the input is valid.
distance(x, y) – Distance between the distribution of numerical data in x and y.
p_value(x, y) – Returns a p-value for the test that x and y are sampled from the same distribution.
-
__call__(x, y)
Calculate the distance between two distributions.
- Parameters
x (pd.Series) – The data in the column representing the first group.
y (pd.Series) – The data in the column representing the second group.
- Returns
The computed distance.
- Return type
Optional[float]
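The Optional[float] return type suggests that calling the metric can yield None. One plausible reading, which this page does not confirm, is that __call__ validates the input with check_input and only then delegates to distance. The sketch below shows that pattern with a hypothetical stand-in class, SketchBinomialDistance, operating on plain lists; its validity rule is an assumption made for the example.

```python
from statistics import mean
from typing import Optional, Sequence


class SketchBinomialDistance:
    """Hypothetical stand-in used for illustration, not the fairlens class."""

    def check_input(self, x: Sequence, y: Sequence) -> bool:
        # Assumed validity rule for this sketch: both samples are non-empty
        # and contain only 0/1 values.
        return bool(x) and bool(y) and all(v in (0, 1) for v in (*x, *y))

    def distance(self, x: Sequence, y: Sequence) -> float:
        # Difference in success probability, p_x - p_y.
        return mean(x) - mean(y)

    def __call__(self, x: Sequence, y: Sequence) -> Optional[float]:
        # Assumed behaviour: validate first, return None for invalid input.
        if not self.check_input(x, y):
            return None
        return self.distance(x, y)


metric = SketchBinomialDistance()
valid = metric([1, 1, 0, 0], [1, 0, 0, 0])  # 0.25
invalid = metric([1, 2, 3], [1, 0, 0, 0])   # None
```

Returning None instead of raising lets a caller loop over several metrics and skip the ones that do not apply to a given pair of columns.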
-
__init__(p_value_test='bootstrap')
Initialize continuous distance metric.
- Parameters
p_value_test (str, optional) – The resampling method used to compute the p-value. Overridden by metrics such as the Kolmogorov-Smirnov distance. Defaults to “bootstrap”.
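To make the idea of a resampling-based p-value concrete, here is a generic permutation test for the difference in success rates, written with the standard library only. It is an illustration of the technique, not the fairlens implementation; the function name permutation_p_value and its n_resamples and seed parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_resamples=1000, seed=0):
    """Share of label shuffles whose |p_x - p_y| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        # Shuffling the pooled data breaks any link between value and group.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[: len(x)]) - mean(pooled[len(x):]))
        if stat >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


same = permutation_p_value([1, 0, 1, 0], [1, 0, 1, 0])
different = permutation_p_value([1] * 20, [0] * 20)
```

Identical samples give a p-value of 1, while two samples with no overlap in outcomes give a value close to the smallest the resample count allows.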
-
check_input(x, y)
Check whether the input is valid. Returns False if x and y have different dtypes by default.
- Parameters
x (pd.Series) – The data in the column representing the first group.
y (pd.Series) – The data in the column representing the second group.
- Returns
Whether or not the input is valid.
- Return type
bool
-
distance(x, y)
Distance between the distribution of numerical data in x and y. Derived classes must implement this.
- Parameters
x (pd.Series) – Numerical data in a column.
y (pd.Series) – Numerical data in a column.
- Returns
The computed distance.
- Return type
float
-
property id
A string identifier for the method. Used by fairlens.metrics.stat_distance(). Derived classes must implement this.