fairlens.FairnessScorer
-
class FairnessScorer(df, target_attr, sensitive_attrs=None, detect_sensitive=False, distr_type=None, sensitive_distr_types=None)
Bases: object
This class analyzes a given DataFrame, looks for biases and quantifies fairness.
Methods
__init__ – Fairness Scorer constructor.
demographic_report – Generate a report on the fairness of different groups of sensitive attributes.
distribution_score – Returns a dataframe consisting of all unique sub-groups and their statistical distance to the rest of the population w.r.t. the target variable.
plot_distributions – Plot the distributions of the target variable with respect to all sensitive values.
-
__init__(df, target_attr, sensitive_attrs=None, detect_sensitive=False, distr_type=None, sensitive_distr_types=None)
Fairness Scorer constructor.
- Parameters
df (pd.DataFrame) – Input DataFrame to be scored.
target_attr (str) – The target attribute name.
sensitive_attrs (Optional[Sequence[str]], optional) – The sensitive attribute names. Defaults to None.
detect_sensitive (bool, optional) – Whether to try to detect sensitive attributes from the column names. Defaults to False.
distr_type (Optional[str], optional) – The type of distribution of the target attribute. Can take values from [“categorical”, “continuous”, “binary”, “datetime”]. If None, the type of distribution is inferred based on the data in the column. Defaults to None.
sensitive_distr_types (Optional[Mapping[str, str]], optional) – The distribution types of the sensitive attributes, passed as a mapping from each sensitive attribute name to its distribution type. Each type can take values from [“categorical”, “continuous”, “binary”, “datetime”]. If None, the distribution types of all sensitive attributes are inferred from the data in the respective columns. Defaults to None.
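The constructor parameters above can be exercised with a small synthetic dataset. The following is a minimal sketch, not an official example: the column names ("sex", "age", "hired") and the helpers make_records and build_scorer are hypothetical, and build_scorer assumes that pandas and fairlens are installed. Only the keyword arguments documented above are passed.

```python
import random


def make_records(n=400, seed=0):
    """Build a synthetic column-oriented dataset (hypothetical columns)."""
    rng = random.Random(seed)
    sex = [rng.choice(["F", "M"]) for _ in range(n)]
    age = [rng.randint(18, 70) for _ in range(n)]
    # Binary target with a deliberately skewed positive rate per group.
    hired = [int(rng.random() < (0.5 if s == "M" else 0.3)) for s in sex]
    return {"sex": sex, "age": age, "hired": hired}


def build_scorer(records):
    """Construct a FairnessScorer; assumes pandas and fairlens are installed."""
    import pandas as pd
    import fairlens as fl

    df = pd.DataFrame(records)
    return fl.FairnessScorer(
        df,
        target_attr="hired",
        sensitive_attrs=["sex", "age"],
        distr_type="binary",
        sensitive_distr_types={"sex": "categorical", "age": "continuous"},
    )


records = make_records()
```

With both libraries available, build_scorer(records) would return the scorer. Leaving distr_type and sensitive_distr_types as None lets the library infer the types from the column data instead.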
-
demographic_report(metric='auto', method='dist_to_all', alpha=0.05, max_comb=4, min_count=100, max_rows=10, hide_positive=False)
Generate a report on the fairness of different groups of sensitive attributes.
- Parameters
metric (str, optional) – Choose a custom metric to use. Defaults to automatically chosen metric depending on the distribution of the target variable. See
method (str, optional) – The method used to apply the metric to the sub-group. Can take values [“dist_to_all”, “dist_to_rest”] which correspond to measuring the distance between the subgroup distribution and the overall distribution, or the overall distribution without the subgroup, respectively. Defaults to “dist_to_all”.
alpha (Optional[float], optional) – The maximum p-value to accept a bias. Defaults to 0.05.
max_comb (Optional[int], optional) – Max number of combinations of sensitive attributes to be considered. If None all combinations are considered. Defaults to 4.
min_count (Optional[int], optional) – If set, sub-groups with fewer samples than min_count will be ignored. Defaults to 100.
max_rows (int, optional) – Maximum number of biased demographics to display. Defaults to 10.
hide_positive (bool, optional) – Hides positive distances if set to True. This may be useful when using metrics which can return negative distances (binomial distance), in order to inspect a skew in only one direction. Alternatively, changing the method may yield more significant results. Defaults to False.
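The difference between the two method values can be illustrated without the library. The sketch below is an assumption-laden toy: it uses a plain difference in positive rates as the distance for a binary target, which is not necessarily the metric fairlens selects, and the function names are hypothetical.

```python
def positive_rate(values):
    """Fraction of positive (1) outcomes in a binary sample."""
    return sum(values) / len(values)


def subgroup_distance(target, in_group, method="dist_to_all"):
    """Signed difference in positive rate between a subgroup and a reference.

    Illustrative stand-in for a distance metric, not the library's own.
    """
    group = [t for t, g in zip(target, in_group) if g]
    if method == "dist_to_all":
        # Reference is the whole population, subgroup included.
        reference = list(target)
    elif method == "dist_to_rest":
        # Reference is the population with the subgroup removed.
        reference = [t for t, g in zip(target, in_group) if not g]
    else:
        raise ValueError(f"unknown method: {method}")
    return positive_rate(group) - positive_rate(reference)


target = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
in_group = [True] * 4 + [False] * 6

d_all = subgroup_distance(target, in_group, "dist_to_all")
d_rest = subgroup_distance(target, in_group, "dist_to_rest")
```

Because the subgroup also contributes to the overall distribution, the distance to the whole population is smaller in magnitude here than the distance to the rest, which is one reason switching the method can change which sub-groups appear significant.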
-
distribution_score(metric='auto', method='dist_to_all', p_value=False, max_comb=None)
Returns a dataframe consisting of all unique sub-groups and their statistical distance to the rest of the population w.r.t. the target variable.
- Parameters
metric (str, optional) – Choose a metric to use. Defaults to automatically chosen metric depending on the distribution of the target variable.
method (str, optional) – The method used to apply the metric to the sub-group. Can take values [“dist_to_all”, “dist_to_rest”] which correspond to measuring the distance between the subgroup distribution and the overall distribution, or the overall distribution without the subgroup, respectively. Defaults to “dist_to_all”.
p_value (bool, optional) – Whether or not to compute a p-value for the distances. Defaults to False.
max_comb (Optional[int], optional) – Max number of combinations of sensitive attributes to be considered. If None all combinations are considered. Defaults to None.
- Return type
pandas.core.frame.DataFrame
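One way to read max_comb is as an upper bound on how many sensitive attributes are combined when forming sub-groups. The sketch below enumerates attribute combinations under that assumed interpretation; attribute_combinations is a hypothetical helper and the library's internal behaviour may differ.

```python
from itertools import combinations


def attribute_combinations(sensitive_attrs, max_comb=None):
    """List attribute combinations of size 1 up to max_comb (all sizes if None)."""
    limit = len(sensitive_attrs)
    if max_comb is not None:
        limit = min(max_comb, limit)
    return [
        combo
        for size in range(1, limit + 1)
        for combo in combinations(sensitive_attrs, size)
    ]


attrs = ["sex", "age", "race"]
pairs_at_most = attribute_combinations(attrs, max_comb=2)
everything = attribute_combinations(attrs)
```

With three attributes, capping at 2 yields the three single attributes plus the three pairs, while None also adds the single three-way combination.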
-
plot_distributions(figsize=None, max_width=3, max_quantiles=8, show_hist=None, show_curve=None, shade=True, normalize=False, cmap=None)
Plot the distributions of the target variable with respect to all sensitive values.
- Parameters
figsize (Optional[Tuple[int, int]], optional) – The size of each figure. Defaults to (6, 4).
max_width (int, optional) – The maximum number of figures. Defaults to 3.
max_quantiles (int, optional) – The maximum number of quantiles to use for continuous data. Defaults to 8.
show_hist (Optional[bool], optional) – Shows the histogram if True. Defaults to True if the data is categorical or binary.
show_curve (Optional[bool], optional) – Shows a KDE if True. Defaults to True if the data is continuous or a date.
shade (bool, optional) – Shades the curve if True. Defaults to True.
normalize (bool, optional) – Normalizes the counts so the sum of the bar heights is 1. Defaults to False.
cmap (Optional[Sequence[Tuple[float, float, float]]], optional) – A sequence of RGB tuples used to colour the histograms. If None, seaborn’s default palette will be used. Defaults to None.
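The effect of normalize on a histogram can be sketched in plain Python. This is only an illustration of the idea described above (bar heights summing to 1), with a hypothetical bar_heights helper and a hand-written cmap value in the documented shape; it does not reproduce the library's plotting code.

```python
from collections import Counter


def bar_heights(values, normalize=False):
    """Return per-category bar heights: raw counts, or fractions summing to 1."""
    counts = Counter(values)
    if not normalize:
        return dict(counts)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


outcomes = ["hired", "rejected", "rejected", "hired", "rejected"]
raw = bar_heights(outcomes)
scaled = bar_heights(outcomes, normalize=True)

# A cmap argument in the documented shape: a sequence of RGB tuples of floats.
example_cmap = [(0.2, 0.4, 0.8), (0.9, 0.3, 0.2)]
```

Normalizing makes groups of very different sizes comparable on one axis, at the cost of hiding how many samples each bar represents.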
-