sensitive_group_analysis(df, target_attr, groups, categorical_mode='multinomial')

Produces a summary of the first two moments (mean and variance) of the distributions obtained from the target attribute by applying the predicates generated from a list of groups of interest. This lets the user quickly scan how the target varies, and how its expected value differs, across possibly protected attributes. Supports binary, date-like, numerical, and categorical data in the target column.
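As an illustration of the idea for a numerical target, the following is a minimal sketch that uses pandas only. It is not the library's implementation: the toy data, the column names, and the choice of population variance (`ddof=0`) are assumptions made for the example.

```python
import pandas as pd

# Toy data; the column names and values are made up for illustration.
df = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Score": [2.0, 4.0, 6.0, 4.0, 8.0, 6.0],
})

# Each predicate is a boolean series that selects one group of interest.
predicates = {
    "Male": df["Sex"] == "Male",
    "Female": df["Sex"] == "Female",
}

# Mean and variance of the target within each group.
# ddof=0 (population variance) is an assumption; the library may use another convention.
summary = pd.DataFrame({
    name: {
        "mean": df.loc[mask, "Score"].mean(),
        "variance": df.loc[mask, "Score"].var(ddof=0),
    }
    for name, mask in predicates.items()
}).T

print(summary)
```

The result has one row per group and one column per statistic, which makes differences in the expected value between groups easy to compare.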

  • df (pd.DataFrame) – The input dataframe.

  • target_attr (str) – The target attribute in the dataframe from which the distributions are formed.

  • groups (List[Union[Mapping[str, List[Any]], pd.Series]]) – The list of groups of interest. Each group can be a mapping (dict) from attribute name to a list of values, or a predicate itself, i.e. a pandas series of bools that can be used to index a subgroup of the dataframe. Examples of valid groups: {"Sex": ["Male"]}, df["Sex"] == "Female"

  • categorical_mode (str) – Selects the method used to compute the first moment for categorical (and, implicitly, binary) series. Can be "square" or "entropy", both of which use the mode, or "multinomial", which returns the probability of each value occurring. Defaults to "multinomial".
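The two accepted forms of a group, and the per-value probabilities described for the "multinomial" mode, can be sketched with pandas alone. This is a hedged illustration rather than the library's code: `group_to_predicate` is a hypothetical helper, and the toy data and column names are assumptions.

```python
import pandas as pd


def group_to_predicate(df, group):
    """Hypothetical helper: turn a dict-style group or a boolean series into a row mask."""
    if isinstance(group, pd.Series):
        # Already a predicate; just make sure it is boolean.
        return group.astype(bool)
    mask = pd.Series(True, index=df.index)
    for attr, values in group.items():
        # A row belongs to the group if its attribute takes one of the listed values.
        mask = mask & df[attr].isin(values)
    return mask


# Toy data; the column names and values are made up for illustration.
df = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male", "Female"],
    "Outcome": ["yes", "no", "yes", "yes", "yes"],
})

# One group given as a mapping, one given directly as a predicate.
groups = [{"Sex": ["Male"]}, df["Sex"] == "Female"]

# Relative frequency of each target value within each group.
results = []
for group in groups:
    mask = group_to_predicate(df, group)
    probs = df.loc[mask, "Outcome"].value_counts(normalize=True)
    results.append(probs.to_dict())

print(results)
```

Converting every group to a boolean mask first means the rest of the computation does not need to care which form the caller used.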


A dataframe reporting the means and variances across the groups of interest, with the computation adapted to the type of the underlying data in the target column.

Return type