fairlens.plot.mult_distr_plot

mult_distr_plot(df, target_attr, attrs, figsize=None, max_width=3, distr_type=None, attr_distr_types=None, max_quantiles=8, show_hist=None, show_curve=None, shade=True, normalize=False, cmap=None)

For each column in attrs, plot the distribution of the target attribute for each of that column's unique values.

Parameters
  • df (pd.DataFrame) – The input dataframe.

  • target_attr (str) – The target attribute.

  • attrs (Sequence[str]) – The attributes whose value distributions are to be plotted.

  • figsize (Optional[Tuple[int, int]], optional) – The size of each figure. If None, defaults to (6, 4).

  • max_width (int, optional) – The maximum number of figures per row. Defaults to 3.

  • distr_type (Optional[str], optional) – The type of distribution of the target attribute. Can take values from [“categorical”, “continuous”, “binary”, “datetime”]. If None, the type of distribution is inferred based on the data in the column. Defaults to None.

  • attr_distr_types (Optional[Mapping[str, str]], optional) – The distribution types of the attributes in attrs, passed as a mapping from attribute name to distribution type. Each type can take values from [“categorical”, “continuous”, “binary”, “datetime”]. If None, the distribution types of all attributes are inferred from the data in the respective columns. Defaults to None.

  • max_quantiles (int, optional) – The maximum number of quantiles to use for continuous data. Defaults to 8.

  • show_hist (Optional[bool], optional) – Shows the histogram if True. Defaults to True if the data is categorical or binary.

  • show_curve (Optional[bool], optional) – Shows a KDE if True. Defaults to True if the data is continuous or a date.

  • shade (bool, optional) – Shades the curve if True. Defaults to True.

  • normalize (bool, optional) – Normalizes the counts so the sum of the bar heights is 1. Defaults to False.

  • cmap (Optional[Sequence[Tuple[float, float, float]]], optional) – A sequence of RGB tuples used to colour the histograms. If None, seaborn’s default palette will be used. Defaults to None.
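As a rough sketch of what the attr_distr_types and cmap arguments might look like, the snippet below builds both using only the standard library. The helpers validate_distr_types and make_palette are hypothetical and not part of fairlens, the type assigned to each column is an assumption about the COMPAS data, and the RGB components are assumed to be floats in [0, 1], as in seaborn palettes.

```python
import colorsys

# Distribution types listed in the parameter descriptions above.
ALLOWED_TYPES = {"categorical", "continuous", "binary", "datetime"}


def validate_distr_types(mapping):
    """Hypothetical helper: reject type names the docs do not list."""
    bad = {k: v for k, v in mapping.items() if v not in ALLOWED_TYPES}
    if bad:
        raise ValueError(f"unsupported distribution types: {bad}")
    return dict(mapping)


def make_palette(n):
    """Hypothetical helper: n evenly spaced hues as RGB tuples in [0, 1]."""
    return [colorsys.hls_to_rgb(i / n, 0.5, 0.6) for i in range(n)]


# Column names come from the COMPAS example; the type choices are assumptions.
attr_distr_types = validate_distr_types(
    {"Sex": "binary", "Ethnicity": "categorical", "DateOfBirth": "datetime"}
)
cmap = make_palette(6)

# These could then be passed along, for example:
# mult_distr_plot(df, "RawScore", list(attr_distr_types),
#                 attr_distr_types=attr_distr_types, cmap=cmap, normalize=True)
```

Attributes left out of the mapping are not covered by this sketch; per the description above, passing None lets the types be inferred from the data.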

Examples

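>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> from fairlens.plot import mult_distr_plot
>>> df = pd.read_csv("datasets/compas.csv")
>>> mult_distr_plot(df, "RawScore", ["Ethnicity", "Sex", "MaritalStatus", "Language", "DateOfBirth"])
>>> plt.show()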
[Figure: mult_distr_plot.png, the plot produced by the example above]
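The example above passes five attributes and leaves max_width at its default of 3. A small sketch of how that could shape the layout follows, assuming the figures are arranged in a grid with at most max_width per row. The helper grid_shape is hypothetical and not part of fairlens.

```python
import math


def grid_shape(n_attrs, max_width=3):
    """Hypothetical helper: (rows, cols) for n_attrs figures in a grid."""
    if n_attrs < 1 or max_width < 1:
        raise ValueError("n_attrs and max_width must be positive")
    cols = min(n_attrs, max_width)          # never wider than max_width
    rows = math.ceil(n_attrs / max_width)   # extra rows hold the overflow
    return rows, cols


attrs = ["Ethnicity", "Sex", "MaritalStatus", "Language", "DateOfBirth"]
rows, cols = grid_shape(len(attrs))
print(rows, cols)  # 2 3
```

Under that assumption, the five attributes would fill a 2 by 3 grid with one cell left empty.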