Critical Values Analysis

mlbugdetection.critical_values.find_critical_values(model, sample, feature: str, start: int, stop: int, step: float = 1, keep_n: int = 3)
Critical Values Finder

Finds the largest changes (positive or negative) in predict_proba over a specified interval [start, stop].

Parameters
  • model (sklearn model or str) – A trained and tested scikit-learn model, given either as a model object or as a path to a model file.

  • sample (pandas DataFrame) – A single row of the dataframe that will be used for the analysis.

  • feature (str) – Name of the dataframe feature (column) to analyse.

  • start (int) – The starting value of the feature’s interval.

  • stop (int) – The end value of the feature’s interval.

  • step (float, default=1) – Step size between consecutive values in the interval from start to stop. For example, step = 0.1 with start = 0 and stop = 1 results in [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]; stop itself is not included.

  • keep_n (int, default=3) – Number of values to keep in each list.
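The step example above (start = 0, stop = 1, step = 0.1 giving ten values, 0 through 0.9) matches how numpy.arange builds a half-open grid. The snippet below reproduces that grid; it assumes, without confirmation from this page, that the evaluated feature values are generated in the same way.

```python
import numpy as np

# Assumption: the evaluation grid behaves like numpy.arange, where `stop`
# is excluded. This matches the documented example for start=0, stop=1.
start, stop, step = 0, 1, 0.1
grid = np.arange(start, stop, step)

# 10 values from 0.0 to 0.9; 1.0 is not part of the grid.
print(np.round(grid, 1))
```

With a fractional step, floating-point rounding can make the exact end point of an arange-style grid sensitive to the chosen values, so integer-friendly steps are the safer choice when the boundary matters.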

Returns

  • AnalysisReport object with the following attributes. For more information:

      >>> from mlbugdetection.analysis_report import AnalysisReport
      >>> help(AnalysisReport)

  • model_name (str) – Name of the model being analysed.

  • analysed_feature (str) – Name of the feature being analysed.

  • feature_range (tuple) – Range of values of the feature being analysed: (start, stop).

  • metrics (dictionary) – Dictionary with all the calculated metrics, such as:

    'positive_changes_ranges' : List

    List of feature ranges that resulted in the largest positive changes in the model's prediction probability.

    'positive_changes_proba' : List

    List of the largest positive variations in the model's prediction probability.

    'negative_changes_ranges' : List

    List of feature ranges that resulted in the largest negative changes in the model's prediction probability.

    'negative_changes_proba' : List

    List of the largest negative variations in the model's prediction probability.

    'classification_change_ranges' : List

    List of feature ranges that resulted in a change of the model's classification.

    'classification_change_proba' : List

    List of prediction probability values before and after the classification change.

  • graphs (List) – List of all the figures created.
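To make the description of find_critical_values concrete, here is a minimal sketch of the underlying idea: vary one feature of a single sample over the interval, record the class-1 probability at each value, and rank the changes between consecutive values. This is not the library's implementation. The function sweep_critical_values and the ToyModel class are hypothetical; the sketch assumes the positive class is column 1 of predict_proba, that changes are measured between neighbouring grid values, and that a classification change means crossing a 0.5 threshold. It also takes a column index and a NumPy row instead of a feature name and a pandas row, to stay dependency-light.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a fitted scikit-learn classifier.

    Like sklearn classifiers, it exposes predict_proba(X) returning an
    array of shape (n_rows, 2). P(class 1) is a bump centred at x0 = 5.
    """

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        p1 = np.exp(-((X[:, 0] - 5.0) ** 2) / 8.0)
        return np.column_stack([1.0 - p1, p1])


def sweep_critical_values(model, sample, feature_idx, start, stop, step=1, keep_n=3):
    """Hypothetical sketch of the idea behind find_critical_values."""
    values = np.arange(start, stop, step)
    # One copy of the sample per grid value, with only the chosen feature varied.
    rows = np.tile(np.asarray(sample, dtype=float), (len(values), 1))
    rows[:, feature_idx] = values
    proba = model.predict_proba(rows)[:, 1]  # assumed positive-class column

    deltas = np.diff(proba)  # change between neighbouring grid values
    ranges = list(zip(values[:-1].tolist(), values[1:].tolist()))

    order = np.argsort(deltas)  # ascending
    pos = [i for i in order[::-1][:keep_n] if deltas[i] > 0]
    neg = [i for i in order[:keep_n] if deltas[i] < 0]

    labels = (proba >= 0.5).astype(int)  # assumed 0.5 decision threshold
    flips = np.nonzero(np.diff(labels))[0]

    return {
        "positive_changes_ranges": [ranges[i] for i in pos],
        "positive_changes_proba": [float(deltas[i]) for i in pos],
        "negative_changes_ranges": [ranges[i] for i in neg],
        "negative_changes_proba": [float(deltas[i]) for i in neg],
        "classification_change_ranges": [ranges[i] for i in flips],
        "classification_change_proba": [
            (float(proba[i]), float(proba[i + 1])) for i in flips
        ],
    }


metrics = sweep_critical_values(ToyModel(), [0.0, 1.0], feature_idx=0, start=0, stop=10)
print(metrics["classification_change_ranges"])  # [(2, 3), (7, 8)]
```

The dictionary keys mirror the metrics listed above, which makes it easier to reason about what each entry holds; the actual AnalysisReport may compute or order them differently.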

mlbugdetection.critical_values.find_several_critical_values(model, samples, feature: str, start: int, stop: int, step: float = 1, bins: int = 15, keep_n: int = 5, log: bool = False)
Critical Values Finder in Several Samples

Computes the mean, median, standard deviation and variance of the critical values found in the samples over a specified interval [start, stop].

Parameters
  • model (sklearn model or str) – A trained and tested scikit-learn model, given either as a model object or as a path to a model file.

  • samples (pandas DataFrame) – Two or more rows of the dataframe that will be used for the analysis.

  • feature (str) – Name of the dataframe feature (column) to analyse.

  • start (int) – The starting value of the feature’s interval.

  • stop (int) – The end value of the feature’s interval.

  • step (float, default=1) – Step size between consecutive values in the interval from start to stop. For example, step = 0.1 with start = 0 and stop = 1 results in [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]; stop itself is not included.

  • bins (int, default=15) – Number of equal-width bins used in the histogram.

  • keep_n (int, default=5) – Number of highest values used to compute the mean, median, standard deviation and variance.

  • log (bool, default=False) – If True, the histogram axis will be set to a log scale.
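The bins and log parameters read like those of matplotlib's hist, where an integer bins splits the data range into that many equal-width bins and log=True puts the count axis on a log scale; whether this function forwards them to matplotlib is an assumption. The binning itself can be checked with numpy.histogram, shown here on hypothetical critical values.

```python
import numpy as np

# Hypothetical per-sample critical values (illustrative input only).
critical_values = np.array([2.0, 2.5, 3.0, 3.0, 7.5, 8.0, 9.0])

# An integer `bins` splits [min, max] into that many equal-width bins.
counts, edges = np.histogram(critical_values, bins=15)
widths = np.diff(edges)

print(len(counts), len(edges))  # 15 bins need 16 edges
```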

Returns

  • AnalysisReport object with the following attributes. For more information:

      >>> from mlbugdetection.analysis_report import AnalysisReport
      >>> help(AnalysisReport)

  • model_name (str) – Name of the model being analysed.

  • analysed_feature (str) – Name of the feature being analysed.

  • feature_range (tuple) – Range of values of the feature being analysed: (start, stop).

  • metrics (dictionary) – Dictionary with all the calculated metrics, such as:

    'positive_means' : dictionary

    Contains the following:

    'mean' : float

    Mean of all the positive-change means.

    'median' : float

    Median of all the positive-change means.

    'std' : float

    Standard deviation of all the positive-change means.

    'var' : float

    Variance of all the positive-change means.

    'negative_means' : dictionary

    Contains the following:

    'mean' : float

    Mean of all the negative-change means.

    'median' : float

    Median of all the negative-change means.

    'std' : float

    Standard deviation of all the negative-change means.

    'var' : float

    Variance of all the negative-change means.

  • graphs (List) – List of all the figures created.
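The "mean of all the positive-change means" wording suggests a two-level aggregation: each sample contributes the mean of its keep_n largest changes, and the statistics are then taken across samples. The sketch below follows that reading; summarize_changes is a hypothetical helper, it ranks changes by magnitude so the same code serves both positive and negative lists, and it uses NumPy's default population statistics (ddof=0), which may differ from the library.

```python
import numpy as np


def summarize_changes(per_sample_changes, keep_n=5):
    """Hypothetical sketch: aggregate the top-keep_n changes of many samples.

    per_sample_changes: list of 1-D sequences, one per sample, holding the
    probability changes found for that sample (all positive or all negative).
    """
    sample_means = []
    for changes in per_sample_changes:
        changes = np.asarray(changes, dtype=float)
        # keep_n changes with the largest magnitude for this sample
        top = changes[np.argsort(np.abs(changes))[::-1][:keep_n]]
        sample_means.append(top.mean())
    sample_means = np.array(sample_means)
    return {
        "mean": float(np.mean(sample_means)),
        "median": float(np.median(sample_means)),
        "std": float(np.std(sample_means)),  # population std (ddof=0)
        "var": float(np.var(sample_means)),  # population variance (ddof=0)
    }


# Illustrative positive changes for three samples.
positive_changes = [[0.30, 0.10, 0.05], [0.20, 0.20, 0.01], [0.40, 0.05, 0.02]]
stats = summarize_changes(positive_changes, keep_n=2)
print(stats["median"])
```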

mlbugdetection.critical_values.highest_and_lowest_indexes(predictions: list, keep_n: int = 3)
Return indexes of the highest changes (positive or negative) in predictions.

Parameters
  • predictions (list) – Array that contains predictions to be analysed

  • keep_n (int) – Number of values to keep in each list.
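As a rough illustration of what "indexes of the highest changes" can mean, the hypothetical function below takes consecutive differences of the predictions and returns the positions of the keep_n largest increases and the keep_n largest decreases. The return shape (two lists of indexes into the difference array) is an assumption; this page does not document the real return value.

```python
import numpy as np


def highest_and_lowest_indexes_sketch(predictions, keep_n=3):
    """Hypothetical re-implementation, not the library's code."""
    # Change between each prediction and the next one.
    deltas = np.diff(np.asarray(predictions, dtype=float))
    order = np.argsort(deltas)  # ascending
    highest = [int(i) for i in order[::-1][:keep_n] if deltas[i] > 0]
    lowest = [int(i) for i in order[:keep_n] if deltas[i] < 0]
    return highest, lowest


preds = [0.10, 0.12, 0.60, 0.55, 0.20, 0.30]
print(highest_and_lowest_indexes_sketch(preds, keep_n=2))  # ([1, 4], [3, 2])
```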