Critical Values Analysis

mlbugdetection.critical_values.find_critical_values(model, sample, feature: str, start: int, stop: int, step: float = 1, keep_n: int = 3)

Critical Values Finder: Finds the largest changes (positive or negative) in predict_proba over a specified interval [start, stop].

Parameters

model (sklearn model or str) – A scikit-learn model that has already been trained and tested. Can be a model object or a path to a model file.
sample (pandas DataFrame) – A single row of the dataframe that will be used for the analysis.
feature (str) – Feature of dataframe that will be analysed.
start (int) – The starting value of the feature’s interval.
stop (int) – The end value of the feature’s interval.
step (float, default=1) – Step size between consecutive feature values from “start” to “stop”. Ex: step = 0.1 between 0 and 1 will result in [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9].
keep_n (int, default=3) – Number of values to keep in each list.
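
The step example above can be reproduced with a small helper. This is a minimal sketch, not the library's code: the name feature_grid is hypothetical, and it assumes the grid is half-open (stop excluded), as the listed values suggest.

```python
import math

def feature_grid(start, stop, step=1.0):
    """Hypothetical helper: start, start+step, ... strictly below stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Number of grid points in the half-open interval [start, stop).
    n = max(0, math.ceil((stop - start) / step))
    # Multiply instead of accumulating to limit floating-point drift.
    return [start + i * step for i in range(n)]

grid = feature_grid(0, 1, 0.1)
# Round only for display; raw floats carry small representation errors.
print([round(v, 10) for v in grid])
```

Whether the library itself includes the stop value is not stated here, so the boundary is worth checking on a small range before relying on it.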

Returns

AnalysisReport object with the following attributes. For more information:
>>> from mlbugdetection.analysis_report import AnalysisReport
>>> help(AnalysisReport)
model_name (str) – Name of the model being analysed.
analysed_feature (str) – Name of the feature being analysed.
feature_range (tuple) – Range of values of the feature being analysed: (start, stop).
metrics (dictionary) – Dictionary with all the calculated metrics, such as:

'positive_changes_ranges' (List)
List of feature ranges that resulted in the biggest positive changes in the model's prediction probability.

'positive_changes_proba' (List)
List of the biggest positive variations in the model's prediction probability.

'negative_changes_ranges' (List)
List of feature ranges that resulted in the biggest negative changes in the model's prediction probability.

'negative_changes_proba' (List)
List of the biggest negative variations in the model's prediction probability.

'classification_change_ranges' (List)
List of feature ranges that resulted in a change of the model's classification.

'classification_change_proba' (List)
List of prediction probability values before and after the classification change.
graphs (List) – List of all the figures created.
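
The idea behind these metrics can be illustrated without the library. The sketch below is an assumption-laden stand-in, not the implementation of find_critical_values: toy_predict_proba is a hypothetical logistic model, sweep_changes is a hypothetical helper, the grid is integer-only with stop excluded, and a classification change is taken to mean crossing a 0.5 probability threshold. Only the metric key names come from the documentation above.

```python
import math

def toy_predict_proba(row):
    """Hypothetical stand-in for a model: returns P(class 1) for one row."""
    z = 0.8 * row["age"] - 0.5 * row["income"] - 25.6
    return 1.0 / (1.0 + math.exp(-z))

def sweep_changes(predict, sample, feature, start, stop, step=1, keep_n=3):
    """Vary one feature of a single sample and rank the probability changes."""
    values = list(range(start, stop, step))  # integer grid, stop excluded
    probs = [predict({**sample, feature: v}) for v in values]

    changes, flip_ranges, flip_proba = [], [], []
    for i in range(len(values) - 1):
        rng = (values[i], values[i + 1])
        changes.append((rng, probs[i + 1] - probs[i]))
        # Assumed rule: the class changes when the 0.5 threshold is crossed.
        if (probs[i] >= 0.5) != (probs[i + 1] >= 0.5):
            flip_ranges.append(rng)
            flip_proba.append((probs[i], probs[i + 1]))

    # Keep the keep_n largest increases and the keep_n largest decreases.
    positive = sorted((c for c in changes if c[1] > 0),
                      key=lambda c: c[1], reverse=True)[:keep_n]
    negative = sorted((c for c in changes if c[1] < 0),
                      key=lambda c: c[1])[:keep_n]
    return {
        "positive_changes_ranges": [r for r, _ in positive],
        "positive_changes_proba": [d for _, d in positive],
        "negative_changes_ranges": [r for r, _ in negative],
        "negative_changes_proba": [d for _, d in negative],
        "classification_change_ranges": flip_ranges,
        "classification_change_proba": flip_proba,
    }

sample = {"age": 30, "income": 4.0}
metrics = sweep_changes(toy_predict_proba, sample, "age", start=20, stop=50)
print(metrics["positive_changes_ranges"][0])
print(metrics["classification_change_ranges"])
```

With this toy model the probability rises monotonically with age, so the negative lists come back empty and the only classification change sits in the range where the probability crosses 0.5.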

mlbugdetection.critical_values.find_several_critical_values(model, samples, feature: str, start: int, stop: int, step: float = 1, bins: int = 15, keep_n: int = 5, log: bool = False)

Critical Values Finder in Several Samples: Finds the mean, median, standard deviation, and variance of the critical values found in the samples over a specified interval [start, stop].

Parameters

model (sklearn model or str) – A scikit-learn model that has already been trained and tested. Can be a model object or a path to a model file.
samples (pandas DataFrame) – Two or more rows of the dataframe that will be used for the analysis.
feature (str) – Feature of dataframe that will be analysed.
start (int) – The starting value of the feature’s interval.
stop (int) – The end value of the feature’s interval.
step (float, default=1) – Step size between consecutive feature values from “start” to “stop”. Ex: step = 0.1 between 0 and 1 will result in [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9].
bins (int, default=15) – Number of equal-width bins in the histogram range.
keep_n (int, default=5) – Number of the highest values to use for mean, median, std, var calculation.
log (bool, default=False) – If True, the histogram axis will be set to a log scale.

Returns

AnalysisReport object with the following attributes. For more information:
>>> from mlbugdetection.analysis_report import AnalysisReport
>>> help(AnalysisReport)
model_name (str) – Name of the model being analysed.
analysed_feature (str) – Name of the feature being analysed.
feature_range (tuple) – Range of values of the feature being analysed: (start, stop).
metrics (dictionary) – Dictionary with all the calculated metrics, such as:

'positive_means' (dictionary)
Contains the following:

'mean' (float)
Mean of all the positive-change means.

'median' (float)
Median of all the positive-change means.

'std' (float)
Standard deviation of all the positive-change means.

'var' (float)
Variance of all the positive-change means.

'negative_means' (dictionary)
Contains the following:

'mean' (float)
Mean of all the negative-change means.

'median' (float)
Median of all the negative-change means.

'std' (float)
Standard deviation of all the negative-change means.

'var' (float)
Variance of all the negative-change means.
graphs (List) – List of all the figures created.
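
One plausible reading of the 'positive_means' dictionary is a two-stage aggregation: average the keep_n largest positive changes within each sample, then summarise those per-sample means across samples. The sketch below follows that reading. The helper names top_mean and summarize and the input numbers are hypothetical, and the use of population (rather than sample) standard deviation and variance is an assumption.

```python
import statistics

def top_mean(changes, keep_n=5):
    """Mean of the keep_n largest values in one sample's list of changes."""
    top = sorted(changes, reverse=True)[:keep_n]
    return statistics.mean(top)

def summarize(per_sample_means):
    """Summary statistics across samples (population std/var assumed)."""
    return {
        "mean": statistics.mean(per_sample_means),
        "median": statistics.median(per_sample_means),
        "std": statistics.pstdev(per_sample_means),
        "var": statistics.pvariance(per_sample_means),
    }

# Made-up positive probability changes for three samples.
per_sample_changes = [
    [0.30, 0.10, 0.05],
    [0.20, 0.20, 0.02],
    [0.40, 0.10, 0.01],
]
positive_means = summarize([top_mean(c, keep_n=2) for c in per_sample_changes])
print(positive_means)
```

The same two functions would apply to negative changes by sorting in ascending order before taking the top keep_n values.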

mlbugdetection.critical_values.highest_and_lowest_indexes(predictions: list, keep_n: int = 3)

Returns the indexes of the highest changes (positive or negative) in predictions.

Parameters

predictions (list) – List that contains the predictions to be analysed.
keep_n (int, default=3) – Number of values to keep in each list.
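
The return format of this function is not documented here, so the following is only a sketch of the general technique under stated assumptions: a "change" is taken to be the difference between consecutive predictions, index i refers to the step from predictions[i] to predictions[i + 1], and the result is a pair of index lists. The name top_change_indexes is hypothetical and is not the library function.

```python
def top_change_indexes(predictions, keep_n=3):
    """Indexes of the largest increases and largest decreases between neighbours."""
    # diffs[i] is the change from predictions[i] to predictions[i + 1].
    diffs = [b - a for a, b in zip(predictions, predictions[1:])]
    order = sorted(range(len(diffs)), key=lambda i: diffs[i])
    # Most negative changes first.
    lowest = [i for i in order if diffs[i] < 0][:keep_n]
    # Most positive changes first.
    highest = [i for i in reversed(order) if diffs[i] > 0][:keep_n]
    return highest, lowest

predictions = [0.10, 0.20, 0.60, 0.55, 0.20, 0.50]
highest, lowest = top_change_indexes(predictions, keep_n=2)
print(highest, lowest)
```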