menelaus.change_detection

Change detection algorithms monitor sequential univariate metrics. They can be applied either to detect drift in a model’s performance metric or to monitor a single feature from a dataset. The change detection algorithms in this package can detect shifts in both directions, that is, upward or downward changes in a sequence.

menelaus.change_detection.adwin

class menelaus.change_detection.adwin.ADWIN(delta=0.002, max_buckets=5, new_sample_thresh=32, window_size_thresh=10, subwindow_size_thresh=5, conservative_bound=False)[source]

Bases: StreamingDetector

ADWIN (ADaptive WINdowing) is a change detection algorithm which uses a sliding window to estimate the running mean and variance of a real-valued data stream.

As each sample is added, ADWIN stores a running estimate (mean and variance) for a given statistic, calculated over a sliding window which will grow to the right until drift is detected. The condition for drift is defined over pairs of subwindows at certain cutpoints within the current window. If, for any such pair, the difference between the running estimates of the statistic is over a certain threshold (controlled by delta), we identify drift, and remove the oldest elements of the window until all differences are again below the threshold.

The running estimates in each subwindow are maintained by storing summaries of the elements in “buckets,” which, in this implementation, are themselves stored in the bucket_row_list attribute, whose total size scales with the max_buckets parameter.
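The bucket idea can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the function name add_sample, the plain-list layout, and the (sum, count) summaries are assumptions made for illustration, and variance tracking is left out. Row i holds summaries that each cover 2**i samples, and a row that grows past max_buckets merges its two oldest summaries into the next row.

```python
def add_sample(rows, value, max_buckets=5):
    """Insert one value into a hypothetical exponential-histogram layout.

    rows[i] is a list of (total, count) summaries, newest first, where each
    summary in row i covers 2**i samples. This layout is an assumption made
    for illustration, not the structure used by menelaus.
    """
    if not rows:
        rows.append([])
    rows[0].insert(0, (value, 1))
    level = 0
    # Cascade: whenever a row overflows, merge its two oldest summaries
    # into a single summary stored in the next row.
    while len(rows[level]) > max_buckets:
        oldest = rows[level].pop()
        second_oldest = rows[level].pop()
        merged = (oldest[0] + second_oldest[0], oldest[1] + second_oldest[1])
        if level + 1 == len(rows):
            rows.append([])
        rows[level + 1].insert(0, merged)
        level += 1


rows = []
for _ in range(100):
    add_sample(rows, 1.0)

# Total count and sum are preserved even though far fewer than 100 items are stored.
stored_count = sum(count for row in rows for _, count in row)
stored_sum = sum(total for row in rows for total, _ in row)
```

Because each row is capped, the number of stored summaries grows roughly logarithmically with the window length, which is why the total size scales with max_buckets.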

When drift occurs, the index of the element at the beginning of ADWIN’s new window is stored in self.retraining_recs.

Ref. Bifet and Gavalda [2007]
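The cut condition described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code in this package: it keeps the whole window in a plain list instead of buckets, assumes values lie in [0, 1], and uses the Hoeffding-style bound from Bifet and Gavalda [2007]. The function name adwin_cut_sketch and the min_sub parameter are hypothetical.

```python
import math


def adwin_cut_sketch(window, delta=0.002, min_sub=5):
    """Drop the oldest elements while any split of the window signals drift.

    Simplified sketch: values are assumed to lie in [0, 1] and the whole
    window is held in memory. Returns the (possibly shortened) window.
    """
    window = list(window)
    changed = True
    while changed and len(window) >= 2 * min_sub:
        changed = False
        n = len(window)
        total = sum(window)
        head_sum = 0.0
        for cut in range(1, n):
            head_sum += window[cut - 1]
            n0, n1 = cut, n - cut
            if n0 < min_sub or n1 < min_sub:
                continue
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style size term
            delta_prime = delta / n
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                window.pop(0)  # remove the oldest element and re-check
                changed = True
                break
    return window


shifted = [0.0] * 50 + [1.0] * 50
shrunk = adwin_cut_sketch(shifted)
stable = adwin_cut_sketch([0.5] * 40)
```

On the shifted stream the window shrinks until the old values no longer produce a significant difference, while the stable stream is returned unchanged. A smaller delta widens eps_cut and so makes the check less sensitive.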

__init__(delta=0.002, max_buckets=5, new_sample_thresh=32, window_size_thresh=10, subwindow_size_thresh=5, conservative_bound=False)[source]

Parameters

delta (float, optional) – confidence value in the range 0 to 1. ADWIN will incorrectly detect drift with at most probability delta, and correctly detect drift with at least probability 1 - delta. Defaults to 0.002.
max_buckets (int, optional) – the maximum number of buckets to maintain in each BucketRow. Corresponds to the “M” parameter in Bifet 2006. Defaults to 5.
new_sample_thresh (int, optional) – the drift detection procedure will run every new_sample_thresh samples, not in between. Defaults to 32.
window_size_thresh (int, optional) – the minimum number of samples in the window required to check for drift. Defaults to 10.
subwindow_size_thresh (int, optional) – the minimum number of samples in each subwindow required to check it for drift. Defaults to 5.
conservative_bound (bool, optional) – whether to assume a ‘large enough’ sample when constructing drift cutoff. Defaults to False.

Raises

ValueError – If ADWIN.delta is not in the range 0 to 1.

mean()[source]

Returns: the estimated average of the passed stream, using the current window
Return type: float

reset()[source]: Initialize the detector’s drift state and other relevant attributes. Intended for use after drift_state == 'drift'.

update(X, y_true=None, y_pred=None)[source]

Update the detector with a new sample.

Parameters

X – one row of features from input data.
y_true – one true label from input data. Not used by ADWIN.
y_pred – one predicted label from input data. Not used by ADWIN.
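A typical monitoring loop calls update once per sample and then inspects drift_state. The helper below is a sketch that relies only on the update method and drift_state property documented here. The ThresholdDetector class is a hypothetical stand-in so the example runs without the package installed; it is not part of menelaus.

```python
def find_drift_indices(detector, stream):
    """Feed values one at a time and collect zero-based positions flagged as drift."""
    drift_at = []
    for i, value in enumerate(stream):
        detector.update(value)
        if detector.drift_state == "drift":
            drift_at.append(i)
    return drift_at


class ThresholdDetector:
    """Hypothetical stand-in (NOT a menelaus class): flags values above a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.drift_state = None
        self.total_samples = 0

    def update(self, X, y_true=None, y_pred=None):
        # Mirrors the documented signature; y_true and y_pred are ignored.
        self.total_samples += 1
        self.drift_state = "drift" if X > self.limit else None


drift_positions = find_drift_indices(ThresholdDetector(limit=1.0), [0.1, 0.2, 5.0, 0.3])
```

With the package installed, an ADWIN instance could presumably be passed in place of the stand-in, since it exposes the same update method and drift_state property.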

variance()[source]

Returns: the estimated variance of the passed stream, using the current window
Return type: float

property drift_state: Set detector’s drift state to "drift", "warning", or None.

property retraining_recs

Recommended indices for retraining. If drift is detected, set to [beginning of ADWIN's new window, end of ADWIN's new window]. If these are e.g. the 5th and 13th sample that ADWIN has been updated with, the values will be [4, 12].

Returns: the current retraining recommendations
Return type: list
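The index convention can be illustrated with a short sketch. It assumes the two entries are zero-based and both inclusive, as the [4, 12] example suggests, and that the list has already been populated by a drift detection. The helper name select_retraining_window is hypothetical.

```python
def select_retraining_window(samples, retraining_recs):
    """Slice out the samples covered by [start, end], treating both as inclusive."""
    start, end = retraining_recs
    return samples[start:end + 1]


samples = list(range(100, 120))  # stand-in for the values passed to update()
window = select_retraining_window(samples, [4, 12])
```

Here window covers the 5th through 13th samples, nine values in total.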

property samples_since_reset

Number of samples since last drift detection.

Returns: int

property total_samples

Total number of samples the drift detector has been updated with.

Returns: int

menelaus.change_detection.cusum

class menelaus.change_detection.cusum.CUSUM(target=None, sd_hat=None, burn_in=30, delta=0.005, threshold=5, direction=None)[source]

Bases: StreamingDetector

CUSUM is a method from the field of statistical process control. This detector tests for changes in the mean of a time series by accumulating the deviations of recent observations from a target mean (a cumulative sum). CUSUM can be used for tracking a single model performance metric, or could be applied to the mean of a feature variable of interest.

Ref. Page [1954]
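A textbook two-sided CUSUM can be sketched as follows. This is an illustration under stated assumptions rather than this package's implementation: observations are standardized with target and sd_hat, the statistics only start accumulating after the burn-in period, and the function returns the zero-based index of the first alarm. The function name cusum_sketch is hypothetical; the parameter names mirror the constructor documented here.

```python
import statistics


def cusum_sketch(stream, target=None, sd_hat=None, burn_in=30,
                 delta=0.005, threshold=5, direction=None):
    """Return the index of the first alarm, or None if no alarm is raised."""
    stream = list(stream)
    # Assumption: unknown target/sd are estimated from the burn-in samples.
    if target is None:
        target = statistics.mean(stream[:burn_in])
    if sd_hat is None:
        sd_hat = statistics.stdev(stream[:burn_in])
    upper = 0.0  # accumulates evidence of an upward shift
    lower = 0.0  # accumulates evidence of a downward shift
    for i, x in enumerate(stream):
        if i < burn_in:
            continue  # no alarms during burn-in
        z = (x - target) / sd_hat
        upper = max(0.0, upper + z - delta)
        lower = max(0.0, lower - z - delta)
        if direction in (None, "positive") and upper > threshold:
            return i
        if direction in (None, "negative") and lower > threshold:
            return i
    return None


stream = [0.0] * 40 + [1.0] * 20
first_alarm = cusum_sketch(stream, target=0.0, sd_hat=1.0)
no_alarm = cusum_sketch(stream, target=0.0, sd_hat=1.0, direction="negative")
```

The delta term acts as slack: deviations smaller than delta (in standardized units) never accumulate, so small fluctuations around the target do not trigger an alarm.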

__init__(target=None, sd_hat=None, burn_in=30, delta=0.005, threshold=5, direction=None)[source]

Parameters

target (float, optional) – Known mean of stream (e.g. validation accuracy). If None, will be inferred from observations in the burn-in window. Defaults to None.
sd_hat (float, optional) – Known standard deviation of stream (e.g. SD of validation accuracy). If None, will be inferred from observations in the burn-in window. Defaults to None.
burn_in (int, optional) – Length of the burn-in period, during which time no alarms will sound. Also determines how many prior samples are used to calculate new estimates for mean and SD after drift occurs. Defaults to 30.
delta (float, optional) – The amount of “slack” in the CUSUM test statistic. Defaults to 0.005.
threshold (int, optional) – The threshold against which the CUSUM test statistic is evaluated. Defaults to 5.
direction (str, optional) –
- If 'positive', drift is only considered when the stream drifts in the positive direction.
- If 'negative', drift is only considered when the stream drifts in the negative direction.
- If None, alarms to drift in either the positive or negative direction. Defaults to None.

reset()[source]: Initialize the detector’s drift state and other relevant attributes. Intended for use after drift_state == 'drift'.

update(X, y_true=None, y_pred=None)[source]

Update the detector with a new sample.

Parameters

X (numpy.ndarray) – one row of features from input data.
y_true (numpy.ndarray) – one true label from input data. Not used in CUSUM.
y_pred (numpy.ndarray) – one predicted label from input data. Not used in CUSUM.

property drift_state: Set detector’s drift state to "drift", "warning", or None.

input_type = 'stream'

property samples_since_reset

Number of samples since last drift detection.

Returns: int

property total_samples

Total number of samples the drift detector has been updated with.

Returns: int

menelaus.change_detection.page_hinkley

class menelaus.change_detection.page_hinkley.PageHinkley(delta=0.01, threshold=20, burn_in=30, direction='positive')[source]

Bases: StreamingDetector

Page-Hinkley is a univariate change detection algorithm, designed to detect changes in a sequential Gaussian signal. Both the running mean and the running Page-Hinkley (PH) statistic are incremented with each observation. The PH statistic monitors how far the current observation is from the running mean of all previously encountered observations, while weighting it by a sensitivity parameter delta. The detector alarms when the difference between the cumulative PH statistic and the maximum or minimum PH statistic encountered is larger than a certain threshold (xi).

1. Increment mean with next observations
2. Increment running sum of difference between observations and mean
3. Compute threshold & PH statistic
4. Enter drift or warning state if PH value is outside threshold, and the number of samples is greater than the burn-in requirement.

If the threshold is too small, PH may result in many false alarms. If too large, the PH test will be more robust, but may miss true drift.
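The steps above can be sketched with the textbook form of the test. This is a minimal illustration, not the code in this package: it uses a fixed threshold, has no warning state, and returns the zero-based index of the first alarm. The function name page_hinkley_sketch is hypothetical; the parameter names mirror the constructor documented here.

```python
def page_hinkley_sketch(stream, delta=0.01, threshold=20, burn_in=30,
                        direction="positive"):
    """Return the index of the first alarm, or None if no alarm is raised."""
    mean = 0.0        # running mean of all observations so far
    cumulative = 0.0  # cumulative PH statistic
    extreme = 0.0     # running minimum (positive) or maximum (negative)
    for n, x in enumerate(stream, start=1):
        mean += (x - mean) / n
        if direction == "positive":
            cumulative += x - mean - delta
            extreme = min(extreme, cumulative)
            gap = cumulative - extreme
        else:
            cumulative += x - mean + delta
            extreme = max(extreme, cumulative)
            gap = extreme - cumulative
        if n > burn_in and gap > threshold:
            return n - 1
    return None


stream = [0.0] * 50 + [10.0] * 50
upward_alarm = page_hinkley_sketch(stream)
downward_alarm = page_hinkley_sketch(stream, direction="negative")
```

For this stream the upward test alarms a few samples after the jump at index 50, while the downward test stays silent. Raising threshold delays the alarm, which matches the trade-off described above.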

Ref. Hinkley [1971]

__init__(delta=0.01, threshold=20, burn_in=30, direction='positive')[source]

Parameters

delta (float, optional) – Minimum amplitude of change in data needed to sound alarm. Defaults to 0.01.
threshold (int, optional) – Threshold for sounding alarm. Corresponds with PH lambda. In PCA-CD, Qahtan (2015) recommends setting it to 1% of an appropriate window size for the dataset. Defaults to 20.
burn_in (int, optional) – Minimum number of data points required to be seen before declaring drift. Defaults to 30.
direction (str, optional) –
- If 'positive', drift is only detected for an upward change in mean, when the cumulative PH statistic differs from the minimum PH statistic significantly.
- If 'negative', drift is only detected for a downward change in mean, when the max PH statistic differs from the cumulative PH statistic significantly.
Defaults to 'positive'.

reset()[source]: Initialize the detector’s drift state and other relevant attributes. Intended for use after drift_state == 'drift'.

to_dataframe()[source]: Returns a dataframe storing current statistics

update(X, y_true=None, y_pred=None)[source]

Update the detector with a new sample.

Parameters

X – one row of features from input data.
y_true – one true label from input data. Not used by Page-Hinkley.
y_pred – one predicted label from input data. Not used by Page-Hinkley.

property drift_state: Set detector’s drift state to "drift", "warning", or None.

input_type = 'stream'

property samples_since_reset

Number of samples since last drift detection.

Returns: int

property total_samples

Total number of samples the drift detector has been updated with.

Returns: int