RepBench is a tool for measuring and comparing the performance of algorithms repairing anomalies in datasets. It offers various algorithms and metrics for evaluating the effectiveness of anomaly repair under different contamination conditions. Users can introduce different types of anomalies into datasets and use the RepBench web application to view the data, repair results, and experiment with algorithm parameters.
Prerequisites | Build | Repair | Injection | RepBench Web Application
This benchmark implements four different anomaly repair techniques in time series and evaluates their precision and runtime on various real-world time series datasets using different repair scenarios.
- The benchmark implements the following algorithms: IMR, SCREEN, Robust PCA and CDrep.
- All the datasets used in this benchmark can be found here.
- The full list of repair scenarios can be found here.
- Ubuntu 22 (including Ubuntu derivatives, e.g., Xubuntu).
- Clone this repository.
install python and pip
sudo apt install python3-dev
sudo apt install python3-pipcreate a activate a virtual environment
sudo apt install python3-venv
python3 -m venv venv
source venv/bin/activateinstall the requirements for the Benchmark
pip3 install -r testing_frame_work/testing_framework_requierements.txtAdditionaly, to use the SRC algorithm you need Java to run on your system e.g., openjdk-17-jre.
python3 TestingFramework.py -d dataset -a anomaly_type -scen scenario_type -alg algorithm| dataset | anonaly_type | scenario_type | algorithm |
|---|---|---|---|
| bafu5k | shift | ts_len | rpca |
| humidity | distortion | a_size | screen |
| msd1_5 | outlier | a_rate | imr |
| elec | all | ts_nbr | cdrep |
| all | cts_nbr | kfilter | |
| all | a_factor | screen* | |
| all | all |
- The data has to have a csv format.
- The data argument expects the Data to be in the data folder.
All results and plots will be added to Results folder. The accuracy results of all algorithms will be sequentially
added for each scenario, dataset and anomaly type to: Results/.../.../precision/error/. The runtime results of all
algorithms will be added to: Results/.../.../runtime/. The plots of some anomaylous parts of the time series together
with its repair will be added to the folder Results/.../precision/repair/.
The Parameters of the algorithms can be modified in here
- Run a single algorithm (cdrec) on a single dataset (bafu5k) using one scenario (number of time series) and one anomaly (shift)
python3 TestingFramework.py -d bafu5k -scen ts_nbr -a shift -alg cdrep- Run two algorithms (cdrec, rpca) on two dataset (bafu5k,msd) using one scenario (a_rate) and two anomalies ( shift,outlier)
python3 TestingFramework.py -d bafu5k,msd -scen ts_nbr -a shift,outlier -alg cdrep,rpca- Run the whole benchmark: all the algorithms , all the dataset on all scenarios with all anomalies (takes ~6 hours)
python3 TestingFramework.py -d all -scen all -a all -alg allpython3 inject.py -d dataset -a anomaly_type -f factor/amplitude -r rate -ts time_series [-l lenght ]-d: Required. Dataset to inject anomalies into.-a: Required. Anomaly type to be injected. Choices are shift,distortion,outlier-f: Required. Factor to control the strength of the anomalies.-r: Required. Ratio of data points affected by the anomalies.-ts: Required ts index to be injected starting from 1 , multiple indices can be specified separated by comma e.g -ts1,2,3-l: Optional. The length of the time series. Default value is 30.
The Resulting Injected Data set is stored in injection/Results.
The input file must be in csv format without timestamps and the data folder is data/full.
- Inject a shift anomaly with factor 4 and ratio 0.1 into the first time series of the dataset bafu5k with length 30.
python3 inject.py -d bafu5k -a shift -f 4 -r 0.1 -ts 1 -l 30- Inject a distortion anomaly with factor 2 and ratio 0.2 into the first and second time series of the dataset: elec with length 10.
python3 inject.py -d elec -a distortion -f 2 -r 0.2 -ts 1,2 -l 10- Inject an outlier anomaly with factor 3 and ratio 0.25 into the first, second and third time series of the dataset humidity.
python3 inject.py -d humidity -a outlier -f 3 -r 0.25 -ts 1,2,3 To use the WebApp, you need to install Docker and
docker-compose build
docker-compose up
docker-compose exec web bash python3 manage.py makemigrations
docker-compose exec web bash python3 manage.py migrate
docker-compose exec web bash python3 manage.py shell`` In the shell:
from RepBenchWeb.models.populateDB import *Inject real-world data with anomalies. Select the time series and anomaly type to inject. Multiple injections are possible allowing for different anomaly types in a time series. After contamination, it is possible to switch to the repair view to try out the a repair technique on the affected data, similar to the process used for synthetic data. Directly repair the data set with anomalies and compare the repaired time series together with the original and injected series. We show the conventional metrics for anomaly repair: root mean square error (RMSE) and mean absolute error (MAE).
