
NandhuRajRK/TemporalShiftLab


TemporalShiftLab

TemporalShiftLab is a script-first benchmark for time-series anomaly detection under distribution shift. It focuses on how detectors fail under stress (noise, missingness, drift), not just on clean-data accuracy.
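To make the three stress types concrete, here is a minimal sketch of what such transforms could look like on a plain list of floats. The function names, the parameters (sigma, rate, slope) and the forward-fill imputation are hypothetical illustrations, not the transforms implemented in core/.

```python
# Hypothetical stress-transform sketch; not the repo's core/ implementation.
import math
import random
from typing import List, Optional


def add_noise(values: List[float], sigma: float, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def add_missingness(values: List[float], rate: float, seed: int = 0) -> List[float]:
    """Replace roughly a `rate` fraction of points with NaN (dropped samples)."""
    rng = random.Random(seed)
    return [math.nan if rng.random() < rate else v for v in values]


def add_drift(values: List[float], slope: float) -> List[float]:
    """Add a linear trend so the level shifts by `slope` per time step."""
    return [v + slope * t for t, v in enumerate(values)]


def forward_fill(values: List[float], default: float = 0.0) -> List[float]:
    """Fill NaN gaps with the last observed value (one possible imputation)."""
    filled: List[float] = []
    last: Optional[float] = None
    for v in values:
        if math.isnan(v):
            filled.append(last if last is not None else default)
        else:
            last = v
            filled.append(v)
    return filled
```

Seeding each transform separately keeps a stressed series reproducible for a given seed, which matters when results are later aggregated across seeds.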

What it measures

  • point-level precision/recall/F1
  • event-level precision/recall/F1
  • mean detection latency across anomaly windows
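The sketch below shows one common way to compute these three metric families from 0/1 label and prediction lists. It assumes an event is a contiguous run of 1s, that an event counts as matched when it overlaps any event on the other side, and that latency is counted in time steps from a window's start to its first flagged point (undetected windows are excluded). The repo's metrics code may use different matching rules.

```python
# Hypothetical metrics sketch; matching and latency conventions are assumptions.
from typing import List, Optional, Sequence, Tuple

Window = Tuple[int, int]  # inclusive (start, end) indices


def f1_from(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def point_metrics(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float, float]:
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if p and not y)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def to_windows(flags: Sequence[int]) -> List[Window]:
    """Group contiguous runs of 1s into inclusive (start, end) windows."""
    windows: List[Window] = []
    start: Optional[int] = None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            windows.append((start, i - 1))
            start = None
    if start is not None:
        windows.append((start, len(flags) - 1))
    return windows


def overlaps(a: Window, b: Window) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def event_metrics(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float, float]:
    true_events, pred_events = to_windows(labels), to_windows(preds)
    # Recall: share of true events hit by at least one predicted event.
    detected = sum(1 for t in true_events if any(overlaps(t, p) for p in pred_events))
    # Precision: share of predicted events that touch at least one true event.
    matched = sum(1 for p in pred_events if any(overlaps(p, t) for t in true_events))
    precision = matched / len(pred_events) if pred_events else 0.0
    recall = detected / len(true_events) if true_events else 0.0
    return precision, recall, f1_from(precision, recall)


def mean_latency(labels: Sequence[int], preds: Sequence[int]) -> Optional[float]:
    """Mean steps from window start to first flagged point, over detected windows."""
    delays = []
    for start, end in to_windows(labels):
        hits = [i for i in range(start, end + 1) if preds[i]]
        if hits:
            delays.append(hits[0] - start)
    return sum(delays) / len(delays) if delays else None
```

Event-level scores reward catching each anomaly window once, while point-level scores reward covering every anomalous point, so the two can rank detectors differently.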

Project structure

  • core/: detectors, dataset loading, stress transforms, metrics, experiment runner
  • scripts/: benchmark/stress scripts + full experiment orchestrator
  • configs/: experiment schema instances (experiment.yaml)
  • reports/: generated outputs and run artifacts

Setup

python -m venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS/Linux (bash/zsh): source .venv/bin/activate
pip install -r requirements.txt
# optional detectors (autoencoder baseline)
pip install -r requirements-optional.txt

Reviewer Quickstart (copy/paste, Windows PowerShell)

# install
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

# run full experiment
python -m scripts.run_all --config configs/experiment.yaml
python -m scripts.export_run_report
python -m scripts.plot_run_summary --dataset synthetic_demo --stress baseline --metric event_f1

# inspect outputs
Get-ChildItem reports/runs
Get-Content reports/runs/<latest_run_dir>/run_manifest.json
Get-Content reports/runs/<latest_run_dir>/EXPERIMENT_REPORT.md

Run

python -m scripts.run_benchmark
python -m scripts.run_stress_suite
python -m scripts.export_report
python -m scripts.run_all --config configs/experiment.yaml
python -m scripts.export_run_report

Select detectors explicitly:

python -m scripts.run_benchmark --detectors rolling_zscore,rolling_mad
python -m scripts.run_stress_suite --detectors rolling_zscore,rolling_mad

Full experiment pipeline (recommended)

scripts/run_all.py runs every combination of:

  • datasets (synthetic + real CSV)
  • detectors
  • stress cases
  • seeds
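The matrix is a Cartesian product, so the run count is the product of the four list lengths. A minimal sketch with itertools.product; the cell dictionary layout and the "noise" stress name are hypothetical, while synthetic_demo, baseline and the detector names come from this README.

```python
# Hypothetical sketch of enumerating the experiment matrix.
from itertools import product
from typing import Dict, List


def build_matrix(datasets: List[str], detectors: List[str],
                 stresses: List[str], seeds: List[int]) -> List[Dict[str, object]]:
    """One cell per dataset x detector x stress x seed combination."""
    return [
        {"dataset": d, "detector": det, "stress": s, "seed": seed}
        for d, det, s, seed in product(datasets, detectors, stresses, seeds)
    ]
```

For example, 1 dataset x 2 detectors x 2 stress cases x 3 seeds gives 12 runs, which is why adding seeds or stress cases grows runtime quickly.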

Each run also writes governance artifacts to its run directory:

  • run_manifest.json (git SHA, timestamp, seeds, detector list, config hash)
  • resolved_config.yaml
  • raw_runs.json
  • summary_stats.json (mean/std + 95% bootstrap CI)
  • significance_event_f1.json (pairwise sign-test summary)
  • skipped.json
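The statistics named in summary_stats.json and significance_event_f1.json can be sketched with the standard library. This is an illustration under assumptions (percentile bootstrap of the mean with 2000 resamples, exact two-sided binomial sign test with ties dropped, scores paired by matching dataset/stress/seed cells), not the repo's exact procedure.

```python
# Hypothetical sketch of summary statistics and a paired sign test.
import math
import random
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def bootstrap_ci(values: Sequence[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


def summarize(values: Sequence[float]) -> Dict[str, object]:
    """Mean, sample std and 95% bootstrap CI."""
    return {
        "mean": mean(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
        "ci95": bootstrap_ci(values),
    }


def sign_test(a: Sequence[float], b: Sequence[float]) -> Tuple[int, int, float]:
    """Two-sided exact sign test on paired scores; ties are dropped."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)
```

A sign test only uses the direction of each paired difference, so it stays valid when event F1 values are bounded and far from normally distributed; the trade-off is lower power than a paired t-test.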

Threshold strategy is configurable in configs/experiment.yaml:

  • validation (quantile search on calibration split)
  • percentile (single q)
  • fixed (manual value)
  • robust_mad (median + k*MAD)
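A minimal sketch of the four strategies. The strategy names, q and k mirror the config options above; the function signatures, the `value` parameter, the candidate quantiles for validation search, and the "score > threshold" flagging rule are assumptions for illustration.

```python
# Hypothetical threshold-strategy sketch; signatures and defaults are illustrative.
import math
from statistics import median
from typing import Optional, Sequence


def quantile(xs: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of `xs` for q in [0, 1]."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def point_f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if p and not y)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def robust_mad_threshold(scores: Sequence[float], k: float = 3.0) -> float:
    """median + k * MAD; a few extreme scores barely move it."""
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    return med + k * mad


def validation_threshold(cal_scores: Sequence[float], cal_labels: Sequence[int],
                         qs: Sequence[float] = (0.90, 0.95, 0.99)) -> float:
    """Pick the quantile whose threshold maximizes F1 on the calibration split."""
    best_thr, best_f1 = quantile(cal_scores, qs[0]), -1.0
    for q in qs:
        thr = quantile(cal_scores, q)
        f1 = point_f1(cal_labels, [int(s > thr) for s in cal_scores])
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr


def select_threshold(strategy: str, scores: Sequence[float],
                     labels: Optional[Sequence[int]] = None, q: float = 0.99,
                     value: float = 0.0, k: float = 3.0) -> float:
    if strategy == "fixed":
        return value
    if strategy == "percentile":
        return quantile(scores, q)
    if strategy == "robust_mad":
        return robust_mad_threshold(scores, k)
    if strategy == "validation":
        if labels is None:
            raise ValueError("validation strategy needs calibration labels")
        return validation_threshold(scores, labels)
    raise ValueError(f"unknown threshold strategy: {strategy}")
```

Only validation uses labels; the other three are unsupervised, which is why they can behave very differently once scores shift under stress.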

Generated files:

  • reports/benchmark_result.json
  • reports/stress_suite.json
  • reports/REPORT.md
  • reports/figures/benchmark_f1.png
  • reports/figures/stress_event_f1.png
  • reports/runs/<timestamp>_<experiment_name>/...
  • reports/runs/<timestamp>_<experiment_name>/plot_<dataset>_<stress>_<metric>.png

Scope

This repo intentionally excludes backend/frontend/container layers to stay focused on reproducible AI evaluation logic for portfolio review.

Plug-and-play detectors

Detectors are registered in core/detectors.py and follow a simple contract:

  • fit(train_values)
  • score(test_values) -> anomaly_scores

Add a new detector class there, register its name in _registry(), then pass it via --detectors.
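A minimal sketch of that contract, with two rolling detectors and a name registry. The class names, window defaults, zero-variance fallbacks and the helper build_detectors are hypothetical; the real classes and _registry() in core/detectors.py may be structured differently.

```python
# Hypothetical sketch of the fit/score contract and a name registry.
from statistics import mean, median, pstdev
from typing import Callable, Dict, List, Sequence


class RollingZScore:
    """Score = |x - mean| / std over the previous `window` test points."""

    def __init__(self, window: int = 20) -> None:
        self.window = window
        self.fallback_std = 1.0

    def fit(self, train_values: Sequence[float]) -> "RollingZScore":
        # Keep the training std as a fallback for flat (zero-variance) windows.
        if len(train_values) > 0:
            self.fallback_std = pstdev(train_values) or 1.0
        return self

    def score(self, test_values: Sequence[float]) -> List[float]:
        scores: List[float] = []
        for i, x in enumerate(test_values):
            history = test_values[max(0, i - self.window):i]
            if len(history) < 2:
                scores.append(0.0)  # not enough context yet
                continue
            std = pstdev(history) or self.fallback_std
            scores.append(abs(x - mean(history)) / std)
        return scores


class RollingMAD:
    """Score = |x - median| / (1.4826 * MAD) over the previous `window` points."""

    def __init__(self, window: int = 20) -> None:
        self.window = window

    def fit(self, train_values: Sequence[float]) -> "RollingMAD":
        return self  # nothing to learn in this sketch

    def score(self, test_values: Sequence[float]) -> List[float]:
        scores: List[float] = []
        for i, x in enumerate(test_values):
            history = test_values[max(0, i - self.window):i]
            if len(history) < 2:
                scores.append(0.0)
                continue
            med = median(history)
            mad = median([abs(h - med) for h in history])
            scale = 1.4826 * mad if mad > 0 else 1.0
            scores.append(abs(x - med) / scale)
        return scores


def _registry() -> Dict[str, Callable[[], object]]:
    # Adding a detector = writing a class with fit/score and adding one entry here.
    return {"rolling_zscore": RollingZScore, "rolling_mad": RollingMAD}


def build_detectors(spec: str) -> List[object]:
    """Turn a --detectors value such as "rolling_zscore,rolling_mad" into instances."""
    registry = _registry()
    names = [n.strip() for n in spec.split(",") if n.strip()]
    unknown = [n for n in names if n not in registry]
    if unknown:
        raise KeyError(f"unknown detectors: {unknown}")
    return [registry[n]() for n in names]
```

Failing fast on unknown names keeps a typo in --detectors from silently shrinking the experiment matrix.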

Current detector set includes:

  • rolling_zscore
  • rolling_mad
  • isolation_forest
  • autoencoder (optional; requires torch)

If torch is not installed, autoencoder runs are skipped and recorded in the skipped artifacts (skipped.json) with an explicit reason.
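Dependency gating of this kind can be sketched with importlib.util.find_spec, which checks whether a module is importable without importing it. The OPTIONAL_DEPS mapping, the helper names and the skip-record fields below are hypothetical assumptions, not the repo's actual skipped.json schema.

```python
# Hypothetical dependency-gating sketch; record fields are illustrative.
import importlib.util
import json
from typing import Dict, List, Optional, Tuple

OPTIONAL_DEPS: Dict[str, str] = {"autoencoder": "torch"}


def missing_dependency(detector: str, deps: Dict[str, str] = OPTIONAL_DEPS) -> Optional[str]:
    """Return a human-readable skip reason, or None if the detector can run."""
    module = deps.get(detector)
    if module is not None and importlib.util.find_spec(module) is None:
        return f"{module} not installed; run pip install -r requirements-optional.txt"
    return None


def gate(detectors: List[str],
         deps: Dict[str, str] = OPTIONAL_DEPS) -> Tuple[List[str], List[dict]]:
    """Split detectors into runnable names and skip records."""
    runnable: List[str] = []
    skipped: List[dict] = []
    for name in detectors:
        reason = missing_dependency(name, deps)
        if reason is None:
            runnable.append(name)
        else:
            skipped.append({"detector": name, "reason": reason})
    return runnable, skipped


def skipped_json(skipped: List[dict]) -> str:
    return json.dumps(skipped, indent=2)
```

Checking for the module up front lets the rest of the matrix run normally and leaves an auditable record of what was not evaluated.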

Real dataset track

Real CSV datasets must include:

  • value (float)
  • label (0/1)

An optional timestamp column is allowed; the runner ignores it. See demo_data/real_sample.csv and configs/experiment.yaml for examples.
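A minimal loader for this schema might look like the following. The function name and the strict 0/1 check are assumptions for illustration; the repo's dataset loading in core/ may handle validation differently.

```python
# Hypothetical loader sketch for the value/label CSV schema.
import csv
from typing import Iterable, List, Tuple


def load_real_csv(lines: Iterable[str]) -> Tuple[List[float], List[int]]:
    """Parse `value` and `label`; extra columns such as timestamp are ignored."""
    reader = csv.DictReader(lines)
    missing = {"value", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    values: List[float] = []
    labels: List[int] = []
    for row in reader:
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        values.append(float(row["value"]))
        labels.append(label)
    return values, labels
```

Usage: `with open("demo_data/real_sample.csv", newline="") as fh: values, labels = load_real_csv(fh)`. Because csv.DictReader accepts any iterable of lines, the same function works on an open file or a list of strings.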

UCR-style datasets are supported as first-class config entries:

  • type: ucr_style
  • train_path, test_path
  • optional labels_path containing either start,end windows or a label column
  • for UCR 2018 classification datasets, set labels_path: __from_test_first_col__ to derive an anomaly proxy from the class label in the first column of the test file (minimum class => normal, all other classes => anomaly)
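In the UCR 2018 archive each row is one series with its class label in the first column, so the proxy mapping yields one label per row. The sketch below shows that mapping plus the expansion of start,end windows into per-point labels; the helper names and the inclusive-window convention are assumptions, not the repo's parser.

```python
# Hypothetical label-derivation sketch for UCR-style inputs.
from typing import List, Sequence, Tuple


def labels_from_first_col(rows: Sequence[Sequence[float]]) -> List[int]:
    """Map column-0 classes to 0 (minimum class) or 1 (any other class)."""
    classes = [row[0] for row in rows]
    normal = min(classes)
    return [0 if c == normal else 1 for c in classes]


def labels_from_windows(n_points: int, windows: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand inclusive (start, end) windows into per-point 0/1 labels."""
    labels = [0] * n_points
    for start, end in windows:
        # Clip windows that run past either end of the series.
        for i in range(max(0, start), min(end, n_points - 1) + 1):
            labels[i] = 1
    return labels
```

Treating every non-minimum class as anomalous is a proxy, not ground truth, so results on these datasets measure class separation as much as anomaly detection.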

SMAP/MSL support is scaffolded so the full dataset can be dropped in later:

  • type: smap_msl_scaffold
  • root_dir containing train.csv, test.csv, labels.csv
  • sample structure provided in demo_data/smap_msl_scaffold/

Current benchmark config uses the repo-root data/ folder directly:

  • data/UCRArchive_2018/... (UCR representative subsets)
  • data/archive/... (SMAP/MSL with official labeled_anomalies.csv)
  • data/archive (2)/... + data/nab_combined_windows.json (NAB combined-windows format)
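The two label sources above use different window formats: the commonly distributed SMAP/MSL labeled_anomalies.csv (from the Telemanom release) stores index windows as a literal string in its anomaly_sequences column, and NAB's combined_windows.json maps each series file to a list of [start, end] timestamp pairs. The sketch assumes the repo's files follow those public layouts; the helper names and the example file key are hypothetical.

```python
# Hypothetical parsing sketch for SMAP/MSL and NAB label files.
import ast
import csv
import json
from datetime import datetime
from typing import Iterable, List


def smap_msl_windows(lines: Iterable[str], chan_id: str) -> List[List[int]]:
    """Index windows for one channel, parsed from the anomaly_sequences column."""
    for row in csv.DictReader(lines):
        if row["chan_id"] == chan_id:
            # The column holds a literal such as "[[2149, 2349], [4536, 4844]]".
            return [list(w) for w in ast.literal_eval(row["anomaly_sequences"])]
    raise KeyError(f"channel not found: {chan_id}")


def load_nab_windows(json_text: str, file_key: str) -> List[List[str]]:
    """Timestamp windows for one series; a missing key maps to no windows."""
    return json.loads(json_text).get(file_key, [])


def nab_point_labels(timestamps: Iterable[str], windows: List[List[str]]) -> List[int]:
    """Label a point 1 if its timestamp falls inside any [start, end] window."""
    bounds = [(datetime.fromisoformat(a), datetime.fromisoformat(b)) for a, b in windows]
    labels = []
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        labels.append(int(any(a <= t <= b for a, b in bounds)))
    return labels
```

ast.literal_eval only evaluates literals, so it parses the window column without executing arbitrary code from the CSV.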

PRD Coverage (script-first)

Implemented:

  • point precision/recall/F1 + event precision/recall/F1 + latency
  • multiple threshold strategies (fixed/percentile/validation/robust_mad)
  • stress tests for missingness/noise/drift
  • UCR-style dataset flow + SMAP/MSL scaffold path
  • optional autoencoder baseline path (dependency-gated)
  • configurable experiment matrix with CI/significance + run manifests

Not implemented in this script-only repo:

  • frontend threshold-lab UI interactions
  • backend API routes/pages specified in original full-stack PRD

About

Config-driven benchmark suite for time-series anomaly detectors across UCR, SMAP/MSL, and NAB with reproducible evaluation and reporting.
