TemporalShiftLab

TemporalShiftLab is a script-first benchmark for time-series anomaly detection under distribution shift. It focuses on how detectors fail under noise, missingness, and drift, not just on clean-data accuracy.

What it measures

point-level precision/recall/F1
event-level precision/recall/F1
mean detection latency across anomaly windows
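As a rough illustration of how the three metric families differ, the sketch below computes them on 0/1 label sequences. It is not the code in core/; the function names are hypothetical, and the event rules are assumptions: a true anomaly window counts as detected if any point inside it is flagged, a predicted run counts as a false positive if it overlaps no true window, and latency is the offset from window start to the first flagged point.

```python
from typing import List, Optional, Tuple


def _prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def point_prf(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Point-level scores: every timestamp is judged independently."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return _prf(tp, fp, fn)


def to_events(labels: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of 1s into inclusive (start, end) index windows."""
    events, start = [], None
    for i, v in enumerate(labels):
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def event_prf_and_latency(
    y_true: List[int], y_pred: List[int]
) -> Tuple[float, float, float, Optional[float]]:
    """Event-level scores plus mean latency over detected windows (assumed rules)."""
    latencies = []
    true_events = to_events(y_true)
    for start, end in true_events:
        hits = [i for i in range(start, end + 1) if y_pred[i] == 1]
        if hits:
            latencies.append(hits[0] - start)  # steps until first alarm
    tp = len(latencies)
    fn = len(true_events) - tp
    # A predicted run that touches no true anomaly point is a false alarm.
    fp = sum(
        1
        for start, end in to_events(y_pred)
        if not any(y_true[i] == 1 for i in range(start, end + 1))
    )
    precision, recall, f1 = _prf(tp, fp, fn)
    mean_latency = sum(latencies) / len(latencies) if latencies else None
    return precision, recall, f1, mean_latency
```

Under these assumptions a detector that flags a single point inside a long anomaly window scores poorly at point level but gets full credit for that window at event level, which is why both views are reported.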

Project structure

core/: detectors, dataset loading, stress transforms, metrics, experiment runner
scripts/: benchmark/stress scripts + full experiment orchestrator
configs/: experiment schema instances (experiment.yaml)
reports/: generated outputs and run artifacts

Setup

python -m venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS/Linux (bash/zsh)
# source .venv/bin/activate
pip install -r requirements.txt
# optional detectors (autoencoder baseline)
pip install -r requirements-optional.txt

Reviewer Quickstart (copy/paste)

# install
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

# run full experiment
python -m scripts.run_all --config configs/experiment.yaml
python -m scripts.export_run_report
python -m scripts.plot_run_summary --dataset synthetic_demo --stress baseline --metric event_f1

# inspect outputs
Get-ChildItem reports/runs
Get-Content reports/runs/<latest_run_dir>/run_manifest.json
Get-Content reports/runs/<latest_run_dir>/EXPERIMENT_REPORT.md

Run

python -m scripts.run_benchmark
python -m scripts.run_stress_suite
python -m scripts.export_report
python -m scripts.run_all --config configs/experiment.yaml
python -m scripts.export_run_report

Select detectors explicitly:

python -m scripts.run_benchmark --detectors rolling_zscore,rolling_mad
python -m scripts.run_stress_suite --detectors rolling_zscore,rolling_mad

Full experiment pipeline (recommended)

scripts/run_all.py executes the full experiment matrix, one run per combination of:

datasets (synthetic + real CSV)
detectors
stress cases
seeds
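The matrix over these four axes can be pictured as a Cartesian product. The sketch below is only an illustration of that idea; the axis values and the expand_matrix helper are hypothetical, and the real values come from configs/experiment.yaml.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical axis values for illustration only.
datasets = ["synthetic_demo", "real_sample"]
detectors = ["rolling_zscore", "rolling_mad"]
stress_cases = ["baseline", "noise", "missingness", "drift"]
seeds = [0, 1, 2]


def expand_matrix(
    datasets: Sequence[str],
    detectors: Sequence[str],
    stress_cases: Sequence[str],
    seeds: Sequence[int],
) -> List[Dict[str, object]]:
    """Return one run description per (dataset, detector, stress, seed) combination."""
    return [
        {"dataset": d, "detector": det, "stress": s, "seed": seed}
        for d, det, s, seed in product(datasets, detectors, stress_cases, seeds)
    ]


runs = expand_matrix(datasets, detectors, stress_cases, seeds)
print(len(runs))  # 2 * 2 * 4 * 3 = 48 runs
```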

It also writes run governance artifacts:

run_manifest.json (git SHA, timestamp, seeds, detector list, config hash)
resolved_config.yaml
raw_runs.json
summary_stats.json (mean/std + 95% bootstrap CI)
significance_event_f1.json (pairwise sign-test summary)
skipped.json
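For readers unfamiliar with the statistics named in summary_stats.json and significance_event_f1.json, here is a minimal standard-library sketch of a percentile bootstrap confidence interval for a mean and a two-sided exact sign test on paired scores. The function names, the resample count, and the tie handling (ties dropped) are assumptions; the repository may compute these differently.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float], n_resamples: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean (95% when alpha=0.05)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


def sign_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact sign-test p-value for paired scores; ties are dropped."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    n = wins + losses
    if n == 0:
        return 1.0  # no informative pairs
    k = min(wins, losses)
    # Binomial(n, 0.5) probability of a split at least this lopsided, doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With only a handful of seeds the sign test has little power: five wins out of five pairs gives p = 0.0625, so pairwise claims need more paired runs than that to reach the usual 0.05 level.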

Threshold strategy is configurable in configs/experiment.yaml:

validation (quantile search on calibration split)
percentile (single q)
fixed (manual value)
robust_mad (median + k*MAD)
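The strategies above can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration rather than the repository's implementation: the function names, the default q and k values, the candidate quantile grid, and the choice of point-level F1 as the validation objective are all hypothetical. The fixed strategy needs no code, since it uses the manual value as is.

```python
import statistics
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile for q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_threshold(scores: Sequence[float], q: float = 0.99) -> float:
    """percentile: a single quantile of the anomaly scores."""
    return quantile(scores, q)


def robust_mad_threshold(scores: Sequence[float], k: float = 3.0) -> float:
    """robust_mad: median + k * MAD (median absolute deviation)."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    return med + k * mad


def _point_f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    tp = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def validation_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Sequence[float] = (0.90, 0.95, 0.99),
) -> float:
    """validation: search candidate quantiles on a labeled calibration split.

    Assumed objective: point-level F1; the first best candidate wins ties.
    """
    best_thr, best_f1 = quantile(scores, candidates[0]), -1.0
    for q in candidates:
        thr = quantile(scores, q)
        preds = [1 if s > thr else 0 for s in scores]
        f1 = _point_f1(labels, preds)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr
```

The robust_mad variant is less sensitive to a few extreme scores than a mean-and-standard-deviation rule, because both the median and the MAD ignore the magnitude of outliers.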

Generated files:

reports/benchmark_result.json
reports/stress_suite.json
reports/REPORT.md
reports/figures/benchmark_f1.png
reports/figures/stress_event_f1.png
reports/runs/<timestamp>_<experiment_name>/...
reports/runs/<timestamp>_<experiment_name>/plot_<dataset>_<stress>_<metric>.png

Scope

This repo intentionally excludes backend/frontend/container layers to stay focused on reproducible AI evaluation logic for portfolio review.

Plug-and-play detectors

Detectors are registered in core/detectors.py and follow a simple contract:

fit(train_values)
score(test_values) -> anomaly_scores

Add a new detector class there, register its name in _registry(), then pass it via --detectors.

Current detector set includes:

rolling_zscore
rolling_mad
isolation_forest
autoencoder (optional; requires torch)

If torch is not installed, autoencoder runs are skipped and recorded in the skipped artifacts (skipped.json for full experiment runs) with a clear reason.

Real dataset track

Real CSV datasets must include:

value (float)
label (0/1)

An optional timestamp column is allowed; the runner ignores it. See demo_data/real_sample.csv and configs/experiment.yaml.
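To check a CSV against these requirements before adding it to the config, a small validator along the following lines may help. It is a sketch using only the standard library; load_labeled_csv is a hypothetical helper, not the repository's loader.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_labeled_csv(handle: TextIO) -> Tuple[List[float], List[int]]:
    """Read value/label columns; any other column (e.g. timestamp) is ignored."""
    reader = csv.DictReader(handle)
    missing = {"value", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    values: List[float] = []
    labels: List[int] = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"line {line_no}: label must be 0 or 1, got {label}")
        values.append(float(row["value"]))
        labels.append(label)
    return values, labels


sample = io.StringIO(
    "timestamp,value,label\n"
    "2024-01-01T00:00:00,1.5,0\n"
    "2024-01-01T00:01:00,9.0,1\n"
)
values, labels = load_labeled_csv(sample)
```

For a file on disk, pass an open handle instead of the in-memory sample, for example `open(path, newline="")` as the csv module documentation recommends.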

UCR-style datasets are supported as first-class config entries:

type: ucr_style
train_path, test_path
optional labels_path with either start,end windows or label column
for UCR 2018 classification datasets, set labels_path: __from_test_first_col__ to derive an anomaly proxy from the class label in the first column of the test file (smallest class => normal, all other classes => anomaly)
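The two labeling routes described above (class-label proxy and start,end windows) can be sketched as follows. Both helper names are hypothetical, and treating window bounds as inclusive zero-based indices is an assumption; the loader in core/ may use a different convention.

```python
from typing import List, Sequence, Tuple


def labels_from_class_column(class_labels: Sequence[int]) -> List[int]:
    """Anomaly proxy: the smallest class id is normal (0), every other class is anomalous (1)."""
    normal = min(class_labels)
    return [0 if c == normal else 1 for c in class_labels]


def labels_from_windows(length: int, windows: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand (start, end) windows into point labels.

    Assumes inclusive, zero-based bounds; windows are clipped to the series length.
    """
    labels = [0] * length
    for start, end in windows:
        for i in range(max(0, start), min(length - 1, end) + 1):
            labels[i] = 1
    return labels


proxy = labels_from_class_column([1, 2, 1, 3])  # classes 2 and 3 become anomalies
points = labels_from_windows(6, [(1, 2), (4, 9)])  # second window is clipped at index 5
```

The class-label proxy is only a stand-in for real anomaly labels: it measures how well a detector separates one class from the rest, which is worth keeping in mind when comparing UCR results with the SMAP/MSL or NAB tracks.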

SMAP/MSL is scaffolded for later drop-in:

type: smap_msl_scaffold
root_dir containing train.csv, test.csv, labels.csv
sample structure provided in demo_data/smap_msl_scaffold/

Current benchmark config uses the repo-root data/ folder directly:

data/UCRArchive_2018/... (UCR representative subsets)
data/archive/... (SMAP/MSL with official labeled_anomalies.csv)
data/archive (2)/... + data/nab_combined_windows.json (NAB combined-windows format)

PRD Coverage (script-first)

Implemented:

point precision/recall/F1 + event precision/recall/F1 + latency
multiple threshold strategies (fixed/percentile/validation/robust_mad)
stress tests for missingness/noise/drift
UCR-style dataset flow + SMAP/MSL scaffold path
optional autoencoder baseline path (dependency-gated)
configurable experiment matrix with CI/significance + run manifests
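As a rough picture of what the missingness/noise/drift stress tests in this list do to a series, the sketch below shows one simple transform of each kind. The function names, parameters, and the specific choices (Gaussian noise, NaN for missing points, linear drift) are assumptions for illustration and may not match the transforms in core/.

```python
import math
import random
from typing import List, Sequence


def add_noise(values: Sequence[float], sigma: float, seed: int = 0) -> List[float]:
    """Noise: add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def drop_values(values: Sequence[float], rate: float, seed: int = 0) -> List[float]:
    """Missingness: replace roughly a fraction `rate` of points with NaN."""
    rng = random.Random(seed)
    return [math.nan if rng.random() < rate else v for v in values]


def add_linear_drift(values: Sequence[float], total_shift: float) -> List[float]:
    """Drift: ramp the level linearly from 0 at the start to total_shift at the end."""
    n = len(values)
    if n < 2:
        return list(values)
    return [v + total_shift * i / (n - 1) for i, v in enumerate(values)]


series = [0.0, 0.0, 0.0]
drifted = add_linear_drift(series, total_shift=2.0)  # [0.0, 1.0, 2.0]
```

Seeding each transform keeps stressed runs reproducible across the seeds axis, and applying stress to the test split only (an assumed design choice) isolates robustness from training effects.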

Not implemented in this script-only repo:

frontend threshold-lab UI interactions
backend API routes/pages specified in original full-stack PRD
