feat(prediction): add predictive cooldown with historical usage patterns by owaindjones · Pull Request #19 · owaindjones/rouser

owaindjones · 2026-05-01T21:32:20Z

Summary

Implements #18 — Predictive cooldown based on system usage patterns. rouser now learns from historical CPU/network activity across days/weeks and dynamically extends the post-idle cooldown duration when patterns indicate likely continued active use, capped at a configurable maximum extension time.

Changes

Feature implementation (feat(prediction): add predictive cooldown based on historical usage patterns)

New prediction module with history (binary log) and model (statistical predictor) submodules
Time-aware hour-of-day analysis: tracks per-hour high-activity counts for CPU (>50%) and network/disk (>10Mbps/>5MB/s)
Linear score interpolation to configurable extension range when transitioning from inhibited → below-threshold state
Binary serialization via bincode v2 with date-partitioned files (history.log.YYYYMMDD) under XDG data dir or /var/lib/rouser/ for root
Automatic pruning of old history files (every ~12h, configurable retention)
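The linear interpolation step listed above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `extension_for_score` and the assumption that the score lies in [0.0, 1.0] are hypothetical.

```rust
use std::time::Duration;

/// Map a score in [0.0, 1.0] linearly onto [0, max_extension].
/// Out-of-range scores are clamped and NaN is treated as 0.0, so the
/// result never exceeds the configured cap. (Hypothetical helper.)
fn extension_for_score(score: f64, max_extension: Duration) -> Duration {
    let clamped = if score.is_nan() { 0.0 } else { score.clamp(0.0, 1.0) };
    max_extension.mul_f64(clamped)
}
```

With a one-hour cap, a score of 0.5 would yield a 30-minute extension, and any score at or above 1.0 yields the full hour.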

Debug logging (feat(prediction): add debug logging and prune tracking to prediction model)

info! log on startup showing loaded historical data points
debug! log per record() call with metric values and hour-of-day bucket
debug!/info! logs in HistoryLog::prune() for each file removed and summary of pruned count
Flush logging when model records a new snapshot to the history buffer

Documentation (docs: add prediction feature docs, update config reference)

README.md: added "Predictive cooldown" to Key Features list with brief description
docs/configuration.md: added [prediction] section table documenting all 3 config keys (update_interval, history_length, max_extension_time)
New docs/prediction-model.md: comprehensive guide explaining how the prediction model works — data collection, hour-of-day analysis, scoring algorithm, confidence calculation, and configuration tuning

Security audit (chore: add security review for prediction module)

Path validation: all file paths derived from XDG spec or config constants (no user input)
Bincode deserialization safe: length-prefixed format prevents buffer overread; truncated entries logged as warnings and skipped
Prune function validates YYYYMMDD filename format before processing; only matches history.log.* pattern files
No shell execution, symlink following, or world-writable permissions in history directory creation (0755)
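The filename check described in the audit notes could look roughly like this. It is a sketch under assumptions: the helper name `parse_history_date` is hypothetical, and it performs only a coarse range check on month and day rather than full calendar validation.

```rust
/// Extract (year, month, day) from a name such as `history.log.20260501`.
/// Returns None unless the name is the `history.log.` prefix followed by
/// exactly eight ASCII digits with month in 1..=12 and day in 1..=31.
/// Coarse check only: a date like February 31 is not rejected here.
fn parse_history_date(file_name: &str) -> Option<(u32, u32, u32)> {
    let suffix = file_name.strip_prefix("history.log.")?;
    if suffix.len() != 8 || !suffix.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year: u32 = suffix[0..4].parse().ok()?;
    let month: u32 = suffix[4..6].parse().ok()?;
    let day: u32 = suffix[6..8].parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}
```

Because the suffix must be all digits, names containing path separators or `..` never match, which fits the "no user input in paths" claim above.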

Config alignment (refactor(prediction): use Duration type for max_extension_time)

Standardized all prediction timing fields to std::time::Duration with humantime_serde parsing
Renamed max_extension_secs: u64 → max_extension_time: Duration (default 1h)

Testing

All checks pass:

cargo fmt --check — clean
cargo clippy --all-targets -- -D warnings — zero warnings/errors
cargo test --all-targets — 148 tests pass (74 lib + 72 binary, 2 ignored hardware-specific)

Config Example

[prediction]
update_interval = "30s"          # How often to record a data point
history_length = "30d"           # Keep this much historical data
max_extension_time = "1h"        # Maximum additional cooldown extension

Manual QA Notes

Run RUST_LOG=debug rouser --dry-run to see prediction model initialization and per-tick logging
After running for several days, check that $XDG_DATA_HOME/rouser/history.log.YYYYMMDD files exist with date-partitioned data
Verify pruning by setting history_length = "1d" in config and observing debug logs on subsequent runs

…atterns

Learns from daily system metric snapshots to dynamically extend the idle cooldown duration before releasing sleep inhibition. Uses a time-aware statistical model that scores CPU and network activity by hour-of-day, with a configurable max extension capped at 60 seconds.

Key changes:
- New prediction:: module with binary history log (bincode v2) using date-partitioned files under XDG_DATA_HOME or /var/lib/rouser
- PredictionModel scores historical patterns and predicts additional cooldown seconds when metrics drop below threshold
- Service.rs wires recording into tick() loop and applies predictions during cooldown transitions with info-level logging
- Config adds [prediction] section with max_extension_secs (default 60s)
- All clippy warnings resolved, tests pass (74+74 across lib/bin)

…ign all prediction fields

- Replace u64 max_extension_secs with humantime_serde-parsed Duration (default 1h) across config, model, and service layers
- Rename CooldownPrediction.additional_seconds to additional_time as std::time::Duration for consistency with other timing fields
- Update DataManager.predicted_extension_secs → predicted_additional_time
- Add pruning debug logging in HistoryLog when files are removed
- Add record flush logging in PredictionModel on each data point write
- Wire prune() call into service.rs tick loop (every ~12h via counter)
- Add .sisyphus/ to .gitignore

…to tick loop

Debug logging (Task 1):
- Add per-tick debug log in PredictionModel::record() showing data point number with CPU max, network throughput, disk I/O, and UTC hour bucket
- Add debug log when prune() is called on each service tick
- Wire model.prune(history_length) into service.rs tick loop (safe due to daily deduplication in HistoryLog::prune())

Documentation (Task 2):
- README.md: add 'Predictive cooldown' bullet to Key Features list
- docs/configuration.md: add [prediction] section with full config table, update example TOML block and See Also links
- docs/prediction-model.md: new comprehensive guide covering data collection, hour-of-day histogram building, scoring algorithm, confidence scaling, pruning mechanics, configuration tuning, and debug log reference
- mkdocs.yml + docs/index.md: add navigation links to prediction model doc

Manual QA verified with RUST_LOG=debug dry-run showing all three log types:

…nterval, fix flaky date test

- Fix stale inline comment in docs/configuration.md example TOML (update_interval description now matches actual behavior)
- Auto-enforce prediction.update_interval >= root update_interval via std::cmp::max; emit warn! when correction is applied so operators notice misconfiguration
- Rename debug log field 'samples=N' to 'accumulated_ticks=N' for clarity in model.rs flush logging
- Add two multi-tick averaging tests: arithmetic mean verification across flush boundaries and GPU per-slot averaging with varying GPU counts, both with descriptive comments explaining expected values and flush timing
- Fix flaky test_history_entry_date_extraction to use Utc::now() instead of Local::now(), matching entry_date()'s UTC implementation
- Update AGENTS.md comment policy under Core Principles
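The interval enforcement mentioned in this commit can be illustrated with a small standard-library sketch. The function name and the returned flag are hypothetical; in the real service the caller would presumably emit the `warn!` when the flag is set.

```rust
use std::time::Duration;

/// Clamp the prediction interval so it is never shorter than the root
/// update interval. Returns the effective interval and whether a
/// correction was applied (so the caller can log a warning).
fn effective_prediction_interval(root: Duration, prediction: Duration) -> (Duration, bool) {
    let effective = std::cmp::max(root, prediction);
    (effective, effective != prediction)
}
```

For example, a root interval of 5s with a configured prediction interval of 1s would be corrected to 5s, while 30s would pass through unchanged.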


…nt-config and inhibition fallback

- Replace hardcoded CPU/network/disk thresholds in prediction model with single 'inhibited' boolean from service.rs threshold logic. This removes three unnecessary config fields (cpu_high_threshold, network_high_threshold, disk_high_threshold) and uses the actual inhibition state computed per-tick.
- Fix --print-config: was ignoring -c flag and always using merged defaults. Now respects single config file path when provided.
- Fix inhibition fallback: rewrite InhibitionState::acquire() to use a clean retry pattern via SleepInhibitor::acquire_with_fallback(). Removes buggy code that made two redundant D-Bus calls on auth error (creating duplicate inhibitors).
- Upgrade TimeKey from single hour-of-day dimension to three dimensions: year, week_of_year, seconds_into_week for seasonal/monthly/weekday patterns.
- Fix clippy warnings: redundant closure, unnecessary cast, clone-on-copy, manual RangeInclusive::contains (4 errors total).
- Update prediction-model.md documentation to reflect TimeKey representation and simplified inhibition-based scoring.


… fallback to auth-only errors

- Fix critical bug: score_inhibition_rate() ±3600s proximity search now constrains by year and ±1 week of year to prevent historical data from last year contaminating current predictions.
- Narrow inhibition D-Bus fallback: only falls back on auth-related errors (interactive authentication, Access denied), not all failures. Non-auth errors propagate unchanged without masking real infrastructure issues.
- Remove dead code: hour_component() and day_of_week() methods from TimeKey were never called anywhere in the codebase.
- Add 5 new unit tests for TimeKey struct and prediction scoring path.

…auth error patterns Add linear_day() helper for correct end-of-year boundary handling in score_inhibition_rate(). Expand is_auth_error() to catch additional polkit error strings ("not authorized", "not authenticated").

…ek to f64 Remove 'Running history pruning' debug line — prune() already logs at info level when files are actually removed. Remove 'Metrics exceed threshold, checking inhibition status' debug line — state transitions are logged at INFO level ('Sleep inhibited:', 'Releasing sleep'). Change TimeKey.seconds_into_week from i64 to f64 for millisecond precision (0–604799.999s). Implement Eq + Hash manually via bit-level equality since f64 doesn't derive these traits; deterministic integer arithmetic ensures exact equality for HashMap key compatibility.
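Since `f64` implements neither `Eq` nor `Hash`, the bit-level approach described above can be sketched like this. The field types for `year` and `week_of_year` are assumptions; only `seconds_into_week: f64` is stated in the commit message.

```rust
use std::hash::{Hash, Hasher};

/// Three-dimensional time key. Integer field types are assumed here.
#[derive(Debug, Clone, Copy)]
struct TimeKey {
    year: i32,
    week_of_year: u32,
    seconds_into_week: f64,
}

impl PartialEq for TimeKey {
    fn eq(&self, other: &Self) -> bool {
        // Compare the float by its raw bit pattern so equality is total.
        self.year == other.year
            && self.week_of_year == other.week_of_year
            && self.seconds_into_week.to_bits() == other.seconds_into_week.to_bits()
    }
}

impl Eq for TimeKey {}

impl Hash for TimeKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the same bits that eq() compares, keeping Hash consistent with Eq.
        self.year.hash(state);
        self.week_of_year.hash(state);
        self.seconds_into_week.to_bits().hash(state);
    }
}
```

One consequence of bitwise comparison is that `0.0` and `-0.0` count as different keys, and two keys match only when their float values were produced by identical arithmetic, which is why the commit stresses deterministic computation.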

…d-only home Replace RuntimeDirectory (tmpfs, lost on reboot) with StateDirectory=rouser-data to provide a persistent writable directory at /var/lib/rouser-data. Set XDG_DATA_HOME=/var/lib/rouser-data so the history log writes there when running as systemd service with ProtectHome=read-only — /var/lib is outside /home and survives reboots.

Bug #1: 'Predictive cooldown extension' info log fired on every tick while extended cooldown was active because predicted_additional_time was already set from a previous tick. Added check for predicted_additional_time.is_zero() so the message only logs once per transition into below-threshold state, matching how 'Sleep inhibited' logs only fire on state transitions.

Bug #2: Predictive cooldown extension had no effect: inhibition was released after base cooldown_duration (10s) instead of respecting the predicted +1028s extension. The release logic checked plain cooldown_duration first and released before reaching the predictive branch. Replaced two-branch logic with single path using std::cmp::max(cooldown_duration, predicted_additional_time) so the prediction always extends (not replaces) the base cooldown period.
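The single-path release check from the Bug #2 fix can be sketched as below. The function names are hypothetical; only the use of `std::cmp::max` over the two durations comes from the commit message.

```rust
use std::time::Duration;

/// The wait required before releasing inhibition: whichever is longer,
/// the base cooldown or the predicted time.
fn effective_cooldown(cooldown_duration: Duration, predicted_additional_time: Duration) -> Duration {
    std::cmp::max(cooldown_duration, predicted_additional_time)
}

/// True once metrics have stayed below threshold for the effective cooldown.
fn should_release(
    below_threshold_for: Duration,
    cooldown_duration: Duration,
    predicted_additional_time: Duration,
) -> bool {
    below_threshold_for >= effective_cooldown(cooldown_duration, predicted_additional_time)
}
```

With a 10s base and a predicted 1028s, release would not happen at the 10s mark. Note that with `max` the prediction acts as a lower bound on the total wait rather than being added on top of the base.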

…erpolation

Add backward-compatible rate-of-change (delta) fields to HistoryEntry:
- elapsed_since_last_ns, cpu_delta_per_sec, network/disk/gpu deltas per sec
- compute_deltas() method for computing consecutive entry differences
- XDG_STATE_HOME migration with /tmp fallback using PID-based unique path and 0700 permissions to minimize TOCTOU risk on shared systems

Add gap detection (fill_gaps) that inserts synthetic zero-value entries when computer is shut down or sleeping, preventing prediction model overfitting on active-period data only. Uses GAP_THRESHOLD_NS=5min / FILL_INTERVAL_NS=30s.

Ensure sorted file reading by date ascending with monotonic timestamp ordering via BTreeMap iteration + sort_by_key after loading all files.

Improve service.rs cooldown_extension_applied flag to prevent redundant prediction queries and add base+extension breakdown in release logging.

Update documentation for XDG_STATE_HOME, prediction model, systemd service. Add AGENTS.md note about state directory migration breaking change.
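A minimal sketch of the gap-filling idea, using the two constants named in this commit. The `Entry` struct and its fields are simplified stand-ins for the real history entry, and the exact fill policy (start one interval after the previous entry, stop before the next) is an assumption.

```rust
/// Simplified stand-in for a history entry (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    timestamp_ns: u64,
    cpu: f64,
    synthetic: bool,
}

const GAP_THRESHOLD_NS: u64 = 5 * 60 * 1_000_000_000; // 5 minutes
const FILL_INTERVAL_NS: u64 = 30 * 1_000_000_000; // 30 seconds

/// Insert zero-value synthetic entries wherever two consecutive entries
/// are more than GAP_THRESHOLD_NS apart. Input must be sorted by timestamp.
fn fill_gaps(entries: &[Entry]) -> Vec<Entry> {
    let mut out = Vec::with_capacity(entries.len());
    for (i, entry) in entries.iter().enumerate() {
        if i > 0 {
            let prev_ts = entries[i - 1].timestamp_ns;
            if entry.timestamp_ns.saturating_sub(prev_ts) > GAP_THRESHOLD_NS {
                let mut ts = prev_ts + FILL_INTERVAL_NS;
                while ts < entry.timestamp_ns {
                    out.push(Entry { timestamp_ns: ts, cpu: 0.0, synthetic: true });
                    ts += FILL_INTERVAL_NS;
                }
            }
        }
        out.push(entry.clone());
    }
    out
}
```

Two real entries ten minutes apart would gain 19 synthetic entries between them, while entries one minute apart are left untouched.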

…nd signals

Delta fields were previously dead code: computed struct fields existed but the prediction model never consumed them. This fix:
1. Tracks last flushed entry metrics to enable actual delta computation
2. Calls HistoryEntry::compute_deltas() when flushing snapshots (not just tests)
3. Adds TrendSignal scoring that normalizes CPU/network rate-of-change into a 0.5-1.4x multiplier on the base inhibition score for trend-aware predictions
4. Updates prediction-model.md with documentation for delta features, gap handling, and trend-aware scoring sections
5. Fixes test to use is_root=false for portable XDG_STATE_HOME writes in tests

Regression tests verify deltas are computed in production flush path.

…lenames Previously files without valid YYYYMMDD dates were silently skipped with a warning. Now they are read and grouped by their filesystem modification timestamp as sort key, ensuring no history data is lost from old-format or corrupted backup files in the history directory.

…allback On Linux, std::fs provides no safe way to access file birth/creation times without unsafe syscalls. Since AGENTS.md prohibits introducing unsafe code without explicit instruction, modification time is used as the best available proxy — historical log files are typically not modified after initial writes.
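Reading the modification time needs only safe standard-library calls. A minimal sketch, with a hypothetical helper name:

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Return the file's modification time, used as a sort key for history
/// files whose names carry no parseable date. (Hypothetical helper.)
fn sort_key_from_mtime(path: &Path) -> io::Result<SystemTime> {
    fs::metadata(path)?.modified()
}
```

The call returns an error for a missing file, so the caller can decide whether to skip it or log a warning.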

…tic records Real history entries pushed after fill_gaps() retained stale delta values referencing their original predecessor. Now compute_deltas() is called against the actual predecessor in the filled sequence.

…ecomputation test TrendSignal::compute() now divides network delta sum by only the count of entries with valid network deltas (net_samples), matching how CPU averages are computed. Previously divided by total entry count n, which diluted the average when some entries had None network deltas. Add integration test verifying that real entries after gap-filled synthetic records have their deltas correctly recomputed against zero-value predecessors.
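The averaging fix can be shown in isolation: divide by the number of entries that actually carry a delta, not by the total entry count. The function name is hypothetical.

```rust
/// Mean over only the entries that have a value. Returns None when no
/// entry has one, instead of dividing by zero.
fn mean_of_present(deltas: &[Option<f64>]) -> Option<f64> {
    let (sum, count) = deltas
        .iter()
        .flatten()
        .fold((0.0_f64, 0_usize), |(s, c), d| (s + d, c + 1));
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

For the input `[Some(2.0), None, Some(4.0), None]` this yields 3.0, whereas dividing by the total entry count of 4 would give the diluted value 1.5.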

Replace last_elapsed >= 1_000_000_000 && last_elapsed <= FILL_INTERVAL_NS with (1_000_000_000..=FILL_INTERVAL_NS).contains(&last_elapsed).

- Work in branches only (commits to main forbidden without explicit instruction)
- Remove --config flag from ExecStart since ConfigLoader::load_merged() handles auto-discovery of /etc/rouser/config.toml + ~/.config/rouser/config.toml

The systemd service was updated to drop the --config flag since ConfigLoader::load_merged() handles auto-discovery. Update all four ExecStart example references in this doc to match.

…iority chain Phase 1 initializes tracing at DEBUG (or explicit RUST_LOG/CLI override) so auto-install logs during config load are captured. Phase 2 reconfigures the log level using resolve_tracing_log_level() which follows the exact priority chain: CLI -l flag > RUST_LOG env var > config.log_level > 'info'. Uses tracing_subscriber::reload::Layer for runtime filter swapping via .modify() instead of requiring a fresh subscriber install. This avoids panics when another global subscriber already exists (e.g., from PAM).
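The priority chain described above maps naturally onto `Option::or`. This sketch uses a hypothetical function name and plain strings; the PR's resolve_tracing_log_level() may have a different signature.

```rust
/// Pick the log level by priority: CLI flag, then RUST_LOG, then the
/// config file value, falling back to "info".
fn resolve_log_level(cli: Option<&str>, rust_log_env: Option<&str>, config: Option<&str>) -> String {
    cli.or(rust_log_env).or(config).unwrap_or("info").to_string()
}
```

A caller would typically pass `std::env::var("RUST_LOG").ok().as_deref()` for the second argument.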

…ed trend window

Remove 5 delta fields (elapsed_since_last_ns, cpu_delta_per_sec, network_delta_per_sec, disk_delta_per_sec, gpu_deltas_per_sec) from HistoryEntry serialization. Compute deltas on-the-fly at prediction time using a standalone EntryDeltas::compute() method that takes consecutive entries and calculates per-second rates.

Remove hard-coded GAP_THRESHOLD_NS (5min) and FILL_INTERVAL_NS (30s) constants. Make fill_gaps() a public configurable function using the [prediction] update_interval config value for both threshold and interval. Synthetic zero-value entries are now in-memory only: added at prediction time, never flushed to disk.

Replace '20 most recent entries' hard-coded count with timestamp-based window: all entries where timestamp >= current_time - max_extension_time. This ensures consistent temporal coverage regardless of tick frequency.
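The timestamp-based window and on-the-fly delta computation can be sketched together. The `Entry` struct and both function names are hypothetical simplifications; the input slice is assumed to be sorted by timestamp.

```rust
use std::time::Duration;

/// Simplified stand-in for a history entry (hypothetical fields).
#[derive(Debug, Clone, Copy)]
struct Entry {
    timestamp_ns: u64,
    cpu: f64,
}

/// All entries with timestamp >= now - window. Requires sorted input.
fn recent_window(entries: &[Entry], now_ns: u64, window: Duration) -> &[Entry] {
    let window_ns = u64::try_from(window.as_nanos()).unwrap_or(u64::MAX);
    let cutoff = now_ns.saturating_sub(window_ns);
    let start = entries.partition_point(|e| e.timestamp_ns < cutoff);
    &entries[start..]
}

/// Per-second CPU rate of change between two consecutive entries.
/// Returns None if the timestamps are equal or out of order.
fn cpu_delta_per_sec(prev: &Entry, next: &Entry) -> Option<f64> {
    let elapsed_ns = next.timestamp_ns.checked_sub(prev.timestamp_ns)?;
    if elapsed_ns == 0 {
        return None;
    }
    Some((next.cpu - prev.cpu) / (elapsed_ns as f64 / 1e9))
}
```

Selecting by time rather than by count means a 5s tick and a 30s tick both cover the same span of history, only at different densities.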

…ehavior Update docs/prediction-model.md: replace hard-coded '>5 minutes' and '30-second intervals' with references to [prediction].update_interval config. Remove delta fields storage table — deltas are now computed on-the-fly at prediction time, not stored in history files. Replace '20 most recent entries' description with timestamp-based window using max_extension_time. Clarify that synthetic gap-filled entries exist only in memory during prediction.

…iting period

Previously predict_cooldown() ran only once per inhibited-to-below-threshold transition, then the computed extension was static for the entire remaining cooldown. Now it is re-evaluated on every tick while metrics stay below threshold, allowing the extension to increase or decrease based on current trends (minimum 0 via Duration::ZERO).

Changes:
- Added spike guard: skip re-evaluation when should_inhibit is true
- Moved predict_cooldown() into the below-threshold waiting block for per-tick re-evaluation during active cooldown
- Removed !cooldown_extension_applied guard from transition logic
- Info log on first non-zero extension, debug log on subsequent changes

…d unreachable spike guard

Oracle review identified:
- cooldown_extension_applied was written 3 times but never read: dead code from the old per-transition guard that was replaced by tick-based re-evaluation
- Spike guard (if should_inhibit { return }) inside the below-threshold block could never trigger since metrics_below_threshold_since implies not inhibiting

Removes: struct field, constructor init, spike guard, all assignments. No behavioral change, purely dead code cleanup.

owaindjones · 2026-05-04T21:30:03Z

Current issues

It loads the history files every time it (re)calculates the cooldown; whilst in "cooldown" state this means it's loading all history from file on every tick -- it should only load from file once at startup, in order to train the prediction model. Updating the prediction model with data during runtime should not require it to load all history from scratch every time - some form of online training should be used to update the model iteratively in-memory when each snapshot is logged, so that it only needs to read from history files at startup.
- Important note: Gaps in data need to be filled on the fly when updating the prediction model; gaps should be detected at runtime - when snapshots are logged / model is updated, the gaps in the input data should be filled with synthetic data at that point (and remembering to not write the synthetic data to disk).
Predicted cooldown is always given as the very specific value +1028.571428571s and I have not seen this value change at all in the journalctl logs for the latest commit, which makes me suspect something is fixed in the calculation; it may be taking the [prediction] max_extension_time and not doing anything with the actual prediction?
It does not appear as though the predicted cooldown actually affects when inhibition is released (as in, the extended cooldown value is not applied), as we still see this in the logs: Releasing sleep inhibition: all metrics below threshold for 10.04917252s. systemd-inhibit also confirms that inhibition is dropped much sooner than 1028 seconds (roughly 17 minutes).
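A quick arithmetic check on the constant reported above: 1028.571428571s is exactly 2/7 of the default max_extension_time of 1h. That would be consistent with a fixed fraction being scaled by the configured maximum, although the logs alone do not confirm where such a fraction would come from. A minimal sketch of the check (the helper name is hypothetical):

```rust
use std::time::Duration;

/// Two sevenths of the given maximum, using integer Duration arithmetic.
fn two_sevenths_of(max_extension_time: Duration) -> Duration {
    max_extension_time * 2 / 7
}
```

Applied to `Duration::from_secs(3600)`, this gives 1028 whole seconds plus 571,428,571 nanoseconds, matching the logged value to nanosecond precision.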

owaindjones · 2026-05-07T07:51:15Z

Feature request: Calculate overall system average and max GPU usage

The CPU usage is calculated per-core (and frequency-weighted per-core), but the final metrics used in the inhibition decision are: aggregate average CPU usage (total usage divided by number of cores), and maximum individual CPU core usage.

Rouser should apply the same to the GPU usage:

Calculate usage independently for each GPU, including vendor-specific frequency-weighting and usage calculations, as is happening now
But for the final metrics, instead of using the individual GPU usages, calculate these two metrics:
- Aggregate average GPU usage (sum of GPU usage of all GPUs, divided by number of GPUs)
- Maximum individual GPU usage

The config file should be refactored to look like this:

[metrics.gpu]
per_gpu_threshold = 33.3      # GPU usage threshold (percentage)
total_threshold = 50.0
ema_alpha = 0.7       # EMA smoothing factor

^ In that scenario, any individual GPU can trigger inhibition if it reports usage over 33.3%, and inhibition can also be triggered if the overall (average) GPU usage on the system is above 50%. For a system with two GPUs, the total threshold is reached when, for example, one GPU is 100% busy and the other idle (0%), or when both hover around 50% usage.

The inhibition decision code, history file format, and prediction model should be refactored to replace where they use the individual GPU usage metrics with the two new metrics: total (average) system GPU usage and maximum per-GPU usage.

Benefit: Should GPUs be added or removed from the system, the history file structure is preserved. Currently, adding or removing a GPU would change the size of the entries and mean that previous entries could no longer be used to train the prediction model.

It is still helpful to enumerate individual GPUs and report their usage in the debug output as is happening now, so don't remove that. It serves as a helpful diagnostic to show how they each contribute to the final metrics.

Keep inhibited_timekeys in sync when records are flushed so predictions reflect current data instead of stale startup snapshot. Add an in-memory rolling window (recent_entries) for trend analysis during cooldown periods, eliminating costly disk reads on every predict_cooldown() call. Fix double-prediction bug where the transition block overwrote the fresh prediction computed inside the cooldown block with a potentially zero value from stale historical data.

…esholds

Replace single gpu.threshold config with dual-threshold system:
- per_gpu_threshold (default 15%): triggers inhibition if any single GPU exceeds it
- total_threshold (default 15%): triggers inhibition if system-wide GPU average exceeds it
- Both use OR logic: either threshold being exceeded inhibits sleep

Key changes:
- New GpuAggregate struct in metrics/gpu.rs with from_gpus/from_values constructors
- Replace HistoryEntry.gpu_usages Vec<f64> with GpuSnapshot { per_gpu_max, total_average } for consistent history format regardless of GPU count
- ThresholdManager::should_inhibit() takes &GpuAggregate instead of &[f64]
- Updated config/rouser.toml: [metrics.gpu].threshold → per_gpu_threshold + total_threshold
- Simplified EntryDeltas: removed gpu_deltas_per_sec vector field (aggregates suffice)
- Added #[allow(clippy::too_many_arguments)] to HistoryEntry::new() (8 params, consistent pattern)

92 tests pass. 0 failed.
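The aggregate and its OR-based threshold check can be sketched as follows. The struct and `from_values` names come from the commit message, but the body, the `exceeds` method, and the use of strict greater-than comparisons are assumptions for illustration.

```rust
/// Two GPU metrics that stay the same shape regardless of GPU count.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct GpuAggregate {
    per_gpu_max: f64,
    total_average: f64,
}

impl GpuAggregate {
    /// Build from per-GPU usage percentages. Empty input yields 0.0 for both.
    fn from_values(values: &[f64]) -> Self {
        if values.is_empty() {
            return Self::default();
        }
        let per_gpu_max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        let total_average = values.iter().sum::<f64>() / values.len() as f64;
        Self { per_gpu_max, total_average }
    }

    /// OR logic: either threshold being exceeded counts (hypothetical method).
    fn exceeds(&self, per_gpu_threshold: f64, total_threshold: f64) -> bool {
        self.per_gpu_max > per_gpu_threshold || self.total_average > total_threshold
    }
}
```

For two GPUs at 100% and 0%, the aggregate is a max of 100 and an average of 50, so both a 25% per-GPU threshold and a 40% total threshold would be exceeded.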

…l average

Update [metrics.gpu] section to reflect new configuration structure:
- Replace single threshold with per_gpu_threshold and total_threshold keys
- Document OR logic for both thresholds
- Update example config and best practices section

- gpu-usage-measurement.md: replace single threshold example with per_gpu_threshold + total_threshold config, document OR logic for sleep inhibition decisions - metrics-overview.md: expand Aggregation Strategy section to cover both per-device and system-wide average thresholds, explain GpuSnapshot history format independence from GPU count - scratch/007-fixes-and-aggregate-gpu-metrics.md: update outdated 'What's NOT Done' entry (docs/configuration.md already committed in 887f39f)

… fix all stale doc references

Change defaults from 15/15 to more conservative values that reduce false-positive sleep inhibition during moderate multi-GPU workloads.

Source-of-truth updates (AGENTS.md rule: always update config.toml first):
- src/config.rs: default_gpu_threshold() → 25.0, default_gpu_total_threshold() → 40.0
- config/rouser.toml: per_gpu_threshold = 25.0, total_threshold = 40.0

Documentation fixes (replaced all stale single-threshold format with dual):
- configuration.md: example + table defaults (15→25, 15→40)
- gpu-usage-measurement.md: config example values
- metrics-overview.md: Aggregation Strategy section expanded for dual thresholds
- averaging.md: 6 GPU threshold examples across all configs + Per-GPU EMA text
- developer-guide.md: code example uses GpuAggregate with both thresholds
- installation.md: 3 GPU config blocks updated (default, workstation, gaming)
- systemd-user-service.md: default service config GPU section

Test assertion in src/config.rs test_defaults() also updated.

… in Default impls Remove all fn default_*() helper functions from config.rs since config/rouser.toml is the source of truth. Replace serde defaults with bare #[serde(default)] and hardcode values in explicit Default trait impls. Metrics struct now uses #[derive(Default)]. Update AGENTS.md Configuration Conventions to document this pattern.

Replace #[serde(default = "default_what")] with bare #[serde(default)] on InhibitionConfig.what field. The Default impl already provides the same value, making default_what() dead code. Also fix CONTRIBUTING.md and docs/developer-guide.md to document the new convention.

Fix three hardcoded Default trait impl values that didn't match config/rouser.toml: duration_threshold 30→5s, cooldown_duration 60→10s, exclude_device_prefixes empty→full list. Also update test_timing_defaults to assert correct TOML-matching values.


Add per_gpu_max and total_average to the main Metrics debug log line so operators can see the exact values used for inhibition decisions. Also adds 4 integration tests validating has_gpus() consistency with enumerate_gpus(), driver type recognition, and empty/valid card detection.


…PU edge cases Add 8 unit tests covering GpuAggregate::from_values() and from_gpus(): empty input returns defaults (0.0), single GPU yields identical max/average values, two+ GPUs compute correct max and mean, and from_gpus results match from_values for identical data.

Add per-GPU max and total average GPU usage to the 'Flushed averaged snapshot' debug message so operators can see whether GPUs contributed to a flush event without needing to parse per-device logs.

Include gpu_delta_per_gpu_max and gpu_delta_total_average in rate-of-change calculations. Update TrendSignal to average GPU trends alongside CPU, network, and disk for more complete trend-aware cooldown prediction.

Address all user corrections: GPU aggregate metrics in snapshots, gap-filled entries as valid idle states (not filtered), disk and GPU deltas included in trend calculations. Document new unsupervised NG-RC reservoir computing approach replacing histogram-based TimeKey matching.

…md updates - docs/prediction-todo.md: 19 task tracker with architecture decision record for NG-RC reservoir computing (irithyll crate), dependency analysis, effort estimates, and implementation notes per AGENTS.md constraints. - AGENTS.md: add Prediction Model Refactoring section referencing the TODO file, documenting TimeKey deprecation rationale, feature vectors, unsupervised learning approach, gap-filled entry handling, GPU deltas, and planned config fields.

Add ml_hidden_dim (default 16) and ml_delay_buffer_size (default 8) config options for the NG-RC reservoir computing model. Update Cargo.toml with irithyll v9.9 dependency using serde-bincode feature flag. Sync defaults across Cargo.toml, config/rouser.toml, src/config.rs, docs/configuration.md, and tests.

…pipeline Introduce src/prediction/ml_model.rs containing FeatureVector, NormalizationStats, MlPredictor structs. Implements unsupervised streaming learning via irithyll's NG-RC reservoir computing architecture. Includes Welford's online algorithm for running statistics, checkpoint persistence, and comprehensive test coverage.
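Welford's online algorithm, mentioned above for running statistics, updates mean and variance one sample at a time without storing the samples. A minimal sketch follows; the struct name `RunningStats` is hypothetical and is not the PR's NormalizationStats.

```rust
/// Running mean and variance via Welford's online algorithm.
#[derive(Debug, Default, Clone, Copy)]
struct RunningStats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl RunningStats {
    /// Fold one sample into the running statistics.
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        let delta2 = x - self.mean;
        self.m2 += delta * delta2;
    }

    /// Sample variance (n - 1 denominator); None with fewer than two samples.
    fn variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}
```

This suits streaming use because each new snapshot costs constant time and memory, which fits the goal of updating the model in memory rather than re-reading history.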

owaindjones · 2026-05-07T15:18:25Z

YOLO

owaindjones added 13 commits May 1, 2026 20:13

chore: add .sisyphus to gitignore

393ebc3

docs(prediction): update log format example to match accumulated_ticks rename

bfd4e54

refactor(inhibit): remove dead is_auth_error function after retry logic consolidation

043a8b9

fix(prediction): update debug log to show full TimeKey, fix inhibit comment accuracy

7782e8e

fmt: reformat code to match rustfmt conventions

2da23ab

owaindjones force-pushed the feat/predictive-cooldown branch from 220a4d8 to e5be9a2 on May 2, 2026 08:32

owaindjones added 16 commits May 2, 2026 10:09

fix(prediction): fix clippy manual_range_contains lint in test

606fbb6


docs(systemd): remove stale --config paths from ExecStart examples

0623176


owaindjones added 23 commits May 7, 2026 09:12

docs(agents): fix stale reference to removed default helper functions

0186493

fmt: fix derive macro formatting on Metrics struct

b6bd2de

fmt: format vec! macro for exclude_device_prefixes

1eaf63e

fix(gpu): invert has_gpus() logic — was returning true when NO GPUs exist

4d7d7fb

fmt: fix trailing whitespace and blank line formatting in gpu.rs tests, model.rs

1d20ad9

feat(prediction): include GPU aggregate metrics in snapshot debug log

7f15241


fmt: fix indentation in model.rs snapshot log formatting

b9e9c66

feat(prediction): add GPU deltas to EntryDeltas and TrendSignal

87cdc2b


fix: correct typo in AGENTS.md predotion→prediction reference

64a71ef


owaindjones wants to merge 52 commits into main from feat/predictive-cooldown
