
[EAGLE] Add Tensorboard logging support#1070

Open
benchislett wants to merge 3 commits into main from bchislett/eagle-tensorboard-logs

Conversation


@benchislett benchislett commented Mar 19, 2026

What does this PR do?

Type of change: Adds support for TensorBoard as a logging backend.

Also updates logging to support eval mode (which will be re-enabled in a separate PR).

Usage

Add --tensorboard True to launch_train.sh
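As a rough sketch of what this flag does, the script parses it and turns it into extra trainer arguments. The helper below is illustrative only (the function name observability_args and the parsing loop are assumptions, not the actual launch_train.sh code); the --report_to tensorboard argument it emits matches the behavior described in this PR.

```shell
#!/usr/bin/env bash
# Sketch (assumed names): parse a --tensorboard flag and build the extra
# trainer arguments, mirroring how launch_train.sh handles boolean flags.
observability_args() {
  local enable="False"
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --tensorboard) enable="$2"; shift 2 ;;
      *) shift ;;
    esac
  done
  if [[ "$enable" != "False" ]]; then
    echo "--report_to tensorboard"
  else
    echo ""
  fi
}

observability_args --tensorboard True   # prints: --report_to tensorboard
```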

Testing

I ran it to make sure the logs are reasonable. Looks fine to me.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ❌
  • Did you update Changelog?: N/A

Summary by CodeRabbit

Release Notes

  • New Features

    • Added TensorBoard support for training log visualization
    • Enhanced metrics tracking with improved accuracy reporting
    • Enabled automatic AR estimation by default
  • Documentation

    • Added observability guide documenting Weights & Biases and TensorBoard logging options

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett requested a review from a team as a code owner March 19, 2026 04:42
@benchislett benchislett requested a review from yeyu-nvidia March 19, 2026 04:42

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough


These changes add TensorBoard observability support to the speculative decoding example, complementing existing Weights & Biases integration. The training pipeline conditionally initializes TensorBoard logging through SummaryWriter, refactors metric reporting with mode-qualified naming, and updates shell script configuration to enable TensorBoard via CLI flags.

Changes

Cohort: Observability Documentation & Configuration
File(s): examples/speculative_decoding/README.md, examples/speculative_decoding/launch_train.sh
Summary: Added "Observability" section documenting automatic Weights & Biases support and conditional TensorBoard usage; updated the shell script to accept a --tensorboard flag, set ENABLE_TENSORBOARD, and append observability args to the training command; changed the default ESTIMATE_AR to "True".

Cohort: TensorBoard Integration & Metric Refactoring
File(s): examples/speculative_decoding/eagle_utils.py, examples/speculative_decoding/main.py
Summary: Added TensorBoard SummaryWriter support with scalar logging for parallel per-step accuracies and estimated AR; refactored EagleTrainingPlot to use an internal _report_stats helper with separate on_log and on_evaluate handlers; updated metric naming to be mode-qualified; modified compute_loss to conditionally return outputs; updated the training script to dynamically construct the callbacks list and initialize TensorBoard when requested.

Sequence Diagram(s)

sequenceDiagram
    participant Trainer
    participant EagleTrainingPlot as EagleTrainingPlot<br/>(Callback)
    participant TensorBoard as TensorBoard<br/>SummaryWriter
    participant WandB as Weights & Biases

    Trainer->>EagleTrainingPlot: on_log (training step)
    EagleTrainingPlot->>EagleTrainingPlot: _report_stats(state, eval_mode=False)
    EagleTrainingPlot->>TensorBoard: add_scalar(parallel_i_step_j_train_acc)
    EagleTrainingPlot->>WandB: log parallel_i_step_j_train_acc
    
    Trainer->>EagleTrainingPlot: on_evaluate (evaluation)
    EagleTrainingPlot->>EagleTrainingPlot: _report_stats(state, eval_mode=True)
    EagleTrainingPlot->>TensorBoard: add_scalar(estimated_eval_ar)
    EagleTrainingPlot->>WandB: log estimated_eval_ar

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The pull request title '[EAGLE] Add Tensorboard logging support' accurately summarizes the main change: adding TensorBoard as a logging backend for the EAGLE example.
  • Security Anti-Patterns: ✅ Passed. No security anti-patterns from SECURITY.md detected. No unsafe torch.load(), numpy.load(), trust_remote_code, eval/exec, nosec bypasses, or new external dependencies.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/speculative_decoding/eagle_utils.py (1)

74-74: ⚠️ Potential issue | 🟠 Major

Security: torch.load without weights_only=True on potentially user-supplied files.

This line loads .pt files from offline_data_path, which could be user-supplied. Per coding guidelines, torch.load should use weights_only=True to prevent arbitrary code execution from malicious checkpoints. If weights_only=False is required, an inline comment must justify why it is safe.

Since this is pre-existing code and not part of this PR's changes, consider addressing it in a follow-up, but it warrants attention.

🛡️ Recommended fix
-        offline_data = torch.load(self.dumped_files[i])
+        offline_data = torch.load(self.dumped_files[i], weights_only=True)

If the offline data files contain custom objects that require weights_only=False, add an inline comment explaining why this is safe (e.g., files are internally-generated by trusted scripts).

As per coding guidelines: "Do not use torch.load(..., weights_only=False) unless a documented exception is provided. Always use weights_only=True to prevent arbitrary code execution from malicious checkpoints."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/speculative_decoding/eagle_utils.py` at line 74, The call to
torch.load in the line that assigns offline_data =
torch.load(self.dumped_files[i]) is unsafe for potentially user-supplied files;
change it to torch.load(self.dumped_files[i], weights_only=True) to avoid
arbitrary code execution, or if the code genuinely needs non-weight objects, add
an inline justification comment next to this call explaining why
weights_only=False is safe (e.g., files are generated by a trusted internal
process), and keep the rest of the logic that uses offline_data and
self.dumped_files[i] unchanged.
🧹 Nitpick comments (2)
examples/speculative_decoding/launch_train.sh (1)

223-227: Consider using a more explicit boolean check.

The condition != "False" means any value other than literal "False" enables TensorBoard (e.g., --tensorboard=True, --tensorboard=1, or even typos like --tensorboard=Flase). This is consistent with other flags in the script, but a stricter check could prevent accidental enablement.

♻️ Optional: Stricter boolean check
-if [[ "$ENABLE_TENSORBOARD" != "False" ]]; then
+if [[ "$ENABLE_TENSORBOARD" == "True" ]]; then
   OBSERVABILITY_ARGS="--report_to tensorboard"
 else
   OBSERVABILITY_ARGS=""
 fi
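One way to make the check strict while still tolerating casing differences is a small case-insensitive helper. This is a sketch only; the helper name is_true is invented here and is not part of the script.

```shell
#!/usr/bin/env bash
# Sketch: case-insensitive truthy check (helper name `is_true` is assumed).
is_true() {
  local v="${1,,}"   # lowercase the value (bash 4+ parameter expansion)
  [[ "$v" == "true" || "$v" == "1" || "$v" == "yes" ]]
}

ENABLE_TENSORBOARD="True"
if is_true "$ENABLE_TENSORBOARD"; then
  OBSERVABILITY_ARGS="--report_to tensorboard"
else
  OBSERVABILITY_ARGS=""
fi
```

With this check, a typo like "Flase" falls through to the else branch instead of silently enabling TensorBoard.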
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/speculative_decoding/launch_train.sh` around lines 223 - 227, The
current conditional uses ENABLE_TENSORBOARD != "False" which treats any
non-"False" value as true; change it to perform an explicit boolean check on
ENABLE_TENSORBOARD (e.g., test for "True" or "1", or a case-insensitive match
like "true"/"1") before setting OBSERVABILITY_ARGS, so only intended truthy
values enable "--report_to tensorboard"; update the conditional that sets
OBSERVABILITY_ARGS (the ENABLE_TENSORBOARD variable and OBSERVABILITY_ARGS
assignment) to use the stricter check.
examples/speculative_decoding/eagle_utils.py (1)

239-249: Address or remove the TODO comment.

The TODO at line 240 asks "What are in 'kwargs.logs'?"; it should either be investigated and documented, or removed if it is no longer relevant. Leaving unresolved TODOs can cause confusion for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/speculative_decoding/eagle_utils.py` around lines 239 - 249, Remove
or replace the TODO asking "What are in 'kwargs.logs'?" by either deleting it if
obsolete or adding a concise explanatory comment stating that kwargs.logs is the
source of the per-draft/per-step metrics used to compute average_acc (i.e., the
logged metrics aggregated into average_acc), and note any expected
keys/structure; update the nearby comment above the self.tb_writer block (which
writes average_acc and est_ar using mode_id and state.global_step) and, if
applicable, add a short note in the function/class docstring that describes
kwargs.logs structure so future maintainers understand how average_acc and
est_ar are derived.

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37c34d1f-fb87-4239-869c-e70befcdb60e

📥 Commits

Reviewing files that changed from the base of the PR and between 839fa3d and 8a442e4.

📒 Files selected for processing (4)
  • examples/speculative_decoding/README.md
  • examples/speculative_decoding/eagle_utils.py
  • examples/speculative_decoding/launch_train.sh
  • examples/speculative_decoding/main.py


codecov bot commented Mar 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.30%. Comparing base (839fa3d) to head (8a442e4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1070   +/-   ##
=======================================
  Coverage   70.30%   70.30%           
=======================================
  Files         227      227           
  Lines       25857    25857           
=======================================
  Hits        18179    18179           
  Misses       7678     7678           

☔ View full report in Codecov by Sentry.