Fix: Eval hash mismatch due to parameter truncation in DB storage #1523

Open
rlundeen2 wants to merge 4 commits into Azure:main from rlundeen2:users/rlundeen/2026_03_19_eval_hash_bug

rlundeen2 commented Mar 19, 2026

Bug: Running await printer.print_summary_async(scenario_result) in 1_configuring_scenarios.ipynb prints "official evaluation has not been run yet for this specific configuration" — even when evals have been run.

Root cause: Long scorer params (e.g., system prompt templates) are truncated to 80 characters when stored in the DB via ComponentIdentifier.to_dict(max_value_length=80). The identity .hash is correctly preserved through the round-trip, but eval_hash is recomputed from the truncated params by EvaluationIdentifier, producing a different hash than what was stored during the eval run. This causes the metrics lookup to fail silently.
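To illustrate the root cause, here is a minimal sketch of how hashing truncated params yields a different value than hashing the originals. The `compute_hash` and `truncate_params` helpers are hypothetical stand-ins, not PyRIT functions, and SHA-256 over sorted JSON is an assumed hashing scheme; only the 80-character limit comes from the description above.

```python
import hashlib
import json

MAX_VALUE_LENGTH = 80  # mirrors max_value_length=80 from the description


def compute_hash(params: dict) -> str:
    """Hypothetical stand-in for the eval hash: SHA-256 over sorted JSON."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def truncate_params(params: dict, max_value_length: int = MAX_VALUE_LENGTH) -> dict:
    """Truncate long string values, as a lossy storage step might."""
    return {
        key: value[:max_value_length] if isinstance(value, str) else value
        for key, value in params.items()
    }


# A long system prompt (260 characters) plus a short, unaffected param.
original = {"system_prompt": "You are a careful scorer. " * 10, "temperature": 0}
stored = truncate_params(original)

hash_at_eval_time = compute_hash(original)
hash_after_round_trip = compute_hash(stored)

# The two hashes differ, so a lookup keyed on the eval-time hash finds nothing.
print(hash_at_eval_time == hash_after_round_trip)  # False
```

Params that fit within the limit are unaffected, which is presumably why the mismatch only shows up with long values such as system prompt templates.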

Fix: Store eval_hash inside the ComponentIdentifier serialization (to_dict/from_dict) so it survives DB round-trips without recomputation from truncated params.

  • ComponentIdentifier: Added stored_eval_hash field and KEY_EVAL_HASH. to_dict(eval_hash=...) includes it in the JSON; from_dict() restores it.
  • EvaluationIdentifier: Uses stored_eval_hash when available instead of recomputing from (potentially truncated) params.
  • ScenarioResultEntry/ScoreEntry/AttackResultEntry: Compute eval_hash from untruncated identifiers before truncation and pass to to_dict().
  • atomic_attack.py: Same fix for the enriched identifier persistence path.

No DB schema migration needed — eval_hash is stored inside the existing JSON columns. Old data without it falls back to recomputation (same as prior behavior).
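The round-trip and fallback behavior described above can be sketched roughly as follows. `SketchIdentifier`, `compute_eval_hash`, and `resolve_eval_hash` are toy names invented for this example, and the `"eval_hash"` key value is an assumption; the real `ComponentIdentifier` and `EvaluationIdentifier` may differ in structure.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

KEY_EVAL_HASH = "eval_hash"  # key name is an assumption for this sketch


def compute_eval_hash(params: dict) -> str:
    """Hypothetical hash function: SHA-256 over sorted JSON."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class SketchIdentifier:
    """Toy stand-in for ComponentIdentifier; not the real class."""

    params: dict
    stored_eval_hash: Optional[str] = field(default=None, compare=False)

    def to_dict(
        self,
        max_value_length: Optional[int] = None,
        eval_hash: Optional[str] = None,
    ) -> dict:
        # Truncate long string values only when a limit is given.
        params = {
            k: v[:max_value_length] if isinstance(v, str) and max_value_length else v
            for k, v in self.params.items()
        }
        data = {"params": params}
        if eval_hash is not None:
            data[KEY_EVAL_HASH] = eval_hash  # lives inside the existing JSON
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "SketchIdentifier":
        return cls(params=data["params"], stored_eval_hash=data.get(KEY_EVAL_HASH))


def resolve_eval_hash(identifier: SketchIdentifier) -> str:
    """Prefer the stored hash; fall back to recomputation for old rows."""
    if identifier.stored_eval_hash is not None:
        return identifier.stored_eval_hash
    return compute_eval_hash(identifier.params)


full = SketchIdentifier(params={"system_prompt": "x" * 200})
expected = compute_eval_hash(full.params)  # computed before truncation

new_row = full.to_dict(max_value_length=80, eval_hash=expected)
old_row = full.to_dict(max_value_length=80)  # no stored hash, like legacy data

restored_new = SketchIdentifier.from_dict(new_row)
restored_old = SketchIdentifier.from_dict(old_row)

print(resolve_eval_hash(restored_new) == expected)  # True
print(resolve_eval_hash(restored_old) == expected)  # False
```

In this sketch the legacy row still produces a mismatched hash when its params were truncated, which matches the "same as prior behavior" note: the fallback keeps old data readable but does not repair it.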

Contributor

Could you re-run the configuring scenarios notebook?

Contributor Author

rlundeen2 commented Mar 20, 2026

There is another bug: scenarios aren't setting the underlying model, so it defaults to "gpt-4o" and the hash is different. I want scenarios to grab this from the registry so there isn't a mismatch, but for now the notebook doesn't update. I'd like to tackle that in a future PR.

Store eval_hash inside ComponentIdentifier serialization (to_dict/from_dict)
so it survives DB round-trips without recomputation from truncated params.

- ComponentIdentifier: added stored_eval_hash field and KEY_EVAL_HASH
- EvaluationIdentifier: uses stored_eval_hash when available
- ScenarioResultEntry/ScoreEntry/AttackResultEntry: compute eval_hash before truncation
- atomic_attack.py: same fix for enriched identifier persistence
- Tests: round-trip, double round-trip, and regression tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rlundeen2 force-pushed the users/rlundeen/2026_03_19_eval_hash_bug branch from 409d1b7 to 7084a21 on March 20, 2026 01:17
#: Evaluation hash preserved from DB round-trip. Computed before truncation and
#: stored alongside the identity so that EvaluationIdentifier can use it directly
#: instead of recomputing from potentially truncated params.
stored_eval_hash: Optional[str] = field(default=None, init=False, compare=False)
Contributor Author

I want to rename this to eval_hash.

ComponentIdentifier: The identity with ``stored_eval_hash`` set.
"""
identifier = super().get_identifier()
if identifier.stored_eval_hash is None:
Contributor Author

I think we can make this better

object.__setattr__(identifier, "stored_eval_hash", eval_hash)
return identifier

def get_eval_hash(self) -> str:
Contributor Author

I think we should get rid of get_eval_hash
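For context on the hunks above, here is a small sketch of why `object.__setattr__` would be needed and what `init=False, compare=False` does. It assumes the identifier is a frozen dataclass, which the use of `object.__setattr__` suggests but the diff does not confirm; `FrozenSketch` is a toy class, not the real one.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FrozenSketch:
    """Toy frozen dataclass; assumes the real identifier is frozen too."""

    name: str
    # Excluded from __init__ and from equality, like the field in the hunk above.
    stored_eval_hash: Optional[str] = field(default=None, init=False, compare=False)


a = FrozenSketch(name="scorer")
b = FrozenSketch(name="scorer")

try:
    a.stored_eval_hash = "abc123"  # normal assignment is blocked when frozen
except FrozenInstanceError:
    blocked = True
else:
    blocked = False

# Bypasses the frozen guard, as the hunk above does.
object.__setattr__(a, "stored_eval_hash", "abc123")

print(blocked)             # True
print(a.stored_eval_hash)  # abc123
print(a == b)              # True, because compare=False
```

Because the field is left out of comparison, setting it does not change equality or the generated `__hash__`, so attaching an eval hash after construction would not alter the identity of the object in this sketch.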
