docs(experiment): qualified-name preamble — recorded after revert#32

Open
dvcdsys wants to merge 1 commit into main from docs/qualified-name-preamble-experiment

@dvcdsys dvcdsys commented May 7, 2026

Summary

  • Captures the A/B/C testing of a docstring-wrapped preamble with qualified symbol names (UserService.authenticate) across two real codebases — Python class-heavy (brain-project) and Go-heavy (this repo) — plus controlled fixture experiments.
  • The naive QID benchmark showed +5.6%, but a semantic NL benchmark (queries describe behaviour, never name the class/method) showed essentially zero gain on Python, zero on Go, and one regression where a Mode A top-1 hit dropped below the relevance threshold in Mode B.
  • The +5.6% turned out to be a literal-string-match artefact of Class.method appearing verbatim in the new preamble. Body content already carries enough lexical signal (self.X, type hints, imports, SQL table names) for the embedder to disambiguate class-scoped methods.
  • Feature was reverted in the same session; this doc is the record so future iterations don't re-litigate the same hypothesis without the right test.
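The literal-string-match artefact described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the preamble format, the helper names `with_qualified_preamble` and `contains_literal`, and the sample method body are hypothetical and are not taken from the reverted implementation.

```python
# Hypothetical sketch: why a query that names the symbol favours the preamble.

def with_qualified_preamble(qualified_name: str, body: str) -> str:
    """Prepend a docstring-style preamble naming the symbol (assumed format)."""
    return f'"""{qualified_name}"""\n{body}'


def contains_literal(query: str, chunk_text: str) -> bool:
    """Return True if the query appears verbatim (case-insensitive) in the chunk."""
    return query.lower() in chunk_text.lower()


# An invented method body, standing in for an indexed chunk.
body = (
    "def authenticate(self, username, password):\n"
    "    return self.users.verify(username, password)\n"
)

chunk_without_preamble = body
chunk_with_preamble = with_qualified_preamble("UserService.authenticate", body)

qid_query = "UserService.authenticate"           # names the class and method
nl_query = "check a user's credentials at login"  # describes behaviour only

# The QID query matches verbatim only once the preamble is added...
print(contains_literal(qid_query, chunk_without_preamble))
print(contains_literal(qid_query, chunk_with_preamble))
# ...while the behavioural query gets no literal help from the preamble.
print(contains_literal(nl_query, chunk_with_preamble))
```

A benchmark built only from queries like `qid_query` rewards the preamble for echoing the query string, which is why the semantic NL battery was the more informative test.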

What's in the diff

A single new file: doc/qualified-name-preamble-experiment.md (~210 lines). Covers:

  • TL;DR + decision
  • Implementation scope summary (~20 files touched, all reverted)
  • Test methodology — controlled fixtures, QID battery, semantic NL battery (the decisive one)
  • Per-codebase results with hit-rate, rank-1 hit-rate, avg expected score, top-K shifts
  • Disambiguation control (EventMemory.search_embeddings vs SemanticMemory.search_embeddings — margin actually shrank in Mode B)
  • Conclusion + reproducing instructions
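For readers unfamiliar with the metrics listed above, the following is a rough sketch of how hit-rate, rank-1 hit-rate, average expected score, and a disambiguation margin could be computed. The `QueryResult` type, the function names, the symbol names, and all scores are invented placeholders; they are not the harness or the numbers from the experiment doc.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    expected: str                    # symbol the query should retrieve
    ranked: list[tuple[str, float]]  # (symbol, score), best score first


def hit_rate(results: list[QueryResult], k: int) -> float:
    """Fraction of queries whose expected symbol appears in the top k."""
    hits = sum(
        any(symbol == r.expected for symbol, _ in r.ranked[:k]) for r in results
    )
    return hits / len(results)


def rank1_hit_rate(results: list[QueryResult]) -> float:
    """Fraction of queries whose expected symbol is ranked first."""
    return hit_rate(results, 1)


def score_of(result: QueryResult, symbol: str) -> float:
    """Score of a symbol in one result list, or 0.0 if it was not returned."""
    return next((s for name, s in result.ranked if name == symbol), 0.0)


def avg_expected_score(results: list[QueryResult]) -> float:
    """Mean score assigned to the expected symbol across queries."""
    return sum(score_of(r, r.expected) for r in results) / len(results)


def margin(result: QueryResult, rival: str) -> float:
    """Expected symbol's score minus a same-named rival's score."""
    return score_of(result, result.expected) - score_of(result, rival)


# Toy data with made-up scores, for illustration only.
results = [
    QueryResult("Alpha.search", [("Alpha.search", 0.9), ("Beta.search", 0.7)]),
    QueryResult("Beta.search", [("Alpha.search", 0.8), ("Beta.search", 0.6)]),
]

print(hit_rate(results, 2), rank1_hit_rate(results))
print(avg_expected_score(results), margin(results[0], "Beta.search"))
```

Comparing `margin` for the same query across modes is one way to read the disambiguation control: a smaller margin means the two same-named methods became harder to tell apart.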

Test plan

  • Read doc/qualified-name-preamble-experiment.md end-to-end
  • Verify the conclusion is consistent with the team's calibration of when to ship vs. revert experiments
  • Decide whether to fold this into a broader "experiments/" subdir in doc/ or leave standalone

🤖 Generated with Claude Code

Captures the A/B/C testing of a docstring-wrapped preamble with
qualified symbol names (`UserService.authenticate`) across two real
codebases (Python class-heavy + Go-heavy) plus controlled fixtures.

Conclusion: the +5.6% QID benchmark gain was a literal-string-match
artefact of the new preamble; semantic NL queries that don't name the
class/method showed near-zero gain and one regression. Feature was
reverted in the same session — this doc is the record so future
iterations don't repeat the same hypothesis without the right test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>