Unlock model-driven question discovery in review and design skills#2

Open
alexei-led wants to merge 2 commits into vladikk:main from alexei-led:improve-questions-asking

Conversation

Contributor

@alexei-led commented Apr 1, 2026

Question Flexibility

This PR changes how the review and high-level-design skills gather information from users. The short version: the model now reads the code first and figures out what to ask, instead of following a fixed questionnaire.

Why this matters

Your Balanced Coupling model is powerful — it needs three dimensions (strength, distance, volatility) to assess coupling properly. The model already has this framework loaded. But the current skills don't trust it to use that knowledge. Instead, they follow a rigid script:

  1. Ask "Domain?" (fixed free-text prompt)
  2. Ask "Teams?" (3 hard-coded options: Same team / Multiple teams / Mixed)
  3. Ask "Pain points?" (3 hard-coded options: Yes / No / Not sure)

These questions fire every time regardless of what the code or requirements already reveal. The model is perfectly capable of reading the code, applying the Balanced Coupling lens, and discovering what information it actually needs — we just weren't letting it.

What changed

Review skill — Step 1 rewritten

Before: Read requirements → Ask Scope → Read code → Ask Domain (fixed) → Ask Teams (fixed) → Ask Pain points (fixed)

After: Ask Scope → Read code + requirements → Surface understanding (model presents what it learned, user validates/corrects) → Discover gaps (model identifies what's missing for coupling assessment, asks targeted questions)

The model now:

  • Reads code before asking domain questions (was backwards before)
  • Presents its understanding with confidence levels — low confidence areas become natural targets for follow-up questions
  • Asks only questions whose answers would materially change the coupling assessment (explicit filter)
  • Gets seed guidance on common gaps (domain classification, team ownership, pain points, strategic direction, surprising code patterns) but is explicitly told "you are not limited to these categories"
  • Grounds questions in specific code observations — "I noticed X and Y share 12 types directly, is this intentional?" instead of generic "any pain points?"
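Since the actual diff text isn't reproduced on this page, here is an illustrative sketch (not a quote from the PR) of how the rewritten Step 1 in skills/review/SKILL.md could be phrased to match the behavior described above:

```markdown
## Step 1: Understand before asking

1. Ask the user for scope: which code and requirements to analyze.
2. Read the code and requirements in full before asking anything else.
3. Present your understanding with confidence levels, e.g.
   "OrderService looks core (high confidence: complex business rules)".
4. Ask the user to validate or correct that understanding.
5. Ask only questions whose answers would materially change the
   coupling assessment. Common gaps to consider: domain classification,
   team ownership, pain points, strategic direction, surprising code
   patterns. You are not limited to these categories.
6. Ground each question in a specific observation from the code,
   not a generic prompt.
```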

High-level-design skill — Step 1 point 2 enhanced

Before: "List every ambiguity. Ask about each one." (generic, not coupling-aware)

After: Coupling-aware gap discovery — the model thinks about what the Balanced Coupling model needs (domain classification → volatility, organizational structure → distance, integration patterns → strength) and asks about gaps in those specific areas. Same materiality test: don't ask questions that wouldn't change the design.
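Again as an illustrative sketch only (the diff itself isn't shown here), the enhanced Step 1 point 2 in skills/high-level-design/SKILL.md might read along these lines:

```markdown
2. Discover gaps through the Balanced Coupling lens:
   - Volatility: is the domain classification (core / supporting /
     generic) clear for each component? If not, ask.
   - Distance: is organizational ownership (same team / different
     teams) known for each integration? If not, ask.
   - Strength: are the intended integration patterns specified?
     If not, ask.
   Materiality test: skip any question whose answer would not
   change the resulting design.
```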

The reasoning behind it

I did some research into how Opus 4.6 actually works (read the system card). A few things stood out:

  • Calibration: Opus 4.6 has state-of-the-art calibration — it knows when it doesn't know things (96.8% on false premise rejection, highest net scores on factual honesty). When we ask it to report confidence levels, it genuinely can identify where it's uncertain. That's exactly what we want for gap discovery.

  • "General instructions over prescriptive steps": Anthropic's own SWE-bench testing showed that telling the model to "explore the codebase and understand the root cause" produced better results than hand-written step-by-step plans. Our fixed questionnaire was a step-by-step plan. The new approach gives intent and constraints, then gets out of the way.

  • Agentic search: Opus 4.6 is state-of-the-art at multi-step information seeking (84% BrowseComp, 91.3% DeepSearchQA). It's naturally good at the "read context → identify gaps → decide what to explore next" pattern. We were under-utilizing this.

What stays the same

  • Scope question — still fixed (the model needs a bootstrap to know what to analyze)
  • Interaction rules — one question at a time, multiple choice preferred, headers
  • All approval gates in HLD — these are validation checkpoints, not information gathering
  • Steps 2-4 in review — Map Integrations, Apply Balance Rule, Write Review (unchanged)
  • Everything in balanced-coupling and document skills — untouched

Example of the difference

Before:

Model: "Domain?" → User types classification
Model: "Teams?" → User picks "Multiple teams"  
Model: "Pain points?" → User picks "Yes"
Model: [NOW reads code]

After:

Model: [reads code deeply]
Model: "Here's what I found — 4 components, OrderService and PaymentGateway 
        share 12 domain types directly, I think OrderService is core (high 
        confidence — complex business rules), UserAuth looks generic (high 
        confidence — standard OAuth). I'm uncertain whether NotificationHub 
        is supporting or generic. Does this look right?"
User: "PaymentGateway is also core — we're building our own payment processing"
Model: "That changes things. The 12 shared types between two core components 
        is significant. Are they owned by the same team?"
User: "Different teams since last quarter"
Model: [done — has everything needed for the analysis]

Two targeted questions instead of four scripted ones, and better information, because the questions are grounded in actual code.

Replace rigid hard-coded questionnaires with dynamic gap discovery.
The model reads code first, surfaces its understanding, then asks
only questions whose answers would materially change the analysis.
Copilot AI review requested due to automatic review settings April 1, 2026 10:28

Copilot AI left a comment


Pull request overview

This PR updates the review and high-level-design skills to shift Step 1 from a fixed questionnaire toward model-driven gap discovery: the model reads requirements/code first, synthesizes its understanding with confidence, then asks only targeted follow-up questions that materially affect Balanced Coupling decisions.

Changes:

  • Review skill Step 1 now: ask scope → read requirements + code → synthesize understanding + validate → ask targeted gap questions.
  • High-level-design skill Step 1.2 now frames ambiguity discovery specifically around Balanced Coupling inputs (volatility/distance/strength) and applies a “materiality” filter to questions.
  • Both skills add guidance to ground questions in concrete observations (code or requirements) rather than generic prompts.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • skills/review/SKILL.md: Replaces the fixed domain/teams/pain-points questionnaire with synthesis-first + targeted gap discovery.
  • skills/high-level-design/SKILL.md: Replaces generic ambiguity listing with coupling-aware gap discovery instructions.

