day2/04: Helios RFP multi-agent build (Bounteous demo)#7
Open
rforsh wants to merge 4 commits into
Open
Conversation
5-agent team (Parser/Retriever/Drafter/Validator/Reviser) with bounded revision + post-revision recheck. Self-contained Bounteous-branded HTML chat/dashboard UI that calls the Anthropic API directly from the browser, plus a Jupyter notebook that mirrors the same pipeline in Python. Failure budget: 6 modes (malformed RFP / KB miss / API error / bad JSON / contradiction / missing key), each fails properly without blowing up. - helios_agent.html (56k) — 3-pane dashboard + bottom-spanning chat + Architecture tab w/ Mermaid diagrams; Haiku for Parser/Retriever, Sonnet for Drafter/Validator/Reviser; KB embedded as JS verbatim. - Agent_Engineering_Challenge.ipynb (34 cells) — Parts 0-4 unchanged; new Parts 5-11: agents, validator+reviser, 5 synthetic RFPs, safe-JSON helper, E2E run, HTML pointer, evals. - README.md — architecture diagram + failure-mode matrix. - serve.py — optional launcher; exposes /api/env-key so the UI lifts ANTHROPIC_API_KEY from env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add the Five Questions surprise RFP as sample F. Live-tested: 41s, 2 OK / 3 flagged / 0 failed. Q3 (Kubernetes — KB has zero coverage) correctly refused to fabricate (src=0, kb-gap). Q4 self-flagged support-access-exceptions-undocumented and telemetry-exceptions- undocumented. Q5 hedged on air-gapped deployment boundary. - Bottom chat is now always-on (was gated on run completion) and receives a RUN_SNAPSHOT JSON with current RFP, cards, trace tail, KB summary, validation, and final answers each turn. Multi-turn history kept (last 12 turns). - Draggable splitters between every pane (left/right/bottom), with layout persisted to localStorage. - Third tab "How to use" added between Architecture and Run. - Post-revision recheck + unresolved-contradiction flag wired in both HTML and notebook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each trace row is now a clickable disclosure. Expanding shows: full timestamp, role, target, level, latency, full outcome text, and when the event targets a known question, the linked card's current state, category, confidence, sources, flags, and the full draft text. Click again to collapse. Open rows get a violet left-border accent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… drawer - Chat replies now stream token-by-token via Anthropic SSE (callAnthropicStream). Live markdown rendering updates as the stream arrives. - Tiny safe markdown renderer for chat output: headings, bold, italic, code, lists, paragraphs, hr. HTML-escapes source first. - Cost calculator: pricing table for sonnet/haiku models, per-call usage capture for both streaming (message_start + message_delta events) and non-streaming responses. New "Cost" scoreboard tile shows USD total + token counts, updates live. - Chat log now flex-grows with the bottom drawer height instead of capping at 240px. - Splitter handles widened from 6px to 10px with visible center grab indicators (vertical pill for col-resize, horizontal pill for row-resize). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Day 2 / Exercise 04 Agent Hackathon submission — a multi-agent RFP responder
for Helios Security, packaged as a self-contained Bounteous-branded HTML
chat/dashboard plus the rewritten Jupyter notebook.
Haiku/Sonnet model split and a bounded revision loop + post-revision
recheck. Failure budget covers all 6 stated failure modes.
agent trace (click-to-expand), draggable splitters with persisted
layout, always-on chat grounded on the live run snapshot, streaming
responses with markdown rendering, and a live cost calculator
(per-call USD + token counts).
lifecycle, failure budget) and a How-to-use tab explaining every
control.
5 synthetic RFP fixtures (table / list / prose / edge-case / adversarial)
41s — Q3 hallucination trap defeated (kb_gap, src=0, no fabrication),
Q4/Q5 properly hedged with kb_gap and drafter-side undocumented flags.
ANTHROPIC_API_KEYfrom env sothe UI auto-fills the key.
Test plan
(
run_parser,run_retriever,run_drafter,run_validator,run_reviser,safe_parse_json,process_rfp).works; no console errors.
3 flagged / 0 failed / 2 revisions, cost ≈ $0.04.
the live surprise-RFP run.
updating each turn.
detail.
🤖 Generated with Claude Code