day2/04: Helios RFP multi-agent build (Bounteous demo) by rforsh · Pull Request #7 · victorsteeb/Basecamp-Exercises

rforsh · 2026-05-20T22:51:14Z

Summary

Day 2 / Exercise 04 Agent Hackathon submission — a multi-agent RFP responder
for Helios Security, packaged as a self-contained Bounteous-branded HTML
chat/dashboard plus the rewritten Jupyter notebook.

5-agent team (Parser, Retriever, Drafter, Validator, Reviser) with a
Haiku/Sonnet model split and a bounded revision loop + post-revision
recheck. Failure budget covers all 6 stated failure modes.
Self-contained HTML with 3-pane Bounteous-branded dashboard, live
agent trace (click-to-expand), draggable splitters with persisted
layout, always-on chat grounded on the live run snapshot, streaming
responses with markdown rendering, and a live cost calculator
(per-call USD + token counts).
Architecture tab with 3 Mermaid diagrams (pipeline, per-Q
lifecycle, failure budget) and a How-to-use tab explaining every
control.
Notebook mirrors the same 5-agent pipeline in Python; ships with
5 synthetic RFP fixtures (table / list / prose / edge-case / adversarial)
- Part 11 eval harness.
Surprise RFP (Sample F · Five Questions) included; live-tested in
41s — Q3 hallucination trap defeated (kb_gap, src=0, no fabrication),
Q4/Q5 properly hedged with kb_gap and drafter-side undocumented flags.
serve.py optional launcher lifts ANTHROPIC_API_KEY from env so
the UI auto-fills the key.

Test plan

Notebook is valid JSON; all 7 required functions present
(run_parser, run_retriever, run_drafter, run_validator,
run_reviser, safe_parse_json, process_rfp).
HTML opens standalone; all 6 sample RFPs load; no-key fail-fast
works; no console errors.
Live end-to-end run against surprise RFP (Sample F): 41s, 2 OK /
3 flagged / 0 failed / 2 revisions, cost ≈ $0.04.
All 6 failure budget modes implemented and 4 of 6 exercised by
the live surprise-RFP run.
Streaming chat verified rendering markdown live with cost tile
updating each turn.
Click-to-expand on trace entries shows full event + linked card
detail.
All borders draggable, layout persists to localStorage.

🤖 Generated with Claude Code

5-agent team (Parser/Retriever/Drafter/Validator/Reviser) with bounded revision + post-revision recheck. Self-contained Bounteous-branded HTML chat/dashboard UI that calls the Anthropic API directly from the browser, plus a Jupyter notebook that mirrors the same pipeline in Python. Failure budget: 6 modes (malformed RFP / KB miss / API error / bad JSON / contradiction / missing key), each fails properly without blowing up. - helios_agent.html (56k) — 3-pane dashboard + bottom-spanning chat + Architecture tab w/ Mermaid diagrams; Haiku for Parser/Retriever, Sonnet for Drafter/Validator/Reviser; KB embedded as JS verbatim. - Agent_Engineering_Challenge.ipynb (34 cells) — Parts 0-4 unchanged; new Parts 5-11: agents, validator+reviser, 5 synthetic RFPs, safe-JSON helper, E2E run, HTML pointer, evals. - README.md — architecture diagram + failure-mode matrix. - serve.py — optional launcher; exposes /api/env-key so the UI lifts ANTHROPIC_API_KEY from env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Add the Five Questions surprise RFP as sample F. Live-tested: 41s, 2 OK / 3 flagged / 0 failed. Q3 (Kubernetes — KB has zero coverage) correctly refused to fabricate (src=0, kb-gap). Q4 self-flagged support-access-exceptions-undocumented and telemetry-exceptions- undocumented. Q5 hedged on air-gapped deployment boundary. - Bottom chat is now always-on (was gated on run completion) and receives a RUN_SNAPSHOT JSON with current RFP, cards, trace tail, KB summary, validation, and final answers each turn. Multi-turn history kept (last 12 turns). - Draggable splitters between every pane (left/right/bottom), with layout persisted to localStorage. - Third tab "How to use" added between Architecture and Run. - Post-revision recheck + unresolved-contradiction flag wired in both HTML and notebook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each trace row is now a clickable disclosure. Expanding shows: full timestamp, role, target, level, latency, full outcome text, and when the event targets a known question, the linked card's current state, category, confidence, sources, flags, and the full draft text. Click again to collapse. Open rows get a violet left-border accent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… drawer - Chat replies now stream token-by-token via Anthropic SSE (callAnthropicStream). Live markdown rendering updates as the stream arrives. - Tiny safe markdown renderer for chat output: headings, bold, italic, code, lists, paragraphs, hr. HTML-escapes source first. - Cost calculator: pricing table for sonnet/haiku models, per-call usage capture for both streaming (message_start + message_delta events) and non-streaming responses. New "Cost" scoreboard tile shows USD total + token counts, updates live. - Chat log now flex-grows with the bottom drawer height instead of capping at 240px. - Splitter handles widened from 6px to 10px with visible center grab indicators (vertical pill for col-resize, horizontal pill for row-resize). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rforsh and others added 4 commits May 20, 2026 15:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

day2/04: Helios RFP multi-agent build (Bounteous demo)#7

day2/04: Helios RFP multi-agent build (Bounteous demo)#7
rforsh wants to merge 4 commits into
victorsteeb:mainfrom
rforsh:rfp-agent-team

rforsh commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rforsh commented May 20, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant