Skip to content

day2/04: Helios RFP multi-agent build (Bounteous demo)#7

Open
rforsh wants to merge 4 commits into
victorsteeb:mainfrom
rforsh:rfp-agent-team
Open

day2/04: Helios RFP multi-agent build (Bounteous demo)#7
rforsh wants to merge 4 commits into
victorsteeb:mainfrom
rforsh:rfp-agent-team

Conversation

@rforsh
Copy link
Copy Markdown

@rforsh rforsh commented May 20, 2026

Summary

Day 2 / Exercise 04 Agent Hackathon submission — a multi-agent RFP responder
for Helios Security, packaged as a self-contained Bounteous-branded HTML
chat/dashboard plus the rewritten Jupyter notebook.

  • 5-agent team (Parser, Retriever, Drafter, Validator, Reviser) with a
    Haiku/Sonnet model split and a bounded revision loop + post-revision
    recheck. Failure budget covers all 6 stated failure modes.
  • Self-contained HTML with 3-pane Bounteous-branded dashboard, live
    agent trace (click-to-expand), draggable splitters with persisted
    layout, always-on chat grounded on the live run snapshot, streaming
    responses with markdown rendering, and a live cost calculator
    (per-call USD + token counts).
  • Architecture tab with 3 Mermaid diagrams (pipeline, per-Q
    lifecycle, failure budget) and a How-to-use tab explaining every
    control.
  • Notebook mirrors the same 5-agent pipeline in Python; ships with
    5 synthetic RFP fixtures (table / list / prose / edge-case / adversarial)
    • Part 11 eval harness.
  • Surprise RFP (Sample F · Five Questions) included; live-tested in
    41s — Q3 hallucination trap defeated (kb_gap, src=0, no fabrication),
    Q4/Q5 properly hedged with kb_gap and drafter-side undocumented flags.
  • serve.py optional launcher lifts ANTHROPIC_API_KEY from env so
    the UI auto-fills the key.

Test plan

  • Notebook is valid JSON; all 7 required functions present
    (run_parser, run_retriever, run_drafter, run_validator,
    run_reviser, safe_parse_json, process_rfp).
  • HTML opens standalone; all 6 sample RFPs load; no-key fail-fast
    works; no console errors.
  • Live end-to-end run against surprise RFP (Sample F): 41s, 2 OK /
    3 flagged / 0 failed / 2 revisions, cost ≈ $0.04.
  • All 6 failure budget modes implemented and 4 of 6 exercised by
    the live surprise-RFP run.
  • Streaming chat verified rendering markdown live with cost tile
    updating each turn.
  • Click-to-expand on trace entries shows full event + linked card
    detail.
  • All borders draggable, layout persists to localStorage.

🤖 Generated with Claude Code

rforsh and others added 4 commits May 20, 2026 15:15
5-agent team (Parser/Retriever/Drafter/Validator/Reviser) with bounded
revision + post-revision recheck. Self-contained Bounteous-branded HTML
chat/dashboard UI that calls the Anthropic API directly from the browser,
plus a Jupyter notebook that mirrors the same pipeline in Python. Failure
budget: 6 modes (malformed RFP / KB miss / API error / bad JSON /
contradiction / missing key), each fails properly without blowing up.

- helios_agent.html (56k) — 3-pane dashboard + bottom-spanning chat
  + Architecture tab w/ Mermaid diagrams; Haiku for Parser/Retriever,
  Sonnet for Drafter/Validator/Reviser; KB embedded as JS verbatim.
- Agent_Engineering_Challenge.ipynb (34 cells) — Parts 0-4 unchanged;
  new Parts 5-11: agents, validator+reviser, 5 synthetic RFPs, safe-JSON
  helper, E2E run, HTML pointer, evals.
- README.md — architecture diagram + failure-mode matrix.
- serve.py — optional launcher; exposes /api/env-key so the UI lifts
  ANTHROPIC_API_KEY from env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add the Five Questions surprise RFP as sample F. Live-tested: 41s, 2
  OK / 3 flagged / 0 failed. Q3 (Kubernetes — KB has zero coverage)
  correctly refused to fabricate (src=0, kb-gap). Q4 self-flagged
  support-access-exceptions-undocumented and telemetry-exceptions-
  undocumented. Q5 hedged on air-gapped deployment boundary.
- Bottom chat is now always-on (was gated on run completion) and
  receives a RUN_SNAPSHOT JSON with current RFP, cards, trace tail,
  KB summary, validation, and final answers each turn. Multi-turn
  history kept (last 12 turns).
- Draggable splitters between every pane (left/right/bottom), with
  layout persisted to localStorage.
- Third tab "How to use" added between Architecture and Run.
- Post-revision recheck + unresolved-contradiction flag wired in
  both HTML and notebook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each trace row is now a clickable disclosure. Expanding shows: full
timestamp, role, target, level, latency, full outcome text, and when
the event targets a known question, the linked card's current state,
category, confidence, sources, flags, and the full draft text.

Click again to collapse. Open rows get a violet left-border accent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… drawer

- Chat replies now stream token-by-token via Anthropic SSE
  (callAnthropicStream). Live markdown rendering updates as the
  stream arrives.
- Tiny safe markdown renderer for chat output: headings, bold,
  italic, code, lists, paragraphs, hr. HTML-escapes source first.
- Cost calculator: pricing table for sonnet/haiku models, per-call
  usage capture for both streaming (message_start + message_delta
  events) and non-streaming responses. New "Cost" scoreboard tile
  shows USD total + token counts, updates live.
- Chat log now flex-grows with the bottom drawer height instead of
  capping at 240px.
- Splitter handles widened from 6px to 10px with visible center
  grab indicators (vertical pill for col-resize, horizontal pill for
  row-resize).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant