Merged
440 changes: 19 additions & 421 deletions README.md

Large diffs are not rendered by default.

27 changes: 20 additions & 7 deletions docs/_social_preview.py
@@ -26,17 +26,30 @@
 ACCENT = "#FF3621" # Databricks orange
 LINE = "#252D3F" # subtle separator

-# Arial bundles ship on macOS, support a wide glyph set including arrows,
-# and have explicit Regular/Bold/Black files (no .ttc index guessing).
-FONT_REG = "/System/Library/Fonts/Supplemental/Arial.ttf"
-FONT_BOLD = "/System/Library/Fonts/Supplemental/Arial Bold.ttf"
-FONT_BLACK = "/System/Library/Fonts/Supplemental/Arial Black.ttf"
+# Prefer macOS Arial for local generation, but fall back to Liberation Sans in
+# Linux devcontainers.
+FONT_CANDIDATES = {
+    "regular": [
+        "/System/Library/Fonts/Supplemental/Arial.ttf",
+        "/usr/share/fonts/truetype/liberation2/LiberationSans-Regular.ttf",
+    ],
+    "bold": [
+        "/System/Library/Fonts/Supplemental/Arial Bold.ttf",
+        "/usr/share/fonts/truetype/liberation2/LiberationSans-Bold.ttf",
+    ],
+    "black": [
+        "/System/Library/Fonts/Supplemental/Arial Black.ttf",
+        "/usr/share/fonts/truetype/liberation2/LiberationSans-Bold.ttf",
+    ],
+}

 OUT = Path(__file__).parent / "social-preview.png"


 def font(size: int, weight: str = "regular") -> ImageFont.FreeTypeFont:
-    path = {"regular": FONT_REG, "bold": FONT_BOLD, "black": FONT_BLACK}[weight]
+    path = next((p for p in FONT_CANDIDATES[weight] if Path(p).exists()), None)
+    if path is None:
+        raise FileNotFoundError(f"No usable font found for weight={weight!r}")
     return ImageFont.truetype(path, size)


@@ -70,7 +83,7 @@ def main() -> None:
     # One-line architecture summary, near bottom. ASCII arrows guarantee
     # glyph coverage across any future font swap.
     arch_f = font(22, "bold")
-    arch_text = "ai_parse_document -> typed KPIs -> Vector Search -> cited agent on Mosaic AI"
+    arch_text = "ai_parse_document -> typed KPIs -> Vector Search -> eval-gated cited agent"
     d.text((margin, H - margin - 80), arch_text, font=arch_f, fill=FG)

     # Separator + footer.
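
The font change in `docs/_social_preview.py` replaces three fixed macOS paths with a per-weight list of candidates and picks the first file that exists. The sketch below isolates that lookup so it can be exercised without Pillow. The names `first_existing` and `resolve_font_path` are hypothetical and do not appear in the PR, and the explicit `KeyError` message is an addition (the PR relies on the plain dict lookup).

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path that exists on disk, else None."""
    return next((p for p in candidates if Path(p).exists()), None)


def resolve_font_path(weight: str, candidates: dict[str, list[str]]) -> str:
    """Pick a font file for `weight`; raise if no candidate is installed."""
    if weight not in candidates:
        # Hypothetical extra check: the PR lets the dict lookup raise instead.
        raise KeyError(f"Unknown font weight: {weight!r}")
    path = first_existing(candidates[weight])
    if path is None:
        raise FileNotFoundError(f"No usable font found for weight={weight!r}")
    return path
```

Candidate order encodes preference, so listing the macOS path first keeps local output unchanged while Linux containers fall through to Liberation Sans. In the PR's table the `"black"` weight falls back to `LiberationSans-Bold.ttf`, so headings rendered on Linux will presumably look lighter than on macOS.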
355 changes: 355 additions & 0 deletions docs/design.md

Large diffs are not rendered by default.

Binary file modified docs/social-preview.png
2 changes: 1 addition & 1 deletion specs/001-doc-intel-10k/plan.md
@@ -124,7 +124,7 @@ Output: [research.md](./research.md). Decisions captured:
 | Idempotency | `APPLY CHANGES INTO` keyed on `filename` for Silver and Gold | SDP native CDC, deterministic on re-upload, no Python helper | Hand-rolled MERGE (rejected: more code paths); content hash key (deferred — filename is sufficient for v1) |
 | Quality rubric | 5 dimensions × 0–6 scale; threshold ≥ 22/30; computed via `ai_query` calls in `04_gold_quality.sql` | Mirrors Reffy's 31-point pattern; SQL-native means no Python helper; explicit dimensions help debug rejections | Single `extraction_confidence` (rejected: no debuggability); 3-dim avg (rejected: too coarse) |
 | Vector Search index | Delta-Sync index over `gold_filing_sections` filtered by `embed_eligible`; embed `summary` column | Managed sync, no manual refresh; embeds curated content per principle IV | Direct Vector Index (rejected: no managed sync); embedding raw `parsed.text_full` (rejected: noise) |
-| Retrieval strategy | Hybrid (keyword + semantic) top-25 → re-rank → top-5 | Reffy pattern; re-rank improves relevance materially; CPU re-rank stays in budget | Pure semantic (rejected: misses exact filings/years); re-rank against top-100 (rejected: latency budget) |
+| Retrieval strategy | Hybrid (keyword + semantic) top-25 → re-rank → top-5 | Reffy pattern; re-rank tightens top-5 ordering; CPU re-rank stays in budget | Pure semantic (rejected: misses exact filings/years); re-rank against top-100 (rejected: latency budget) |
 | Agent framework | Mosaic AI Agent Framework via `databricks-agents` SDK + MLflow `pyfunc` | First-class Knowledge Assistant + Supervisor primitives; logged + registered in UC | LangGraph standalone (rejected: more glue, no UC registration story) |
 | Serving | CPU instance behind AI Gateway; identity passthrough on | Cost-first per Reffy; Gateway gives audit + rate limit + on-behalf-of | GPU (rejected: not needed at scale of pilot); raw endpoint (rejected: no governance layer) |
 | State store | Lakebase Postgres (managed) | Native to platform, low-latency reads/writes, fits Reffy pattern; integrates with Apps | Delta tables (rejected: write throughput on small turn-level updates); external Postgres (rejected: governance gap) |
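
The quality rubric row in the table above describes a simple sum-and-threshold check: five dimensions scored 0–6 each, passing at 22 of 30. The plan computes this in SQL via `ai_query`; the sketch below only illustrates the arithmetic in Python. The dimension names are placeholders, since the plan excerpt does not list them, and `rubric_total` and `passes_rubric` are hypothetical helpers.

```python
DIMENSIONS = 5            # from the plan: 5 dimensions
MAX_PER_DIMENSION = 6     # from the plan: 0-6 scale
PASS_THRESHOLD = 22       # from the plan: >= 22 out of 30


def rubric_total(scores: dict[str, int]) -> int:
    """Validate per-dimension scores and return their sum (0..30)."""
    if len(scores) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS} dimensions, got {len(scores)}")
    for name, score in scores.items():
        if not 0 <= score <= MAX_PER_DIMENSION:
            raise ValueError(f"{name}: score {score} outside 0..{MAX_PER_DIMENSION}")
    return sum(scores.values())


def passes_rubric(scores: dict[str, int]) -> bool:
    """True when the summed score meets the plan's threshold."""
    return rubric_total(scores) >= PASS_THRESHOLD


# Placeholder dimension names; the real ones live in 04_gold_quality.sql.
example = {"dim_a": 5, "dim_b": 4, "dim_c": 5, "dim_d": 4, "dim_e": 4}
print(rubric_total(example), passes_rubric(example))
```

Keeping the per-dimension scores alongside the total is what the plan means by debuggability: a rejected section shows which dimension pulled it under the threshold.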
2 changes: 1 addition & 1 deletion specs/001-doc-intel-10k/quickstart.md
@@ -1,6 +1,6 @@
 # Quickstart: Deploy and Test the 10-K Analyst

-Goal: from a clean clone, stand up the entire stack on the Databricks `dev` target and verify P1, P2, P3 acceptance scenarios in under 30 minutes.
+Goal: from a clean clone, stand up the entire stack on the Databricks `dev` target and verify P1, P2, P3 acceptance scenarios in 15–25 minutes.

 ## Prerequisites

2 changes: 1 addition & 1 deletion specs/001-doc-intel-10k/research.md
@@ -55,7 +55,7 @@ tunable as a bundle parameter.
 **Rationale**: Reffy reports keyword-only sub-2s but reasoning needs LLM
 generation. Hybrid keyword + semantic retrieval to top-25, then a Mosaic AI
 re-ranker (CPU) trim to top-5, keeps single-filing P95 ≤ 8s achievable
-on CPU serving while improving relevance materially. Bigger windows blow
+on CPU serving while improving top-5 ordering qualitatively. Bigger windows blow
 the latency budget; pure semantic misses exact ticker/year matches in
 financial filings.

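
The rationale above describes a two-stage retrieval: a hybrid shortlist of 25, then a re-rank that keeps 5. The sketch below shows that shape on in-memory data only. It does not call Vector Search or any Mosaic AI re-ranker. The `Candidate` type, the weighted-sum fusion with `alpha`, and the `rerank_score` callable are all assumptions for illustration; the spec does not say how keyword and semantic scores are fused.

```python
from typing import Callable, NamedTuple, Sequence


class Candidate(NamedTuple):
    doc_id: str
    keyword_score: float   # e.g. a lexical match score, assumed precomputed
    semantic_score: float  # e.g. an embedding similarity, assumed precomputed


def hybrid_top_k(
    candidates: Sequence[Candidate], k: int = 25, alpha: float = 0.5
) -> list[Candidate]:
    """Stage 1: fuse keyword and semantic scores, keep the best k."""
    def fused(c: Candidate) -> float:
        # Assumed fusion: a simple weighted sum of the two scores.
        return alpha * c.keyword_score + (1 - alpha) * c.semantic_score

    return sorted(candidates, key=fused, reverse=True)[:k]


def rerank_top_n(
    shortlist: Sequence[Candidate],
    rerank_score: Callable[[Candidate], float],
    n: int = 5,
) -> list[Candidate]:
    """Stage 2: re-score only the shortlist with a costlier model, keep n."""
    return sorted(shortlist, key=rerank_score, reverse=True)[:n]


# Usage with toy data and a stand-in re-ranker.
pool = [Candidate(f"d{i}", i / 40, i / 40) for i in range(40)]
shortlist = hybrid_top_k(pool)
final = rerank_top_n(shortlist, rerank_score=lambda c: -c.semantic_score)
print(len(shortlist), [c.doc_id for c in final])
```

The re-ranker only ever sees `k` items, so its cost scales with the shortlist size rather than the corpus. That is the trade-off behind rejecting a top-100 window on latency grounds.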