
Add Librarian — AI chat panel for SQL lineage Q&A #43

Open
liliyaminibaeva wants to merge 38 commits into pondpilot:master from liliyaminibaeva:feature/librarian

Conversation

@liliyaminibaeva

PR: Add Librarian — AI-powered chat panel for SQL lineage Q&A

Summary

Librarian is a new chat panel that lets users ask natural-language questions about their data using SQL lineage context and uploaded PDF documentation. It sits as a third resizable panel to the right of the analysis view.

Features

  • AI Chat with structured responses (Summary / Data Lineage / Documentation sections); off-topic questions get a fixed refusal.
  • Multiple AI providers: OpenAI, Anthropic, and custom OpenAI-compatible endpoints (e.g. LiteLLM). Config in localStorage, sent only to the configured provider.
  • PDF documentation — drag-and-drop or click upload. Local pipeline: pdfjs-dist text extraction → 500-char chunks → Xenova/multilingual-e5-small embedding (Web Worker, 100+ languages) → cosine similarity search. 10 MB per file, no file count limit.
  • Inline identifier highlighting in answers — table and column names are styled distinctly from inline code; matching is case-insensitive and normalized to canonical schema casing.
  • Click-to-navigate: clicking an assistant answer reads its Summary section and drives the lineage view via the existing search pipeline. Every referenced column highlights across all owning tables.
  • Schema search in the Schema view header — substring match by table or column name with Prev/Next cycling.
  • Per-project state isolation (RAM-only) — chat history, PDFs, and embedded chunks are scoped to the active project; switching projects does not leak content into another project's prompt. F5 wipes everything.
  • Keyboard shortcut ⌘L / Ctrl+L to toggle the panel.
  • Test coverage — 304 Vitest tests passing (yarn typecheck clean).
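The local PDF pipeline above ends in a cosine-similarity search over embedded chunks. A minimal sketch of that last step (the `Chunk` shape and `searchChunks` helper are illustrative assumptions, not the PR's actual API):

```typescript
// Minimal sketch of the local vector-search step: cosine similarity
// over pre-embedded PDF chunks. Names are illustrative.
interface Chunk {
  text: string;
  embedding: number[]; // produced by the e5 model in the Web Worker
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Return the k chunks most similar to the embedded question.
function searchChunks(query: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding),
    )
    .slice(0, k);
}
```

A real implementation would precompute norms instead of recomputing them inside the sort comparator, but the ranking logic is the same.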

How it works

User question → use-librarian-chat.ts (orchestrator)
  ├── Lineage context ← lineage-formatter.ts ← useLineageState()
  ├── SQL snippet ← useProject() (active file, truncated to 3000 chars)
  ├── PDF context ← vector search ← embedding-service (Web Worker)
  └── Chat history ← Zustand store (last 10 messages sent to AI, full history shown in UI)
        ↓
  context-builder.ts → assembled prompt
        ↓
  ai-service.ts → fetch() → OpenAI / Anthropic / Custom endpoint
        ↓
  Response → store → chat UI (markdown + highlighted identifiers)
  • All processing runs locally in the browser (embeddings via Web Worker, no backend).
  • AI answers strictly from provided data (SQL, lineage, PDFs) — no general knowledge.
  • API keys stored in localStorage, sent only to the configured provider.
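The assembly step in the diagram can be sketched as follows. Only the 3000-char SQL truncation and 10-message history window come from the description above; the section headings, field names, and `buildPrompt` signature are illustrative assumptions:

```typescript
// Hedged sketch of the context-builder step, not the PR's actual code.
interface ChatMessage { role: 'user' | 'assistant'; content: string }

const SQL_SNIPPET_LIMIT = 3000; // chars, per the PR description
const HISTORY_LIMIT = 10;       // last N messages sent to the model

function buildPrompt(opts: {
  lineage: string;
  sql: string;
  pdfChunks: string[];
  history: ChatMessage[];
}): string {
  const sql = opts.sql.slice(0, SQL_SNIPPET_LIMIT);   // truncate active SQL
  const history = opts.history.slice(-HISTORY_LIMIT); // last 10 messages only
  return [
    '## Lineage\n' + opts.lineage,
    '## SQL\n' + sql,
    '## Documentation\n' + opts.pdfChunks.join('\n---\n'),
    '## History\n' + history.map((m) => `${m.role}: ${m.content}`).join('\n'),
  ].join('\n\n');
}
```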

Files added

app/src/features/librarian/
├── components/          # UI: panel, chat, PDF upload, settings dialog, toggle
├── services/            # AI service, context builder, lineage formatter, PDF processor, vector search, embeddings
├── hooks/               # Chat orchestrator, sync-active-project
├── workers/             # Embedding Web Worker (Xenova/transformers)
├── utils/               # schema-identifiers (detect, resolve, extractSummary)
├── __tests__/           # Vitest suites
├── types.ts
├── constants.ts
├── store.ts
└── index.ts

app/public/polly-icon.svg                 # Custom duck icon
app/src/lib/lineage-node-resolver.ts      # Resolve ChatReference[] → lineage node ids
app/src/lib/lineage-navigation.ts         # Pure consumer of NavigationTarget for the lineage tab
app/src/lib/__tests__/lineage-node-resolver.test.ts
app/src/lib/__tests__/lineage-navigation.test.ts
docs/librarian.md                         # User guide

Files modified

  • app/src/components/Workspace.tsx: third ResizablePanel for Librarian; toggle button in the analysis toolbar; LibrarianPanelWithNavigation wires chat-click → lineage search-term + reveal
  • app/src/components/AnalysisView.tsx: wires actionsRef.current.revealNodeInGraph into applyLineageNavigation deps
  • app/src/components/GlobalDropZone.tsx: overlay made pointer-events-none so PDF drops reach the Librarian dropzone
  • app/src/lib/view-state-store.ts: added librarianOpen flag and per-project Librarian state hook integration
  • app/src/lib/shortcuts.ts: toggle-librarian shortcut
  • app/src/lib/navigation-context.tsx: NavigationTarget.highlightNodeIds / tablesToExpand / primaryFocusId for chat-click navigation
  • app/package.json: added pdfjs-dist, @xenova/transformers, Vitest + RTL deps
  • app/vitest.config.ts: new — Vitest configuration
  • CHANGELOG.md: single [Unreleased] entry describing the Librarian feature
  • docs/librarian.md: user guide

New dependencies

  • pdfjs-dist: PDF text extraction in the browser (~800 KB)
  • @xenova/transformers: local embedding model inference (~2 MB JS + ~23 MB model, lazy-loaded and cached via the browser cache / env.useBrowserCache)
  • vitest, @testing-library/react, jsdom: testing (devDependencies)

Known limitations

  • Single-substring search. Click-to-navigate writes one substring into the lineage search box; heterogeneous Summary references (e.g. both MANDT and BUKRS) cannot share a single search term — the first column wins. The user can dismiss the highlight via the lineage search box.
  • SchemaView highlights one table at a time. The Schema view's selection prop accepts a single selectedTableName; schema search uses Prev/Next cycling to compensate.
  • Chat-click overrides active manual lineage search. Setting the search term on click overwrites whatever the user had typed; clearing the search box resets to no highlight.
  • Per-project state is RAM-only. Page reload (F5) clears chat, PDFs, and embedded chunks. By design — no localStorage persistence to keep the LLM context predictable.

Test plan

  • Open FlowScope, paste SQL, verify lineage renders normally.
  • Click the Librarian toolbar button (or ⌘L) — Librarian panel opens on the right.
  • Configure AI settings (OpenAI, Anthropic, or custom endpoint).
  • Ask a question about the SQL — verify structured response with Summary / Data Lineage / Documentation.
  • Upload a PDF (test both click-to-upload and drag-and-drop) — verify it processes and shows "ready" status.
  • Ask a question related to PDF content — verify Documentation section cites the PDF.
  • Click an assistant answer that mentions a column — verify the lineage view highlights the column across all owning tables and pans/pulses on a source table.
  • Click an answer that mentions only a table — verify pan/pulse on that table.
  • Switch projects — verify chat, PDFs, and answers are scoped to the active project (no leakage).
  • Run yarn typecheck — 0 errors.
  • Run yarn test — all tests pass.

liliyaminibaeva and others added 30 commits April 30, 2026 17:26
Librarian is a chat panel that lets users ask natural-language questions
about their data using SQL lineage context and uploaded PDF documentation.

Features:
- AI chat with structured responses (Summary / Data Lineage / Documentation)
- OpenAI, Anthropic, and custom endpoint support (LiteLLM compatible)
- PDF upload with text extraction, chunking, and vector search (embeddings)
- Drag-and-drop PDF upload with GlobalDropZone integration
- Keyboard shortcut to toggle panel
- Full test coverage with Vitest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set useBrowserCache=true to prevent CORS errors when loading the
  embedding model from Hugging Face (no-cache requests get blocked)
- Strengthen "No information." instruction in prompt to reduce AI
  deviations from the expected response format
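The cache fix described above amounts to a small bit of transformers.js configuration before the embedding pipeline is created. A hedged sketch (env.useBrowserCache and allowLocalModels are real transformers.js flags; the surrounding code is illustrative, not the PR's actual worker setup):

```typescript
// Illustrative transformers.js setup for the embedding Web Worker.
import { env, pipeline } from '@xenova/transformers';

// Cache downloaded model files in the browser Cache API. Without this,
// no-cache requests to the Hugging Face CDN can be blocked by CORS.
env.useBrowserCache = true;
env.allowLocalModels = false; // always fetch from the hub

// Model name taken from the PR description.
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/multilingual-e5-small',
);
```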

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add min-w-0 on the file-name cell, whitespace-nowrap on the size span,
and shrink-0 on the delete button so the file name truncates with
ellipsis while the size and trash icon stay fully on screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a schema-identifiers utility and renders known table/column names in
assistant messages as distinct font-mono/primary tokens. Updates the system
prompt to emit bare identifiers, include technical names in Summary, and
refuse off-topic questions with a canned response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a collapsible magnifying-glass search control overlaid on the
Schema tab. Typing prefix-matches a table (case-insensitive) and drives
the existing selectedTableName highlight; clearing or Escape collapses
the control and clears selection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
App workspace tests, lint, and typecheck all clean. Pre-existing
failures in packages/react and pdf-processor.test.ts are unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes out the 2026-04-20 Librarian UI/LLM plan: Unreleased notes
cover the new help popover, schema search, clickable assistant
messages, styled schema identifiers, and the toolbar toggle move;
plan archived to docs/plans/completed/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two test issues found during code review of the Librarian UI/LLM
polish plan: a misleading test name that referenced max-h-[200px] while
asserting on max-h-[64px], and a fake persistence test for librarianOpen
that only restated the prior test's assertion instead of verifying the
partialize output reaches localStorage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- AnalysisView: keep LibrarianToggleButton reachable when no analysis
  result is loaded (was hidden because the early-return placeholder
  dropped the toolbar after the toggle was moved out of the global
  header).
- chat-messages: don't navigate to schema when the user is selecting
  text inside an assistant bubble or clicking inside a code block —
  preserves text selection and SQL copy interactions on clickable
  bubbles.
- embedding-worker: refresh stale comment that contradicted the now
  enabled browser cache.
- pdf-processor.test: replace `as any` casts with a typed cast to
  clear pre-existing lint errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UI:
- Narrow-panel fixes: PDF file row truncates name, delete button stays visible
- Moved Librarian toggle to analysis toolbar next to Schema button, rounded-full shape preserved in both states
- Replaced BookOpen with custom Polly duck icon in toolbar, chat avatar, empty state
- Aligned Librarian panel header height with analysis toolbar (44px)
- Chat input placeholder: "Ask about your data..."
- Colored inline code (table/column names) in accent color

Schema search:
- Extended search to column names (falls back to owning table)
- Prev/Next navigation for multiple matches with "1/N" indicator
- Enter/Shift+Enter to cycle matches, Escape to close

Prompt tuning:
- Summary must include concrete table/column names, not vague overviews
- Documentation must cite source PDF file name
- Off-topic refusal and bare-token identifier formatting retained

Embeddings:
- Switched to multilingual-e5-small (100+ languages)
- Added query/passage prefix handling required by e5 models
- User questions embedded as "query:", PDF chunks as "passage:"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New docs/librarian.md with user-facing guide (AI setup, PDFs,
  shortcuts, privacy, troubleshooting)
- New app/src/features/librarian/TEST_CASES.md with 25 manual test
  cases and the sample SAP SQL pipeline used during testing
- README.md: mention Librarian in Web App features, Key Features,
  and Documentation sections
- docs/README.md: add Features section pointing to librarian.md
- app/ARCHITECTURE.md: document features/librarian/ module, chat
  data flow, Polly icon, and ⌘L shortcut
- Remove unused Polly_librarian_violet.svg (icons use polly-icon.svg)
- Remove stale plan doc (superseded by current code + CHANGELOG)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce byProject keyed by activeProjectId; rewrite mutators to
operate on the active bucket and no-op when no project is active.
Add useLibrarianMessages / useLibrarianPdfFiles / useLibrarianPdfChunks
selector hooks. Flat-shape mirror retained transitionally so existing
consumers keep typechecking until Task 3 migrates them.
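The byProject shape and the no-op guard can be sketched in plain TypeScript (no Zustand, and the field names are illustrative rather than the PR's actual store):

```typescript
// Sketch of the per-project bucket store. Mutators no-op when no
// project is active, so nothing is written into a shared flat shape.
interface Bucket { messages: string[]; pdfChunks: string[] }

interface LibrarianState {
  activeProjectId: string | null;
  byProject: Record<string, Bucket>;
}

function addMessage(state: LibrarianState, message: string): LibrarianState {
  const id = state.activeProjectId;
  if (!id) return state; // no active project: no-op
  const bucket = state.byProject[id] ?? { messages: [], pdfChunks: [] };
  return {
    ...state,
    byProject: {
      ...state.byProject,
      [id]: { ...bucket, messages: [...bucket.messages, message] },
    },
  };
}
```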

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds useSyncActiveProject hook and invokes it once in Workspace.tsx so
the Librarian store stays pointed at the right per-project bucket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch LibrarianPanel and PdfUpload to useLibrarianMessages /
useLibrarianPdfFiles so the panel reads from the active project's
bucket. Add a no-active-project guard: ChatInput accepts an optional
noActiveProject flag that disables input and surfaces an empty-state
hint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace flat-store reads in use-librarian-chat with bucket-scoped reads
keyed by activeProjectId, so PDF chunks and chat history sent to the LLM
come strictly from the current project. Bail out with a friendly
assistant message when no project is active. Seed byProject in the test
beforeEach and add a case verifying project A's PDF chunks aren't
searched while project B is active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olation

typecheck and lint clean. Per-project isolation manually verified by code
review: store mutators are activeProjectId-scoped, selectors return
empty when no active project, chat hook reads from the active bucket,
chat input disables with a hint when activeProjectId is null. No
regressions in the librarian test suite (pre-existing prompt/model/UI
test rot from earlier commits is out of scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a CHANGELOG entry, document per-project chat/PDF scoping in the user
guide, and move the plan into docs/plans/completed/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix mid-flight project switch leak in use-librarian-chat: capture
  activeProjectId once and route writes through addMessageToProject
  so an in-flight LLM response always lands in the originating
  project's bucket, not whichever project is active when the
  network call returns.
- Add pruneProjectBuckets mutator and wire it from useSyncActiveProject
  so deleting a project also drops its Librarian bucket (chat history
  + embedded chunks) instead of leaking RAM for the tab lifetime.
- Drop dead-code addMessage('Open or create a project') call in the
  chat hook; the chat-input UI's noActiveProject hint already covers
  the empty-state UX.
- Compare PDF file names case-insensitively in hasPdfFile so duplicate
  detection works on case-insensitive filesystems.
- Update ARCHITECTURE.md to describe the per-project store shape and
  the new sync hook.
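The mid-flight capture fix boils down to reading the project id once, before the async call, so the response lands in the originating bucket even if the user switches projects while the request is in flight. A sketch under assumed names (none of these are the PR's actual signatures):

```typescript
// Illustrative: capture the project id up front, then route the write
// through an explicit-id mutator instead of the "current active" id.
type AddMessageToProject = (projectId: string, message: string) => void;

async function askLibrarian(
  getActiveProjectId: () => string | null,
  callLlm: (q: string) => Promise<string>,
  addMessageToProject: AddMessageToProject,
  question: string,
): Promise<void> {
  const projectId = getActiveProjectId(); // captured once, before awaiting
  if (!projectId) return;
  const answer = await callLlm(question);
  addMessageToProject(projectId, answer); // originating bucket, always
}
```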

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Close two remaining per-project isolation gaps surfaced in the
second-pass review.

PDF mid-flight project switch leak: handlePdfUpload now captures the
active project id at upload time and routes addPdfFile / addPdfChunks /
setPdfStatus through new explicit-id mutators (addPdfFileToProject,
addPdfChunksToProject, setPdfStatusForProject). Without this, switching
projects between addPdfFile and the chunks/status writes that follow
processPdf would land the chunks in the wrong bucket and leave the
originating PDF stuck on processing.

Zombie bucket on deleted project: addMessageToProject and the new PDF
*ToProject mutators now drop writes when the bucket is missing AND the
project is no longer active. This prevents a late-arriving response or
chunk write from resurrecting a bucket that pruneProjectBuckets just
removed because the project was deleted. Lazy initialization for fresh
projects still works because the active project always passes the
guard.
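The zombie-bucket guard reduces to one predicate: accept the write if the bucket still exists, or if the target project is the active one (which covers lazy initialization). A sketch with illustrative names:

```typescript
// Drop late-arriving writes when the bucket was pruned AND the project
// is no longer active; the active project always passes, so fresh
// projects still lazily initialize their bucket.
function shouldAcceptWrite(
  projectId: string,
  activeProjectId: string | null,
  buckets: Record<string, unknown>,
): boolean {
  return projectId in buckets || projectId === activeProjectId;
}
```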

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves a list of ChatReference values from chat answers into concrete
GlobalNode IDs in the lineage graph, plus the parent table IDs that need
to be expanded so matched columns become visible. This is the resolution
layer the chat-click multi-highlight wiring (Task 4) will consume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the schema-tab single-table click handler with a reference-set
flow: chat-messages now parses every identifier with `resolveAllReferences`,
the panel hands the full ref array to Workspace, and
LibrarianPanelWithNavigation resolves them to lineage node IDs and triggers
a `navigateTo('lineage', { highlightNodeIds, tablesToExpand })` so all
referenced tables/columns can be highlighted together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

When a Librarian chat answer is clicked, AnalysisView now expands any parent
tables that are not already expanded and selects the first highlighted node
so the column references become visible. The branching logic was extracted
into applyLineageNavigation so the navigation contract can be unit-tested
without mounting the full view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns stale tests with current production code (multilingual embedding
model, rewritten prompt, Polly icon, PDF size whitespace) and fixes a
SchemaSearchControl regression where emptying the input no longer
cleared the table selection. All gates pass: yarn typecheck (0 errors),
yarn test (291/291), yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the Librarian user guide to describe the new chat-click → Lineage
flow, log the change in CHANGELOG.md, and move the completed plan into
docs/plans/completed/. Notes the single-select fallback caused by the
@pondpilot/flowscope-react public API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Tighten chat-reference qualifier gap to single dot + horizontal
  whitespace only. Newlines or repeated dots between identifiers
  (e.g. "BKPF.\n\nMANDT" or "BKPF..MANDT") no longer cause spurious
  qualified references that would mis-resolve in lineage.
- Remove dead `resolveFirstTableReference` and the `SchemaReference`
  type they returned — no production caller remains after the
  multi-highlight migration; plan called for removal if unused.
- Add regression tests for the tightened gap behavior.
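The tightened gap rule (exactly one dot, horizontal whitespace only) can be expressed as a small regex. The pattern below is illustrative of the behavior described, not the PR's exact expression:

```typescript
// One identifier, a single dot with optional spaces/tabs (no newlines,
// no repeated dots), then a second identifier.
const QUALIFIED_REF = /\b([A-Za-z_][\w$]*)[ \t]*\.[ \t]*([A-Za-z_][\w$]*)\b/;

function matchQualifiedRef(text: string): [string, string] | null {
  const m = QUALIFIED_REF.exec(text);
  return m ? [m[1], m[2]] : null;
}
```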

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
liliyaminibaeva and others added 8 commits April 30, 2026 17:28
Recenter on parent table when chat-click first highlight is a column.

Columns are not top-level ReactFlow nodes (they render inside table nodes),
so passing a column id to useNodeFocus.getNode returns undefined and the
fitView never fires — the viewport silently failed to recenter whenever a
chat answer's first matched reference was a column. The resolver now
exposes a primaryFocusId that points at the parent table for column refs;
applyLineageNavigation uses it for setFocusNodeId while still passing the
column id to selectNode so the column highlights inside its table.
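The resolver change above is essentially a fallback from column id to parent table id when picking the focus target; a sketch with assumed shapes:

```typescript
// Columns are not top-level ReactFlow nodes, so viewport focus must
// target the owning table. Types and names here are illustrative.
interface ResolvedRef {
  nodeId: string;
  kind: 'table' | 'column';
  parentTableId?: string;
}

function primaryFocusId(refs: ResolvedRef[]): string | null {
  const first = refs[0];
  if (!first) return null;
  return first.kind === 'column' ? first.parentTableId ?? null : first.nodeId;
}
```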

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two correctness fixes from third-pass review:

- lineage-node-resolver: index table-like nodes by qualified name
  (catalog.schema.name) so columns from multi-schema graphs with
  duplicate table names route to their actual parent for expansion.
  Previously the bare-name index dropped schema, so a column from
  staging.BKPF.MANDT was mapped to whichever BKPF (sap or staging)
  registered first — leaving the real owning table collapsed and the
  column highlight invisible.

- chat-messages: scope the page-wide-text-selection guard to mouse
  clicks only. Pressing Enter/Space on a focused chat bubble now
  always activates regardless of any stale selection elsewhere on
  the page (keyboard a11y).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on chat click

- detectIdentifiers now matches case-insensitively and normalizes matches
  back to canonical schema casing, so LLM-emitted lowercase references
  (e.g. rbkp.ZLSPR) resolve like their uppercase canonical counterparts.
- LibrarianPanelWithNavigation writes the first column (preferred) or
  first table from resolved refs into the persisted lineage searchTerm,
  reusing the existing GraphView search pipeline. Force-enables column
  edges when a column is referenced so the matching column rows can
  highlight.
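The normalization step described in the first bullet is a case-insensitive lookup that returns the canonical schema spelling; a minimal sketch (helper name is illustrative):

```typescript
// Match an LLM-emitted identifier against known schema names without
// regard to case, returning the canonical casing on a hit.
function canonicalize(identifier: string, schemaNames: string[]): string | null {
  const lower = identifier.toLowerCase();
  return schemaNames.find((n) => n.toLowerCase() === lower) ?? null;
}
```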

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
applyLineageNavigation no longer calls setFocusNodeId for the
highlightNodeIds branch. The useNodeFocus hook in flowscope-react
auto-zooms aggressively, which is jarring for chat answers that
reference multiple tables. The search-term highlight + selectNode
still flag the relevant tables; the user keeps their viewport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream v0.7.0 flattened AnalyzeResult: graph nodes and edges are now
top-level fields instead of nested under `globalLineage`. Migrate the
librarian-side consumers and their tests:

- formatLineage now reads result.nodes / result.edges directly
- resolveLineageNodeIds uses result.nodes for the global node index
- Test fixtures replace globalLineage.{nodes,edges} with the flat
  fields and statementRefs with statementIds, matching the new Node /
  Edge contract

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt table

The lineage view's chat-click handler now reacts only to identifiers
mentioned in the answer's Summary section, instead of every identifier
across Summary / Data Lineage / Documentation. This avoids navigation
landing on tables that appear only as supporting context (e.g. "MANDT
is not exposed in INVOICE_HEADER" no longer pulls focus to
INVOICE_HEADER).

- Add extractSummary() utility that lifts the Summary block from the
  three-section LLM answer template, falling back to the full text when
  no marker is found
- chat-messages.tsx feeds extractSummary(content) into
  resolveAllReferences for click navigation; inline identifier styling
  still uses the full message
- LibrarianPanelWithNavigation writes the lineage searchTerm before the
  resolver short-circuits on empty nodeIds, so table cards still
  highlight via isNodeHighlighted even when no concrete column nodes
  are reachable
- lineage-node-resolver ranks column matches by parent type — actual
  source tables ('table') win over views ('view') and CTEs ('cte') so
  primaryFocusId lands on a base table when one exists, instead of a
  view that just touches the column transitively
- lineage-navigation now calls revealNodeInGraph for the highlight
  branch (gentle pan + pulse via flowscope-react v0.7.0) instead of the
  aggressive useNodeFocus zoom we removed earlier; AnalysisView wires
  actionsRef.current.revealNodeInGraph into the deps
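The extractSummary() utility described above lifts one section out of the templated answer and falls back to the full text. A sketch assuming markdown-style `## Summary` headings (the PR's actual section markers may differ):

```typescript
// Pull the Summary block from a three-section answer; return the whole
// answer when no Summary marker is found.
function extractSummary(answer: string): string {
  const m = /#+\s*Summary\s*\n([\s\S]*?)(?=\n#+\s|\s*$)/i.exec(answer);
  return m ? m[1].trim() : answer;
}
```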

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHANGELOG: collapse the per-iteration Librarian bullets in
[Unreleased] into a single dated entry that describes the feature as
shipped, since the iterations were local development noise rather
than user-visible increments.

docs/librarian.md: rewrite the "Jump to Lineage from a chat answer"
section to reflect the actual behavior — Summary-scoped navigation,
search-term-driven multi-highlight, auto-enabled column edges,
gentle pan + pulse on the parent source table, and case-insensitive
identifier matching. Replaces the stale "single-select due to API
limitation" note with the substring-search trade-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run prettier --write on the Librarian feature directory and the two
lib files (lineage-navigation, lineage-node-resolver) it owns. No
behavior changes — formatting only.

Also removes a completed-plans doc that is no longer relevant: the
chat-click multi-highlight plan landed via the search-term pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@twoxfh

twoxfh commented May 8, 2026

Very cool! I tried it out and it's very useful. From what I have seen, prompts are also part of the configuration in tools like this. Could the context-builder source its prompt from an editable field, with the default being the current prompt?

Context can get huge; surfacing the raw context string size to the user might be very helpful. Just a thought!
