Add Librarian — AI chat panel for SQL lineage Q&A#43
Open
liliyaminibaeva wants to merge 38 commits into
Librarian is a chat panel that lets users ask natural-language questions about their data using SQL lineage context and uploaded PDF documentation.
Features:
- AI chat with structured responses (Summary / Data Lineage / Documentation)
- OpenAI, Anthropic, and custom endpoint support (LiteLLM compatible)
- PDF upload with text extraction, chunking, and vector search (embeddings)
- Drag-and-drop PDF upload with GlobalDropZone integration
- Keyboard shortcut to toggle panel
- Full test coverage with Vitest
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
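For readers skimming the log, the chunking and vector-search step can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `chunkText`, `cosineSimilarity`, `topKChunks`, and the `EmbeddedChunk` shape are hypothetical names, and the fixed 500-character split without overlap is an assumption based on the "500-char chunks" figure in the PR description.

```typescript
// Hypothetical helpers; names and shapes are illustrative, not the PR's actual code.

/** Split extracted PDF text into fixed-size character chunks (no overlap assumed). */
function chunkText(text: string, chunkSize = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

/** Cosine similarity of two equal-length vectors; returns 0 if either is all zeros. */
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

/** Return the k chunks whose embeddings are most similar to the query embedding. */
function topKChunks(query: number[], chunks: EmbeddedChunk[], k = 3): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

The embeddings themselves would come from the model running in the Web Worker; here they are plain number arrays so the ranking logic can be read in isolation.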
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set useBrowserCache=true to prevent CORS errors when loading the embedding model from Hugging Face (no-cache requests get blocked)
- Strengthen "No information." instruction in prompt to reduce AI deviations from the expected response format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add min-w-0 on the file-name cell, whitespace-nowrap on the size span, and shrink-0 on the delete button so the file name truncates with ellipsis while the size and trash icon stay fully on screen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a schema-identifiers utility and renders known table/column names in assistant messages as distinct font-mono/primary tokens. Updates the system prompt to emit bare identifiers, include technical names in Summary, and refuse off-topic questions with a canned response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a collapsible magnifying-glass search control overlaid on the Schema tab. Typing prefix-matches a table (case-insensitive) and drives the existing selectedTableName highlight; clearing or Escape collapses the control and clears selection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
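The matching rule described in this commit (case-insensitive prefix match, cleared input clears the selection) could look roughly like the sketch below. `findTableByPrefix` is a hypothetical helper name, and returning the first match in list order is an assumption.

```typescript
// Hypothetical helper; the real control may rank or order matches differently.
function findTableByPrefix(tableNames: string[], query: string): string | null {
  const needle = query.trim().toLowerCase();
  // An empty query means "no selection", mirroring the clear/Escape behavior.
  if (needle === "") return null;
  const hit = tableNames.find((name) => name.toLowerCase().startsWith(needle));
  return hit ?? null;
}
```

The returned name would then be passed to whatever drives the existing `selectedTableName` highlight.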
App workspace tests, lint, and typecheck all clean. Pre-existing failures in packages/react and pdf-processor.test.ts are unrelated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes out the 2026-04-20 Librarian UI/LLM plan: Unreleased notes cover the new help popover, schema search, clickable assistant messages, styled schema identifiers, and the toolbar toggle move; plan archived to docs/plans/completed/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two test issues found during code review of the Librarian UI/LLM polish plan: a misleading test name that referenced max-h-[200px] while asserting on max-h-[64px], and a fake persistence test for librarianOpen that only restated the prior test's assertion instead of verifying the partialize output reaches localStorage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- AnalysisView: keep LibrarianToggleButton reachable when no analysis result is loaded (was hidden because the early-return placeholder dropped the toolbar after the toggle was moved out of the global header).
- chat-messages: don't navigate to schema when the user is selecting text inside an assistant bubble or clicking inside a code block. This preserves text selection and SQL copy interactions on clickable bubbles.
- embedding-worker: refresh stale comment that contradicted the now-enabled browser cache.
- pdf-processor.test: replace `as any` casts with a typed cast to clear pre-existing lint errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UI:
- Narrow-panel fixes: PDF file row truncates name, delete button stays visible
- Moved Librarian toggle to analysis toolbar next to Schema button, rounded-full shape preserved in both states
- Replaced BookOpen with custom Polly duck icon in toolbar, chat avatar, empty state
- Aligned Librarian panel header height with analysis toolbar (44px)
- Chat input placeholder: "Ask about your data..."
- Colored inline code (table/column names) in accent color
Schema search:
- Extended search to column names (falls back to owning table)
- Prev/Next navigation for multiple matches with "1/N" indicator
- Enter/Shift+Enter to cycle matches, Escape to close
Prompt tuning:
- Summary must include concrete table/column names, not vague overviews
- Documentation must cite source PDF file name
- Off-topic refusal and bare-token identifier formatting retained
Embeddings:
- Switched to multilingual-e5-small (100+ languages)
- Added query/passage prefix handling required by e5 models
- User questions embedded as "query:", PDF chunks as "passage:"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
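The e5 prefix handling mentioned under Embeddings amounts to tagging each input with its role before it is embedded. A minimal sketch, assuming a hypothetical `withE5Prefix` helper (the actual worker code may structure this differently):

```typescript
// Hypothetical helper illustrating the "query: " / "passage: " convention of e5 models.
type E5Role = "query" | "passage";

function withE5Prefix(text: string, role: E5Role): string {
  const prefix = `${role}: `;
  // Avoid double-prefixing if the caller already tagged the text.
  return text.startsWith(prefix) ? text : prefix + text;
}

// User questions are embedded as queries, PDF chunks as passages.
const questionInput = withE5Prefix("Where does MANDT come from?", "query");
const chunkInput = withE5Prefix("MANDT is the SAP client field.", "passage");
```

Embedding both sides with matching roles is what keeps question vectors and chunk vectors comparable under cosine similarity.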
- New docs/librarian.md with user-facing guide (AI setup, PDFs, shortcuts, privacy, troubleshooting)
- New app/src/features/librarian/TEST_CASES.md with 25 manual test cases and the sample SAP SQL pipeline used during testing
- README.md: mention Librarian in Web App features, Key Features, and Documentation sections
- docs/README.md: add Features section pointing to librarian.md
- app/ARCHITECTURE.md: document features/librarian/ module, chat data flow, Polly icon, and ⌘L shortcut
- Remove unused Polly_librarian_violet.svg (icons use polly-icon.svg)
- Remove stale plan doc (superseded by current code + CHANGELOG)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce byProject keyed by activeProjectId; rewrite mutators to operate on the active bucket and no-op when no project is active. Add useLibrarianMessages / useLibrarianPdfFiles / useLibrarianPdfChunks selector hooks. Flat-shape mirror retained transitionally so existing consumers keep typechecking until Task 3 migrates them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
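As a rough picture of the per-project shape, the sketch below keys buckets by project id, makes the mutator a no-op without an active project, and has the selector return an empty list in that case. It is written as plain functions for readability; the type names and the `addMessage` / `selectMessages` signatures are assumptions, not the store's real API.

```typescript
// Hypothetical, framework-free model of the per-project Librarian state.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ProjectBucket {
  messages: ChatMessage[];
}

interface LibrarianState {
  activeProjectId: string | null;
  byProject: Record<string, ProjectBucket>;
}

/** Append to the active project's bucket; no-op when no project is active. */
function addMessage(state: LibrarianState, message: ChatMessage): LibrarianState {
  const id = state.activeProjectId;
  if (id === null) return state;
  const bucket = state.byProject[id] ?? { messages: [] };
  return {
    ...state,
    byProject: { ...state.byProject, [id]: { messages: [...bucket.messages, message] } },
  };
}

/** Selector: messages of the active project, or an empty list when none is active. */
function selectMessages(state: LibrarianState): ChatMessage[] {
  const id = state.activeProjectId;
  if (id === null) return [];
  return state.byProject[id]?.messages ?? [];
}
```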
Adds useSyncActiveProject hook and invokes it once in Workspace.tsx so the Librarian store stays pointed at the right per-project bucket. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch LibrarianPanel and PdfUpload to useLibrarianMessages / useLibrarianPdfFiles so the panel reads from the active project's bucket. Add a no-active-project guard: ChatInput accepts an optional noActiveProject flag that disables input and surfaces an empty-state hint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace flat-store reads in use-librarian-chat with bucket-scoped reads keyed by activeProjectId, so PDF chunks and chat history sent to the LLM come strictly from the current project. Bail out with a friendly assistant message when no project is active. Seed byProject in the test beforeEach and add a case verifying project A's PDF chunks aren't searched while project B is active. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olation typecheck and lint clean. Per-project isolation manually verified by code review: store mutators are activeProjectId-scoped, selectors return empty when no active project, chat hook reads from the active bucket, chat input disables with a hint when activeProjectId is null. No regressions in the librarian test suite (pre-existing prompt/model/UI test rot from earlier commits is out of scope). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a CHANGELOG entry, document per-project chat/PDF scoping in the user guide, and move the plan into docs/plans/completed/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix mid-flight project switch leak in use-librarian-chat: capture
activeProjectId once and route writes through addMessageToProject
so an in-flight LLM response always lands in the originating
project's bucket, not whichever project is active when the
network call returns.
- Add pruneProjectBuckets mutator and wire it from useSyncActiveProject
so deleting a project also drops its Librarian bucket (chat history
+ embedded chunks) instead of leaking RAM for the tab lifetime.
- Drop dead-code addMessage('Open or create a project') call in the
chat hook; the chat-input UI's noActiveProject hint already covers
the empty-state UX.
- Compare PDF file names case-insensitively in hasPdfFile so duplicate
detection works on case-insensitive filesystems.
- Update ARCHITECTURE.md to describe the per-project store shape and
the new sync hook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
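The mid-flight fix in the first bullet boils down to reading the active project id once and closing over it. The sketch below models that with a callback instead of an awaited network call; `ChatStore`, `startQuestion`, and this simplified `addMessageToProject` are hypothetical stand-ins for the hook and store in the PR.

```typescript
// Hypothetical, simplified store: one message list per project id.
interface ChatStore {
  activeProjectId: string | null;
  byProject: Map<string, string[]>;
}

function addMessageToProject(store: ChatStore, projectId: string, message: string): void {
  const messages = store.byProject.get(projectId) ?? [];
  store.byProject.set(projectId, [...messages, message]);
}

/**
 * Record the question and return a callback for the eventual LLM answer.
 * The project id is captured once, so the answer is written to the originating
 * project even if the user switches projects before the response arrives.
 */
function startQuestion(store: ChatStore, question: string): ((answer: string) => void) | null {
  const projectId = store.activeProjectId; // captured once, never re-read
  if (projectId === null) return null;
  addMessageToProject(store, projectId, question);
  return (answer: string) => addMessageToProject(store, projectId, answer);
}
```

In the real hook the callback would correspond to the code that runs after the LLM request resolves.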
Close two remaining per-project isolation gaps surfaced in the second-pass review.
PDF mid-flight project switch leak: handlePdfUpload now captures the active project id at upload time and routes addPdfFile / addPdfChunks / setPdfStatus through new explicit-id mutators (addPdfFileToProject, addPdfChunksToProject, setPdfStatusForProject). Without this, switching projects between addPdfFile and the chunks/status writes that follow processPdf would land the chunks in the wrong bucket and leave the originating PDF stuck on processing.
Zombie bucket on deleted project: addMessageToProject and the new PDF *ToProject mutators now drop writes when the bucket is missing AND the project is no longer active. This prevents a late-arriving response or chunk write from resurrecting a bucket that pruneProjectBuckets just removed because the project was deleted. Lazy initialization for fresh projects still works because the active project always passes the guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
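The zombie-bucket guard can be read as a two-part condition: a write is dropped only when the bucket is missing and the target project is not the active one. A minimal sketch under that reading, with hypothetical state types and immutable updates assumed for clarity:

```typescript
// Hypothetical state model; the real store may differ in shape and update style.
interface Bucket {
  messages: string[];
}

interface LibrarianState {
  activeProjectId: string | null;
  byProject: Record<string, Bucket>;
}

/** Keep only buckets whose project still exists. */
function pruneProjectBuckets(state: LibrarianState, liveProjectIds: string[]): LibrarianState {
  const byProject: Record<string, Bucket> = {};
  for (const id of liveProjectIds) {
    const bucket = state.byProject[id];
    if (bucket !== undefined) byProject[id] = bucket;
  }
  return { ...state, byProject };
}

/** Write to an explicit project, refusing to resurrect a pruned bucket. */
function addMessageToProject(state: LibrarianState, projectId: string, message: string): LibrarianState {
  const existing = state.byProject[projectId];
  // Missing bucket AND not the active project: treat as a deleted project and drop the write.
  if (existing === undefined && state.activeProjectId !== projectId) return state;
  const bucket = existing ?? { messages: [] }; // lazy init for a fresh active project
  return {
    ...state,
    byProject: { ...state.byProject, [projectId]: { messages: [...bucket.messages, message] } },
  };
}
```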
Resolves a list of ChatReference values from chat answers into concrete GlobalNode IDs in the lineage graph, plus the parent table IDs that need to be expanded so matched columns become visible. This is the resolution layer the chat-click multi-highlight wiring (Task 4) will consume. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the schema-tab single-table click handler with a reference-set
flow: chat-messages now parses every identifier with `resolveAllReferences`,
the panel hands the full ref array to Workspace, and
LibrarianPanelWithNavigation resolves them to lineage node IDs and triggers
a `navigateTo('lineage', { highlightNodeIds, tablesToExpand })` so all
referenced tables/columns can be highlighted together.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion When a Librarian chat answer is clicked, AnalysisView now expands any parent tables that are not already expanded and selects the first highlighted node so the column references become visible. The branching logic was extracted into applyLineageNavigation so the navigation contract can be unit-tested without mounting the full view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns stale tests with current production code (multilingual embedding model, rewritten prompt, Polly icon, PDF size whitespace) and fixes a SchemaSearchControl regression where emptying the input no longer cleared the table selection. All gates pass: yarn typecheck (0 errors), yarn test (291/291), yarn lint (0 errors). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the Librarian user guide to describe the new chat-click → Lineage flow, log the change in CHANGELOG.md, and move the completed plan into docs/plans/completed/. Notes the single-select fallback caused by the @pondpilot/flowscope-react public API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Tighten chat-reference qualifier gap to single dot + horizontal whitespace only. Newlines or repeated dots between identifiers (e.g. "BKPF.\n\nMANDT" or "BKPF..MANDT") no longer cause spurious qualified references that would mis-resolve in lineage.
- Remove dead `resolveFirstTableReference` and the `SchemaReference` type it returned. No production caller remains after the multi-highlight migration; plan called for removal if unused.
- Add regression tests for the tightened gap behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
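One way to express the tightened gap is a pattern that allows exactly one dot, optionally padded by spaces or tabs, between two identifiers. The regex and the `findQualifiedRefs` helper below are illustrative assumptions; the identifier grammar in the actual parser may be broader.

```typescript
// Hypothetical sketch: "table.column" with one dot and only spaces/tabs around it.
function findQualifiedRefs(text: string): Array<{ table: string; column: string }> {
  // [ \t]* permits horizontal whitespace only, so newlines break the pair;
  // a second dot is not an identifier character, so "A..B" never matches.
  const pattern = /([A-Za-z_][A-Za-z0-9_]*)[ \t]*\.[ \t]*([A-Za-z_][A-Za-z0-9_]*)/g;
  const refs: Array<{ table: string; column: string }> = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    refs.push({ table: match[1], column: match[2] });
  }
  return refs;
}
```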
Recenter on parent table when chat-click first highlight is a column. Columns are not top-level ReactFlow nodes (they render inside table nodes), so passing a column id to useNodeFocus.getNode returns undefined and the fitView never fires — the viewport silently failed to recenter whenever a chat answer's first matched reference was a column. The resolver now exposes a primaryFocusId that points at the parent table for column refs; applyLineageNavigation uses it for setFocusNodeId while still passing the column id to selectNode so the column highlights inside its table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
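The focus rule in this commit can be summarized as: focus the parent table for a column, otherwise focus the node itself. A minimal sketch, where `ResolvedNode`, its `kind` / `parentTableId` fields, and `primaryFocusIdFor` are hypothetical names rather than the resolver's real types:

```typescript
// Hypothetical node shape; the resolver in this PR works on lineage graph nodes.
interface ResolvedNode {
  id: string;
  kind: "table" | "view" | "cte" | "column";
  parentTableId?: string;
}

/** Columns are not top-level graph nodes, so focus falls back to their parent table. */
function primaryFocusIdFor(node: ResolvedNode): string {
  return node.kind === "column" && node.parentTableId !== undefined
    ? node.parentTableId
    : node.id;
}
```

The column id would still be the one handed to selection, so the column highlights inside its table.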
Two correctness fixes from third-pass review:
- lineage-node-resolver: index table-like nodes by qualified name (catalog.schema.name) so columns from multi-schema graphs with duplicate table names route to their actual parent for expansion. Previously the bare-name index dropped schema, so a column from staging.BKPF.MANDT was mapped to whichever BKPF (sap or staging) registered first, leaving the real owning table collapsed and the column highlight invisible.
- chat-messages: scope the page-wide-text-selection guard to mouse clicks only. Pressing Enter/Space on a focused chat bubble now always activates regardless of any stale selection elsewhere on the page (keyboard a11y).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
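The difference between a bare-name index and a qualified-name index is easiest to see with two tables that share a name. The sketch below is an assumption-laden illustration: `TableNode`, `qualifiedKey`, and `buildTableIndex` are hypothetical, and lowercasing the key is a choice made here, not something the commit states.

```typescript
// Hypothetical table node; real lineage nodes carry more fields.
interface TableNode {
  id: string;
  catalog?: string;
  schema?: string;
  name: string;
}

/** Join the non-empty parts as catalog.schema.name, lowercased for lookup. */
function qualifiedKey(parts: Array<string | undefined>): string {
  return parts
    .filter((part): part is string => Boolean(part))
    .map((part) => part.toLowerCase())
    .join(".");
}

/** Index tables by qualified name so same-named tables in different schemas stay distinct. */
function buildTableIndex(tables: TableNode[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const table of tables) {
    index.set(qualifiedKey([table.catalog, table.schema, table.name]), table.id);
  }
  return index;
}
```

With a bare-name key, the second `BKPF` would collide with the first; with the qualified key, `staging.bkpf` and `sap.bkpf` resolve to their own node ids.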
…on chat click
- detectIdentifiers now matches case-insensitively and normalizes matches back to canonical schema casing, so LLM-emitted lowercase references (e.g. rbkp.ZLSPR) resolve like their uppercase canonical counterparts.
- LibrarianPanelWithNavigation writes the first column (preferred) or first table from resolved refs into the persisted lineage searchTerm, reusing the existing GraphView search pipeline. Force-enables column edges when a column is referenced so the matching column rows can highlight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
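Case-insensitive matching with canonical normalization can be sketched as a lookup from lowercased token to the schema's own spelling. `detectIdentifiersSketch` below is a hypothetical stand-in for the real `detectIdentifiers`; the token pattern and the de-duplication are assumptions.

```typescript
// Hypothetical stand-in for detectIdentifiers; returns canonical names in order of first mention.
function detectIdentifiersSketch(text: string, knownNames: string[]): string[] {
  // Map lowercased name -> canonical casing from the schema.
  const canonical = new Map<string, string>(
    knownNames.map((name): [string, string] => [name.toLowerCase(), name]),
  );
  const found: string[] = [];
  const tokenPattern = /[A-Za-z_][A-Za-z0-9_]*/g;
  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(text)) !== null) {
    const hit = canonical.get(match[0].toLowerCase());
    if (hit !== undefined && found.indexOf(hit) === -1) found.push(hit);
  }
  return found;
}
```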
applyLineageNavigation no longer calls setFocusNodeId for the highlightNodeIds branch. The useNodeFocus hook in flowscope-react auto-zooms aggressively, which is jarring for chat answers that reference multiple tables. The search-term highlight + selectNode still flag the relevant tables; the user keeps their viewport. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream v0.7.0 flattened AnalyzeResult: graph nodes and edges are now
top-level fields instead of nested under `globalLineage`. Migrate the
librarian-side consumers and their tests:
- formatLineage now reads result.nodes / result.edges directly
- resolveLineageNodeIds uses result.nodes for the global node index
- Test fixtures replace globalLineage.{nodes,edges} with the flat
fields and statementRefs with statementIds, matching the new Node /
Edge contract
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
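To make the shape change concrete, here is a sketch of a formatter that reads the flat fields. Only the top-level `nodes` / `edges` placement comes from this commit message; the `id`, `label`, `from`, and `to` field names and the `formatLineageSketch` function are hypothetical and do not describe the upstream types.

```typescript
// Hypothetical shapes: only top-level `nodes` / `edges` is taken from the commit message.
interface LineageNode {
  id: string;
  label: string;
}

interface LineageEdge {
  from: string;
  to: string;
}

interface FlatAnalyzeResult {
  nodes: LineageNode[];
  edges: LineageEdge[];
}

/** Render each edge as "source -> target" using node labels, reading the flat fields directly. */
function formatLineageSketch(result: FlatAnalyzeResult): string {
  const labels = new Map<string, string>();
  for (const node of result.nodes) labels.set(node.id, node.label);
  return result.edges
    .map((edge) => `${labels.get(edge.from) ?? edge.from} -> ${labels.get(edge.to) ?? edge.to}`)
    .join("\n");
}
```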
…nt table
The lineage view's chat-click handler now reacts only to identifiers
mentioned in the answer's Summary section, instead of every identifier
across Summary / Data Lineage / Documentation. This avoids navigation
landing on tables that appear only as supporting context (e.g. "MANDT
is not exposed in INVOICE_HEADER" no longer pulls focus to
INVOICE_HEADER).
- Add extractSummary() utility that lifts the Summary block from the
three-section LLM answer template, falling back to the full text when
no marker is found
- chat-messages.tsx feeds extractSummary(content) into
resolveAllReferences for click navigation; inline identifier styling
still uses the full message
- LibrarianPanelWithNavigation writes the lineage searchTerm before the
resolver short-circuits on empty nodeIds, so table cards still
highlight via isNodeHighlighted even when no concrete column nodes
are reachable
- lineage-node-resolver ranks column matches by parent type — actual
source tables ('table') win over views ('view') and CTEs ('cte') so
primaryFocusId lands on a base table when one exists, instead of a
view that just touches the column transitively
- lineage-navigation now calls revealNodeInGraph for the highlight
branch (gentle pan + pulse via flowscope-react v0.7.0) instead of the
aggressive useNodeFocus zoom we removed earlier; AnalysisView wires
actionsRef.current.revealNodeInGraph into the deps
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
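The Summary-scoped behavior depends on lifting one section out of the three-section answer. The sketch below assumes the sections are introduced by header lines reading "Summary", "Data Lineage", and "Documentation" (optionally decorated with `#`, `*`, or a trailing colon); the real marker format is not shown in this PR, so treat both the regex and the `extractSummarySketch` name as hypothetical.

```typescript
// Hypothetical stand-in for extractSummary(); header format is an assumption.
function extractSummarySketch(answer: string): string {
  const lines = answer.split(/\r?\n/);
  const isHeader = (line: string, title: string): boolean =>
    new RegExp(`^[#*\\s]*${title}[*:\\s]*$`, "i").test(line);

  const start = lines.findIndex((line) => isHeader(line, "Summary"));
  // No Summary marker: fall back to the full text, as the commit describes.
  if (start === -1) return answer;

  const rest = lines.slice(start + 1);
  const end = rest.findIndex(
    (line) => isHeader(line, "Data Lineage") || isHeader(line, "Documentation"),
  );
  return (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
}
```

Feeding only this block into reference resolution is what keeps a table mentioned solely under Data Lineage from pulling navigation toward it.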
CHANGELOG: collapse the per-iteration Librarian bullets in [Unreleased] into a single dated entry that describes the feature as shipped, since the iterations were local development noise rather than user-visible increments. docs/librarian.md: rewrite the "Jump to Lineage from a chat answer" section to reflect the actual behavior — Summary-scoped navigation, search-term-driven multi-highlight, auto-enabled column edges, gentle pan + pulse on the parent source table, and case-insensitive identifier matching. Replaces the stale "single-select due to API limitation" note with the substring-search trade-off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run prettier --write on the Librarian feature directory and the two lib files (lineage-navigation, lineage-node-resolver) it owns. No behavior changes — formatting only. Also removes a completed-plans doc that is no longer relevant: the chat-click multi-highlight plan landed via the search-term pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Very cool! I tried it out and it's very useful. From what I have seen, prompts are also part of the configuration in these instances. Could the context-builder source its prompt from an editable field, with the current prompt as the default? Context can get huge, so showing the user the raw string size might be very helpful. Just a thought!
PR: Add Librarian — AI-powered chat panel for SQL lineage Q&A
Summary
Librarian is a new chat panel that lets users ask natural-language questions about their data using SQL lineage context and uploaded PDF documentation. It sits as a third resizable panel to the right of the analysis view.
Features
- localStorage, sent only to the configured provider.
- pdfjs-dist text extraction → 500-char chunks → Xenova/multilingual-e5-small embedding (Web Worker, 100+ languages) → cosine similarity search. 10 MB per file, no file count limit.
- yarn typecheck clean).
How it works
- localStorage, sent only to the configured provider.
Files added
Files modified
- app/src/components/Workspace.tsx: ResizablePanel for Librarian; toggle button in analysis toolbar; LibrarianPanelWithNavigation wires chat-click → lineage search-term + reveal
- app/src/components/AnalysisView.tsx: actionsRef.current.revealNodeInGraph into applyLineageNavigation deps
- app/src/components/GlobalDropZone.tsx: pointer-events-none so PDF drops reach the Librarian dropzone
- app/src/lib/view-state-store.ts: librarianOpen flag and per-project Librarian state hook integration
- app/src/lib/shortcuts.ts
- app/src/lib/navigation-context.tsx: NavigationTarget.highlightNodeIds / tablesToExpand / primaryFocusId for chat-click navigation
- app/package.json: pdfjs-dist, @xenova/transformers, Vitest + RTL deps
- app/vitest.config.ts
- CHANGELOG.md: [Unreleased] entry describing the Librarian feature
- docs/librarian.md
New dependencies
- pdfjs-dist
- @xenova/transformers (env.useBrowserCache)
- vitest, @testing-library/react, jsdom
Known limitations
- MANDT and BUKRS) cannot share a single search term; the first column wins. The user can dismiss the highlight via the lineage search box.
- SchemaView highlights one table at a time. The Schema view's selection prop accepts a single selectedTableName; schema search uses Prev/Next cycling to compensate.
- localStorage persistence to keep the LLM context predictable.
Test plan
- yarn typecheck: 0 errors.
- yarn test: all tests pass.