Skip to content

feat: add headless LLM service and migrate ghost cell suggestions#13730

Open
nstrayer wants to merge 18 commits into
mainfrom
feature/headless-lm-service
Open

feat: add headless LLM service and migrate ghost cell suggestions#13730
nstrayer wants to merge 18 commits into
mainfrom
feature/headless-lm-service

Conversation

@nstrayer
Copy link
Copy Markdown
Contributor

Closes #13728
Part of #13729

Summary

Adds IPositronLMService, a workbench service that streams LLM responses directly from the shared process via ai-provider-bridge, bypassing the extension host and vscode.lm. Migrates ghost cell suggestions to use the new service (behind the existing useHeadlessService feature flag), with a dedicated model picker, context builder, and streaming generator.

Motivation: The previous path routed ghost cell requests through the assistant extension, leaving model selection opaque to the workbench and duplicating credential management. The headless service gives the workbench explicit control over model resolution and credentials, with lower latency as a bonus.

Related:

Note: The ai-provider-bridge dependency is currently resolved via "file:..." link. Once the package is published, the resolution needs to be updated to point at a GitHub release (or npm package).

Release Notes

New Features

  • Added IPositronLMService for direct LLM streaming from the workbench, with tier-based/pattern-based/exact model selection and eager model caching
  • Added "Select Ghost Cell Model" command with a grouped model picker (per-provider)
  • Ghost cell suggestions can now use the headless service path for lower latency (opt-in via positron.assistant.notebook.ghostCellSuggestions.useHeadlessService)

Bug Fixes

  • N/A

Validation Steps

@:positron-notebooks @:assistant

  1. Enable positron.assistant.notebook.ghostCellSuggestions.useHeadlessService in settings
  2. Open a Python or R notebook with at least one executed cell
  3. Verify ghost cell suggestions stream in after a brief delay
  4. Open the command palette and run "Select Ghost Cell Model"
  5. Verify the picker shows models grouped by provider; pin a model
  6. Trigger another ghost cell suggestion and verify it uses the pinned model
  7. Select "Use Default" in the picker and verify it reverts to tier-based auto-selection
  8. Navigate away mid-generation and verify streaming cancels cleanly (no errors in console)
  9. Sign out of all LLM auth providers and verify ghost cells degrade gracefully (no crash, suggestion simply doesn't appear)

Unit tests:

npx vitest run src/vs/workbench/services/positronLM/test/
npx vitest run src/vs/workbench/contrib/positronNotebook/test/common/notebookLMContext.vitest.ts

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

E2E Tests 🚀
This PR will run tests tagged with: @:critical @:positron-notebooks @:assistant

readme  valid tags

nstrayer added 18 commits May 27, 2026 14:17
Introduces IPositronLMService -- a workbench service that streams LLM
responses via @assistant/provider-bridge in the shared process, bypassing
vscode.lm entirely. Ghost cell suggestions can optionally use this path
via a feature flag (positron.assistant.notebook.ghostCellSuggestions.useHeadlessService).

Architecture:
- Shared process: stateless ProviderRegistry (lazy-loaded), 16ms batched
  streaming with in-band sentinels, cancellation via EventDispose
- Renderer: per-request credential resolution (5s cache), racing model
  fetch across providers, AsyncIterableSource bridging
- Generator: single deep module the controller calls with one function

New files:
- src/vs/workbench/services/positronLM/ (common, browser, node, test)
- src/vs/workbench/contrib/positronNotebook/.../generation/generator.ts
- Move browser/ to electron-browser/ (ISharedProcessService requires
  electron-browser layer)
- Replace `in` operator with typeof check for stream event sentinel
- Fix double-quoted strings in lexer (use escaped single quotes)
- Add eslint-disable for IServerChannel any signatures and dynamic import
- Add eslint-disable for sharedProcessMain workbench import (to be moved
  to vs/platform/ in a follow-up)
- Use discriminated IPC stream events ({type:'data'|'end'|'error'})
  instead of plain string sentinels to prevent text chunks containing
  "end" from prematurely closing the stream
- Fix wrong config key for maxVariables (was positronNotebook.ghostCell.
  maxVariables, should be the registered constant)
- Include error/stderr output in ghost cell context so the model can
  suggest fixes based on actual exception details
- Add 60s outer timeout around headless generation to prevent indefinite
  loading state if model resolution or streaming stalls
The timeout promise now cancels the CancellationTokenSource so the
underlying LM stream stops, preventing stale progress callbacks from
updating the UI after a timeout error. The timeout handle is also
cleared on normal completion to avoid leaking the timer.
Track a `timedOut` flag so the catch block can differentiate between
user-initiated cancellation (silently hide) and timeout-initiated
cancellation (show error state). Previously the timeout called
cts.cancel() then rejected, but the catch block suppressed all errors
when token.isCancellationRequested was true, causing the ghost cell to
stay stuck in loading/streaming state on timeout.
Replace all console.log/error/warn calls with _logService at appropriate
levels (debug/trace/error). Remove the ping/pong IPC health-check that
was used during development.
…ovider-bridge

The package was renamed as part of extraction to its own GitHub repo
(posit-dev/llm-provider-bridge). Update the file: link and dynamic
imports to use the new unscoped package name.
…ority

Introduces IPositronLMService -- a workbench service that streams LLM
responses via ai-provider-bridge in the shared process, bypassing
vscode.lm entirely. Ghost cell suggestions use this path when the
useHeadlessService feature flag is enabled.

Architecture:
- Shared process: PositronLMNode with lazy-loaded ProviderRegistry,
  16ms batched streaming, cancellation via last-listener dispose
- Renderer: eager model cache warmed at activation, session-change
  refresh, provider-priority matching, per-request credential re-resolve
- Generator: single deep module parsing structured XML via StreamingTagLexer
- Context: shared buildNotebookLMContext helper with variable priority

Also extracts notebook LM context building into a shared helper with
full test coverage, and adds vitest tests for provider-priority matching
and the streaming tag lexer.
The headless LM service was tightly coupled to ghost cell configuration
and only worked on desktop. This refactor:

- Introduces ModelSelection discriminated union (tier/id/patterns) so
  callers express intent without the service knowing who's calling
- Returns StreamResult with failure reason instead of null, enabling
  informative user-facing messages
- Adds availableModels + onDidChangeAvailableModels for picker UIs
- Extracts logic into AbstractPositronLMService in common/, with thin
  platform subclasses for desktop (electron-browser/) and server-hosted
  (browser/) deployments
- Registers tier settings (positronLM.models.fastcheap) owned by the
  service rather than ghost-cell-specific config
- Ghost cells now forward their own config as { patterns } to the service
Standalone function that shows a QuickPick grouped by provider,
returns the user's selection without persisting it. Consumers own
their own storage. "Use Default" option shown only when a model
is currently pinned.
Register selectGhostCellModel command in the workbench that uses
IPositronLMService.availableModels with the reusable showModelPicker
utility. This shows models from the same catalog the service uses for
generation, fixing the mismatch where the extension picker showed
vscode.lm models (only Bedrock) while generation used the headless
service (all providers).
- electron-browser: always use shared process channel (remote server
  does not register positronLM channel yet)
- Model picker: localize all user-facing strings
- Ghost cell action: localize picker title
- config.ts: update command link from old extension command
- README: add missing hasKey import to usage example
- Store the onCancellationRequested disposable so it's cleaned up when
  the stream ends normally (prevents leak on long-lived tokens)
- Inline toProviderCredentials() -- it was a no-op identity function.
  Comment left at the site for future extraction if field stripping
  becomes necessary.
- Add rationale comment above PROVIDER_PRIORITY ordering
The command was a NotebookAction2, which silently no-ops when no notebook
is active. The settings markdown description links to this command, so
clicking "Select a model" from Settings would do nothing. Converting to
a plain Action2 removes the notebook precondition -- the picker only needs
IPositronLMService and IConfigurationService.
Provider display names now go through nls.localize() for i18n. Fixed
deepseek auth provider ID ('deepseek' -> 'deepseek-api') caught by new
sync test that asserts AUTH_PROVIDER_MAP stays consistent with the
bridge's PROVIDER_MAP.
Avoids layering lint warnings (code-import-patterns, code-amd-node-module)
by importing at runtime inside the test body.
Replace the local file: link with the GitHub release tarball so CI and
other developers can resolve the dependency. Install all required peer
deps (@ai-sdk/*, @aws-sdk/*, @github/copilot-sdk, etc.) that the
providers entrypoint needs at runtime.
@nstrayer nstrayer force-pushed the feature/headless-lm-service branch from abb8d39 to 4aebce4 Compare May 27, 2026 18:39
Comment on lines +1 to +13
/*---------------------------------------------------------------------------------------------
* Copyright (C) 2026 Posit Software, PBC. All rights reserved.
* Licensed under the Elastic License 2.0. See LICENSE.txt for license information.
*--------------------------------------------------------------------------------------------*/

/**
* A chunk representing a tag boundary (open or close).
*/
export interface TagChunk {
type: 'tag';
name: string;
kind: 'open' | 'close';
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fully copied from the positron-assistant extension. Same code exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a headless LLM streaming service to the workbench

1 participant