Skip to content

feat: add JSON CLI fallback for PET server failures#1450

Open
StellaHuang95 wants to merge 2 commits intomicrosoft:mainfrom
StellaHuang95:jsonMode
Open

feat: add JSON CLI fallback for PET server failures#1450
StellaHuang95 wants to merge 2 commits intomicrosoft:mainfrom
StellaHuang95:jsonMode

Conversation

@StellaHuang95
Copy link
Copy Markdown
Contributor

Summary

When the PET (Python Environment Tools) JSON-RPC server fails to start or crashes beyond all restart attempts, environment discovery currently stops entirely. This PR adds a transparent fallback that uses PET's CLI --json mode to continue environment discovery even when the server mode is fully exhausted.

Problem

The extension communicates with pet server — a long-lived process over stdin/stdout using JSON-RPC 2.0. The server already has robust retry logic (3 restarts with exponential backoff: 1s, 2s, 4s). But when all 3 restart attempts fail, the extension throws an error and environment discovery stops. Users see no Python environments and cannot select an interpreter until they manually reload the window.

Solution

PET already has a CLI --json mode that does the same discovery work as a one-shot subprocess:

  • pet find --json [paths] [flags] — discovers all environments, outputs a single JSON object to stdout
  • pet resolve <exe> --json — resolves a single executable to full environment metadata

These CLI commands output the exact same JSON shape as the JSON-RPC notifications (same Rust structs, same camelCase serialization), so the existing TypeScript types (NativeEnvInfo, NativeEnvManagerInfo) work without modification.

The fallback is entirely internal to NativePythonFinderImpl. All callers (condaUtils, venvUtils, poetryUtils, etc.) call the same refresh() and resolve() interface methods and receive the same return types — they never know whether the data came from the server or the CLI.

What Changed

src/managers/common/nativePythonFinder.ts

New constant

  • CLI_FALLBACK_TIMEOUT_MS = 120_000 (2 minutes) — generous budget for a full one-shot process scan

resolve() — restructured with outer fallback catch
ensureProcessRunning() is now inside the outer try block. When all 3 server restarts are exhausted, ensureProcessRunning() throws, the outer catch checks isServerExhausted(), and diverts to resolveViaJsonCli() instead of propagating the error to callers.

doRefresh() — two new fallback checkpoints
After the retry loop exhausts all attempts, the catch block checks isServerExhausted() and calls refreshViaJsonCli() instead of rethrowing. A second check before the final throw lastError acts as a safety net.

New private method: buildConfigurationOptions()
Extracts the config-building logic out of configure() so both the server mode and CLI fallback build the same ConfigurationOptions from the same source (workspace dirs, conda/poetry/pipenv paths, cache dir, extra env dirs).

New private method: isServerExhausted()
Returns true only when restartAttempts >= 3 AND the process is currently down (startFailed || processExited) AND no restart is currently in progress (!isRestarting). The isRestarting guard prevents concurrent callers from prematurely bypassing to CLI fallback while a valid restart is still underway.

New private method: runPetCliProcess(args, timeoutMs)
Spawns the PET binary directly (no shell, to avoid injection risks from user-supplied paths), collects stdout, routes stderr to debug output, and enforces a hard timeout via SIGTERM → SIGKILL. A settled boolean guards against double-settling the returned Promise when the timeout and close/error handlers race each other.

New private method: refreshViaJsonCli(options)
Fallback for refresh(). Calls buildConfigurationOptions(), assembles CLI args via buildFindCliArgs(), spawns PET, parses the { managers, environments } JSON output, and — mirroring server mode's inline resolve behavior — calls resolveViaJsonCli() for any environments with an executable but missing version or prefix. Resolves these incomplete environments in parallel via Promise.all, matching the throughput of server mode's Promise.all(unresolved) pattern.

New private method: resolveViaJsonCli(executable)
Fallback for resolve(). Runs pet resolve <exe> --json, handles the null output case (PET couldn't identify the environment), handles malformed JSON separately from the not-found case, and logs all spawn failures to the output channel.

New exported functions (for testability)

  • buildFindCliArgs(config, options?, venvFolders?) — maps ConfigurationOptions + refresh options to a string[] of CLI arguments
  • parseRefreshCliOutput(stdout) — parses pet find --json output; rejects non-object JSON (including arrays and primitives)
  • parseResolveCliOutput(stdout, executable) — parses pet resolve --json output; rejects non-object JSON (including arrays and primitives) separately from the null (not-found) case

ConfigurationOptions type — changed from module-private type to export type to allow test files to construct configs directly.

src/common/telemetry/constants.ts

New telemetry event PET.JSON_CLI_FALLBACK with properties:

  • operation: 'refresh' | 'resolve' — which code path used the fallback
  • result: 'success' | 'error' — outcome
  • <duration> (measurement) — milliseconds taken

This lets us track how often server mode fails in the wild and whether the CLI fallback reliably recovers.

src/test/managers/common/nativePythonFinder.jsonCli.unit.test.ts (new file)

47 unit tests covering the three exported pure functions:

Suite What it tests
buildFindCliArgs — no options Workspace dirs as positional args, no spurious flags
buildFindCliArgs — kind filter --kind flag, workspace dirs still included, no --workspace
buildFindCliArgs — Uri[] options --workspace only when paths are non-empty; empty Uri[] + empty venvFolders falls back to workspace dirs; URI fsPaths as positional; venvFolders appended; workspace dirs excluded when paths provided
buildFindCliArgs — configuration flags Each config field maps to the correct CLI flag; missing fields are omitted; --environment-directories repeated per-dir (not comma-joined)
buildFindCliArgs — edge cases Paths with spaces, comma-containing env dir paths passed safely, venvFolders not added for kind options
parseRefreshCliOutput — valid JSON Well-formed output, missing keys default to [], multiple envs, field preservation
parseRefreshCliOutput — error cases Malformed JSON, empty string, JSON null, JSON primitive
parseResolveCliOutput — valid JSON Full field preservation
parseResolveCliOutput — null Throws with message that identifies the executable
parseResolveCliOutput — malformed SyntaxError on non-JSON, empty string, partial JSON

Behavior Notes

  • No streaming in CLI mode. Server mode streams environment notifications one at a time as PET discovers them. CLI mode blocks until the full scan finishes, then returns everything at once. Environments won't appear gradually in the fallback path. This is acceptable because the fallback is a last resort.
  • --environment-directories uses repeated flags — each directory is passed as a separate --environment-directories <dir> argument instead of comma-joining, so paths containing commas are handled correctly on both POSIX and Windows.
  • --workspace flag is used when options is a non-empty Uri[] (or Uri[] + non-empty venvFolders), mirroring the server mode's search_scope = Workspace behavior (skips global locators, searches only the provided paths). When both are empty, --workspace is omitted to avoid PET falling back to scanning the current working directory.
  • --kind with workspace dirs mirrors server mode's behavior where search_kind keeps the configured workspace dirs so workspace-scoped environments of that kind (e.g. a Venv inside the project) are still found.

Testing

All existing tests pass. New unit tests added: 47 tests covering buildFindCliArgs, parseRefreshCliOutput, and parseResolveCliOutput.

@StellaHuang95 StellaHuang95 added the feature-request Request for new features or functionality label Apr 13, 2026
@eleanorjboyd eleanorjboyd requested a review from Copilot April 14, 2026 20:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a resilient JSON CLI fallback for PET environment discovery when the JSON-RPC server cannot be started/restarted, plus telemetry and unit tests for the new CLI parsing/arg-building helpers.

Changes:

  • Implement CLI pet find/resolve --json fallback paths for refresh() and resolve() when server restart attempts are exhausted.
  • Add exported pure helpers (buildFindCliArgs, parseRefreshCliOutput, parseResolveCliOutput) and a new unit test suite covering them.
  • Introduce telemetry event PET.JSON_CLI_FALLBACK to track fallback usage and outcomes.
Show a summary per file
File Description
src/managers/common/nativePythonFinder.ts Adds CLI fallback execution + parsing utilities and telemetry emission when server mode is exhausted.
src/common/telemetry/constants.ts Registers PET.JSON_CLI_FALLBACK and its GDPR property mapping.
src/test/managers/common/nativePythonFinder.jsonCli.unit.test.ts Adds unit coverage for CLI arg builder and stdout parsers.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment on lines +904 to +928
// Resolve incomplete environments in parallel, mirroring doRefreshAttempt's Promise.all pattern.
const resolvePromises: Promise<void>[] = [];
for (const env of parsed.environments ?? []) {
if (env.executable && (!env.version || !env.prefix)) {
// Environment has an executable but incomplete metadata — resolve individually
resolvePromises.push(
this.resolveViaJsonCli(env.executable)
.then((resolved) => {
this.outputChannel.info(`[pet CLI] Resolved env: ${resolved.executable}`);
nativeInfo.push(resolved);
})
.catch(() => {
// If resolve fails, still include the partial env so nothing is silently dropped
this.outputChannel.warn(
`[pet CLI] Could not resolve incomplete env, using partial data: ${env.executable}`,
);
nativeInfo.push(env);
}),
);
} else {
this.outputChannel.info(`[pet CLI] Discovered env: ${env.executable ?? env.prefix}`);
nativeInfo.push(env);
}
}
await Promise.all(resolvePromises);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CLI fallback mode, each resolveViaJsonCli() spawns a new PET process. Doing this with unbounded Promise.all() can create a large number of concurrent processes on machines with many incomplete environments, causing CPU/memory pressure and potentially making discovery slower or unstable. Consider adding a concurrency limit (e.g., reuse an existing worker pool / queue, or a small fixed limit like 4–8) for the per-environment resolve subprocesses.

Copilot uses AI. Check for mistakes.
Comment on lines +1085 to +1100
/**
* Builds the CLI arguments array for a `pet find --json` invocation.
* This is exported for testability.
*
* @param config The configuration options (workspace dirs, tool paths, cache dir, env dirs).
* @param options Optional refresh options: a kind filter string or an array of URIs to search.
* @param venvFolders Additional virtual environment folder paths to include when searching
* URI-based paths (needed because searchPaths may override environmentDirectories in PET).
* @returns The args array to pass to the PET binary (after 'find --json').
*/
export function buildFindCliArgs(
config: ConfigurationOptions,
options?: NativePythonEnvironmentKind | Uri[],
venvFolders: string[] = [],
): string[] {
const args: string[] = ['find', '--json'];
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says the function returns args "after 'find --json'", but the implementation includes 'find' and '--json' in the returned array. Please update the @returns wording to match the actual return value (or adjust the function to return only the trailing args).

Copilot uses AI. Check for mistakes.
Comment on lines +888 to +894
} catch {
sendTelemetryEvent(EventNames.PET_JSON_CLI_FALLBACK, stopWatch.elapsedTime, {
operation: 'refresh',
result: 'error',
});
this.outputChannel.error('[pet] JSON CLI fallback: Failed to parse find output:', stdout.slice(0, 500));
throw new Error('Failed to parse PET find --json output');
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch block drops the original parsing error details (and the thrown error message is quite generic). Capturing the caught exception and rethrowing with additional context (including the original error as cause) would make debugging failures much easier, especially when stdout contains unexpected non-JSON output.

Suggested change
} catch {
sendTelemetryEvent(EventNames.PET_JSON_CLI_FALLBACK, stopWatch.elapsedTime, {
operation: 'refresh',
result: 'error',
});
this.outputChannel.error('[pet] JSON CLI fallback: Failed to parse find output:', stdout.slice(0, 500));
throw new Error('Failed to parse PET find --json output');
} catch (ex) {
sendTelemetryEvent(EventNames.PET_JSON_CLI_FALLBACK, stopWatch.elapsedTime, {
operation: 'refresh',
result: 'error',
});
this.outputChannel.error(
'[pet] JSON CLI fallback: Failed to parse find output:',
stdout.slice(0, 500),
ex,
);
throw new Error('Failed to parse PET find --json output during refresh', { cause: ex });

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature-request Request for new features or functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants