feat: add JSON CLI fallback for PET server failures#1450
feat: add JSON CLI fallback for PET server failures#1450StellaHuang95 wants to merge 2 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a resilient JSON CLI fallback for PET environment discovery when the JSON-RPC server cannot be started/restarted, plus telemetry and unit tests for the new CLI parsing/arg-building helpers.
Changes:
- Implement CLI
pet find/resolve --jsonfallback paths forrefresh()andresolve()when server restart attempts are exhausted. - Add exported pure helpers (
buildFindCliArgs,parseRefreshCliOutput,parseResolveCliOutput) and a new unit test suite covering them. - Introduce telemetry event
PET.JSON_CLI_FALLBACKto track fallback usage and outcomes.
Show a summary per file
| File | Description |
|---|---|
| src/managers/common/nativePythonFinder.ts | Adds CLI fallback execution + parsing utilities and telemetry emission when server mode is exhausted. |
| src/common/telemetry/constants.ts | Registers PET.JSON_CLI_FALLBACK and its GDPR property mapping. |
| src/test/managers/common/nativePythonFinder.jsonCli.unit.test.ts | Adds unit coverage for CLI arg builder and stdout parsers. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 3
| // Resolve incomplete environments in parallel, mirroring doRefreshAttempt's Promise.all pattern. | ||
| const resolvePromises: Promise<void>[] = []; | ||
| for (const env of parsed.environments ?? []) { | ||
| if (env.executable && (!env.version || !env.prefix)) { | ||
| // Environment has an executable but incomplete metadata — resolve individually | ||
| resolvePromises.push( | ||
| this.resolveViaJsonCli(env.executable) | ||
| .then((resolved) => { | ||
| this.outputChannel.info(`[pet CLI] Resolved env: ${resolved.executable}`); | ||
| nativeInfo.push(resolved); | ||
| }) | ||
| .catch(() => { | ||
| // If resolve fails, still include the partial env so nothing is silently dropped | ||
| this.outputChannel.warn( | ||
| `[pet CLI] Could not resolve incomplete env, using partial data: ${env.executable}`, | ||
| ); | ||
| nativeInfo.push(env); | ||
| }), | ||
| ); | ||
| } else { | ||
| this.outputChannel.info(`[pet CLI] Discovered env: ${env.executable ?? env.prefix}`); | ||
| nativeInfo.push(env); | ||
| } | ||
| } | ||
| await Promise.all(resolvePromises); |
There was a problem hiding this comment.
In CLI fallback mode, each resolveViaJsonCli() spawns a new PET process. Doing this with unbounded Promise.all() can create a large number of concurrent processes on machines with many incomplete environments, causing CPU/memory pressure and potentially making discovery slower or unstable. Consider adding a concurrency limit (e.g., reuse an existing worker pool / queue, or a small fixed limit like 4–8) for the per-environment resolve subprocesses.
| /** | ||
| * Builds the CLI arguments array for a `pet find --json` invocation. | ||
| * This is exported for testability. | ||
| * | ||
| * @param config The configuration options (workspace dirs, tool paths, cache dir, env dirs). | ||
| * @param options Optional refresh options: a kind filter string or an array of URIs to search. | ||
| * @param venvFolders Additional virtual environment folder paths to include when searching | ||
| * URI-based paths (needed because searchPaths may override environmentDirectories in PET). | ||
| * @returns The args array to pass to the PET binary (after 'find --json'). | ||
| */ | ||
| export function buildFindCliArgs( | ||
| config: ConfigurationOptions, | ||
| options?: NativePythonEnvironmentKind | Uri[], | ||
| venvFolders: string[] = [], | ||
| ): string[] { | ||
| const args: string[] = ['find', '--json']; |
There was a problem hiding this comment.
The docstring says the function returns args "after 'find --json'", but the implementation includes 'find' and '--json' in the returned array. Please update the @returns wording to match the actual return value (or adjust the function to return only the trailing args).
| } catch { | ||
| sendTelemetryEvent(EventNames.PET_JSON_CLI_FALLBACK, stopWatch.elapsedTime, { | ||
| operation: 'refresh', | ||
| result: 'error', | ||
| }); | ||
| this.outputChannel.error('[pet] JSON CLI fallback: Failed to parse find output:', stdout.slice(0, 500)); | ||
| throw new Error('Failed to parse PET find --json output'); |
There was a problem hiding this comment.
The catch block drops the original parsing error details (and the thrown error message is quite generic). Capturing the caught exception and rethrowing with additional context (including the original error as cause) would make debugging failures much easier, especially when stdout contains unexpected non-JSON output.
| } catch { | |
| sendTelemetryEvent(EventNames.PET_JSON_CLI_FALLBACK, stopWatch.elapsedTime, { | |
| operation: 'refresh', | |
| result: 'error', | |
| }); | |
| this.outputChannel.error('[pet] JSON CLI fallback: Failed to parse find output:', stdout.slice(0, 500)); | |
| throw new Error('Failed to parse PET find --json output'); | |
| } catch (ex) { | |
| sendTelemetryEvent(EventNames.PET_JSON_CLI_FALLBACK, stopWatch.elapsedTime, { | |
| operation: 'refresh', | |
| result: 'error', | |
| }); | |
| this.outputChannel.error( | |
| '[pet] JSON CLI fallback: Failed to parse find output:', | |
| stdout.slice(0, 500), | |
| ex, | |
| ); | |
| throw new Error('Failed to parse PET find --json output during refresh', { cause: ex }); |
Summary
When the PET (Python Environment Tools) JSON-RPC server fails to start or crashes beyond all restart attempts, environment discovery currently stops entirely. This PR adds a transparent fallback that uses PET's CLI
--jsonmode to continue environment discovery even when the server mode is fully exhausted.Problem
The extension communicates with
pet server— a long-lived process over stdin/stdout using JSON-RPC 2.0. The server already has robust retry logic (3 restarts with exponential backoff: 1s, 2s, 4s). But when all 3 restart attempts fail, the extension throws an error and environment discovery stops. Users see no Python environments and cannot select an interpreter until they manually reload the window.Solution
PET already has a CLI
--jsonmode that does the same discovery work as a one-shot subprocess:pet find --json [paths] [flags]— discovers all environments, outputs a single JSON object to stdoutpet resolve <exe> --json— resolves a single executable to full environment metadataThese CLI commands output the exact same JSON shape as the JSON-RPC notifications (same Rust structs, same
camelCaseserialization), so the existing TypeScript types (NativeEnvInfo,NativeEnvManagerInfo) work without modification.The fallback is entirely internal to
NativePythonFinderImpl. All callers (condaUtils,venvUtils,poetryUtils, etc.) call the samerefresh()andresolve()interface methods and receive the same return types — they never know whether the data came from the server or the CLI.What Changed
src/managers/common/nativePythonFinder.tsNew constant
CLI_FALLBACK_TIMEOUT_MS = 120_000(2 minutes) — generous budget for a full one-shot process scanresolve()— restructured with outer fallback catchensureProcessRunning()is now inside the outertryblock. When all 3 server restarts are exhausted,ensureProcessRunning()throws, the outercatchchecksisServerExhausted(), and diverts toresolveViaJsonCli()instead of propagating the error to callers.doRefresh()— two new fallback checkpointsAfter the retry loop exhausts all attempts, the catch block checks
isServerExhausted()and callsrefreshViaJsonCli()instead of rethrowing. A second check before the finalthrow lastErroracts as a safety net.New private method:
buildConfigurationOptions()Extracts the config-building logic out of
configure()so both the server mode and CLI fallback build the sameConfigurationOptionsfrom the same source (workspace dirs, conda/poetry/pipenv paths, cache dir, extra env dirs).New private method:
isServerExhausted()Returns
trueonly whenrestartAttempts >= 3AND the process is currently down (startFailed || processExited) AND no restart is currently in progress (!isRestarting). TheisRestartingguard prevents concurrent callers from prematurely bypassing to CLI fallback while a valid restart is still underway.New private method:
runPetCliProcess(args, timeoutMs)Spawns the PET binary directly (no shell, to avoid injection risks from user-supplied paths), collects stdout, routes stderr to debug output, and enforces a hard timeout via SIGTERM → SIGKILL. A
settledboolean guards against double-settling the returned Promise when the timeout andclose/errorhandlers race each other.New private method:
refreshViaJsonCli(options)Fallback for
refresh(). CallsbuildConfigurationOptions(), assembles CLI args viabuildFindCliArgs(), spawns PET, parses the{ managers, environments }JSON output, and — mirroring server mode's inline resolve behavior — callsresolveViaJsonCli()for any environments with an executable but missingversionorprefix. Resolves these incomplete environments in parallel viaPromise.all, matching the throughput of server mode'sPromise.all(unresolved)pattern.New private method:
resolveViaJsonCli(executable)Fallback for
resolve(). Runspet resolve <exe> --json, handles thenulloutput case (PET couldn't identify the environment), handles malformed JSON separately from the not-found case, and logs all spawn failures to the output channel.New exported functions (for testability)
buildFindCliArgs(config, options?, venvFolders?)— mapsConfigurationOptions+ refresh options to astring[]of CLI argumentsparseRefreshCliOutput(stdout)— parsespet find --jsonoutput; rejects non-object JSON (including arrays and primitives)parseResolveCliOutput(stdout, executable)— parsespet resolve --jsonoutput; rejects non-object JSON (including arrays and primitives) separately from thenull(not-found) caseConfigurationOptionstype — changed from module-privatetypetoexport typeto allow test files to construct configs directly.src/common/telemetry/constants.tsNew telemetry event
PET.JSON_CLI_FALLBACKwith properties:operation: 'refresh' | 'resolve'— which code path used the fallbackresult: 'success' | 'error'— outcome<duration>(measurement) — milliseconds takenThis lets us track how often server mode fails in the wild and whether the CLI fallback reliably recovers.
src/test/managers/common/nativePythonFinder.jsonCli.unit.test.ts(new file)47 unit tests covering the three exported pure functions:
buildFindCliArgs — no optionsbuildFindCliArgs — kind filter--kindflag, workspace dirs still included, no--workspacebuildFindCliArgs — Uri[] options--workspaceonly when paths are non-empty; emptyUri[]+ emptyvenvFoldersfalls back to workspace dirs; URI fsPaths as positional; venvFolders appended; workspace dirs excluded when paths providedbuildFindCliArgs — configuration flags--environment-directoriesrepeated per-dir (not comma-joined)buildFindCliArgs — edge casesparseRefreshCliOutput — valid JSON[], multiple envs, field preservationparseRefreshCliOutput — error casesnull, JSON primitiveparseResolveCliOutput — valid JSONparseResolveCliOutput — nullparseResolveCliOutput — malformedSyntaxErroron non-JSON, empty string, partial JSONBehavior Notes
environmentnotifications one at a time as PET discovers them. CLI mode blocks until the full scan finishes, then returns everything at once. Environments won't appear gradually in the fallback path. This is acceptable because the fallback is a last resort.--environment-directoriesuses repeated flags — each directory is passed as a separate--environment-directories <dir>argument instead of comma-joining, so paths containing commas are handled correctly on both POSIX and Windows.--workspaceflag is used whenoptionsis a non-emptyUri[](orUri[]+ non-emptyvenvFolders), mirroring the server mode'ssearch_scope = Workspacebehavior (skips global locators, searches only the provided paths). When both are empty,--workspaceis omitted to avoid PET falling back to scanning the current working directory.--kindwith workspace dirs mirrors server mode's behavior wheresearch_kindkeeps the configured workspace dirs so workspace-scoped environments of that kind (e.g. aVenvinside the project) are still found.Testing
All existing tests pass. New unit tests added: 47 tests covering
buildFindCliArgs,parseRefreshCliOutput, andparseResolveCliOutput.