…42) (#92)

* feat(cli): BYOE endpoints store + 'specsmith endpoints' group (REQ-142)

  Phase 1 of the Bring-Your-Own-Endpoint sprint. Adds a generic OpenAI-v1-compatible endpoint registry so users can register self-hosted vLLM, llama.cpp server, LM Studio, and TGI backends and pick between them.

  - src/specsmith/agent/endpoints.py: Endpoint / EndpointAuth / EndpointStore / EndpointHealth dataclasses, schema_version=1, JSON persistence at ~/.specsmith/endpoints.json (chmod 600), token resolution dispatch (none / bearer-inline / bearer-env / bearer-keyring), /v1/models health probe with TLS verify toggle.
  - src/specsmith/cli.py: 'specsmith endpoints' group with add / list / remove / default / test / models subcommands. Inline-token redaction in --json output, optional bearer-keyring storage with hidden-input prompt, --purge-keyring on remove, --set-default on add.
  - tests/test_endpoints_store.py + tests/test_endpoints_cli.py: 38 new tests covering validation, round-trip, redaction, token resolution dispatch, and /v1/models health against an in-process fake server.
  - tests/fixtures/api_surface.json: registered 'endpoints' as a top-level command for REQ-140 stability.
  - docs/site/endpoints.md: BYOE walkthrough, auth strategy table, security notes, CLI reference.

  Validation: ruff lint clean, ruff format clean, mypy strict clean for the new module, pytest 66/66 passing across the new suites + the existing api-surface stability test.

  Co-Authored-By: Oz <oz-agent@warp.dev>

* feat(cli): --endpoint flag + openai-compat provider driver (REQ-142)

  Phase 2 of the Bring-Your-Own-Endpoint sprint. Wires the registry from PR-1 into the chat surface and the persistent serve loop.

  - src/specsmith/agent/chat_runner.py: new _run_openai_compat driver streams from a registered Endpoint via raw stdlib HTTP / SSE (no openai SDK dependency). run_chat() takes an optional endpoint_id; when set, the BYOE store is consulted and the resolved endpoint short-circuits the auto-detect provider chain. Failure modes (unreachable, 401, missing default model) fall back gracefully.
  - src/specsmith/cli.py: `specsmith chat --endpoint <id>` threads through to run_chat. `specsmith serve --endpoint <id>` resolves the endpoint at startup, derives provider+model, and exports SPECSMITH_ACTIVE_ENDPOINT for downstream consumers.
  - tests/test_chat_runner_openai_compat.py: 4 new pytest cases against an in-process fake /v1/chat/completions SSE server. Covers happy-path streaming, missing default-model fallback, 401-on-bad-token fallback, and the run_chat entry point with endpoint_id resolution.

  Validation: ruff lint + format clean, 82/82 passing across the new + existing endpoint and warp parity suites.

  Co-Authored-By: Oz <oz-agent@warp.dev>

* release: v0.8.0 (BYOE)

  Bump pyproject.toml to 0.8.0 to ship the Bring-Your-Own-Endpoint feature (REQ-142): the new endpoints store + 'specsmith endpoints' CLI group (PR-1) and the openai-compat provider driver wired through `specsmith chat / serve --endpoint <id>` (PR-2).

  Co-Authored-By: Oz <oz-agent@warp.dev>

---------

Co-authored-by: Oz <oz-agent@warp.dev>
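The Phase 2 driver streams from `/v1/chat/completions` over raw stdlib HTTP/SSE. The sketch below shows the general shape of that parsing step for the standard OpenAI streaming wire format, where each event is a `data: {...}` line carrying `choices[0].delta.content` and the stream ends with `data: [DONE]`. It is not the actual `_run_openai_compat` code: `iter_sse_deltas` is a hypothetical name, and it accepts any iterable of raw byte lines (an `http.client.HTTPResponse` iterates that way) so it can be exercised without a server.

```python
import json
from collections.abc import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content fragments from an OpenAI-v1-style SSE chat stream.

    Hypothetical helper: assumes each event fits on a single `data:` line,
    which is how OpenAI-compatible servers typically stream chat completions.
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Blank lines separate events; lines starting with ":" are SSE comments.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue  # ignore other SSE fields such as "event:" or "id:"
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel used by the OpenAI wire format
        chunk = json.loads(payload)
        # The usage-only chunk sent when stream_options.include_usage is set
        # has an empty "choices" list, so this loop simply skips it.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:
                yield text
```

Keeping the parser a pure generator over lines separates it from the HTTP plumbing, which is what makes the in-process fake-server tests described above straightforward to write.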
…vity routing (REQ-145, REQ-146) (#93)

* feat(agent): restore AgentRunner + ready event + profiles + routing (REQ-145, REQ-146)

  PR-A: bridge handshake fix
  - restore agent/runner.py with AgentRunner class wiring _print_banner, _handle_command, _state, _hard_stop, run_interactive, run_task
  - restore agent/core.py with ModelTier + AgentState
  - add EventEmitter.ready / system / turn_done / error helpers

  PR-G: agent profiles + activity routing
  - new agent/profiles.py with Profile, ProfileStore, RoutingTable, presets
  - new agent/fallback.py for resilient fallback-chain execution
  - new specsmith agents CLI group (list/add/remove/default/test/route/preset)
  - new --endpoint and --agent flags on specsmith run
  - AgentRunner consults RoutingTable per turn (slash command -> profile)
  - /agent and /endpoint in-chat commands switch profile/endpoint live
  - DEFAULT_PRESETS: default, local-only, frontier-only, cost-conscious

  PR-C/D/E/F CLI surface
  - specsmith phase show --json (stable schema for VS Code Workflows tree)
  - specsmith mcp list --json
  - specsmith rules list --json (project / workspace / personal scopes)

  CLI 0.8.0 -> 0.10.0 (skips 0.9 to align with extension bump).

  Tests: tests/test_agent_runner_ready.py, tests/test_agent_profiles.py.

  Co-Authored-By: Oz <oz-agent@warp.dev>

* fix(lint): address ruff findings on 0.10.0 files

  - agent/fallback.py: Callable/Iterable from collections.abc (UP035), I001
  - agent/profiles.py: drop quoted Profile/ProfileStore type annotations (UP037)
  - agent/runner.py: drop unused json/os/Iterable imports, line wrap, Callable from collections.abc
  - agent/core.py: drop quoted ModelTier annotations (UP037)
  - cli.py:7136: line length
  - tests/test_agent_profiles.py: unused 'store' assignment

  Co-Authored-By: Oz <oz-agent@warp.dev>

* style: ruff format reformats agent runner/profiles/fallback
* style: ruff format events.py
* test(api-surface): regenerate snapshot for new agents/mcp/rules commands
* feat: retire Cloud Runs feature; CI api-surface guard; fallback-chain pytest
  * Drop `specsmith cloud spawn` / `cloud-serve` and `cloud_serve.py` plus related docs, tests, fixtures, and gitignore lines (REQ-126/REQ-136 retired).
  * Sync .specsmith/{requirements,testcases}.json + REQUIREMENTS.md + TESTS.md to drop REQ-126 / TEST-126.
  * Regenerate tests/fixtures/api_surface.json (cloud + cloud-serve gone).
  * CI: new `api-surface` job diffs the live CLI surface against the committed fixture (REQ-140 drift guard).
  * New tests/test_fallback_chain.py: 33 hermetic tests covering parse_target, primary/short-circuit, transient HTTPError + network failures fall through, non-transient (4xx / programmer bugs) bubble up, blank target skipping, on_attempt callback resilience, and the FallbackAttempt/FallbackResult dataclass shapes.
  * run_commit() default co_author trailer is now empty (was an Oz attribution); strip lingering Oz/Warp references from LEDGER.md, CHANGELOG.md, and historical `.specsmith/runs/*.pr-body.md` artifacts.

---------

Co-authored-by: Oz <oz-agent@warp.dev>
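The fallback-chain tests listed above pin down the intended semantics: transient failures (retryable HTTP errors and network errors) fall through to the next target, non-transient ones (4xx, programmer bugs) bubble up, blank targets are skipped, the first success short-circuits, and a failing `on_attempt` callback must not break the chain. A minimal sketch of those semantics follows. It is not the code in `agent/fallback.py`: the `FallbackAttempt`/`FallbackResult` fields, the `provider/model` target format accepted by `parse_target`, and treating only 5xx responses as transient are all assumptions.

```python
from __future__ import annotations

import urllib.error
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class FallbackAttempt:  # hypothetical shape
    target: str
    ok: bool
    error: str = ""


@dataclass
class FallbackResult:  # hypothetical shape
    value: object = None
    winner: str = ""
    attempts: list[FallbackAttempt] = field(default_factory=list)


def parse_target(target: str) -> tuple[str, str]:
    """Split an assumed 'provider/model' target string."""
    provider, _, model = target.strip().partition("/")
    return provider, model


def is_transient(exc: BaseException) -> bool:
    # Assumption: 5xx HTTP errors and network-level failures are retryable;
    # 4xx HTTP errors and every other exception are treated as non-transient.
    if isinstance(exc, urllib.error.HTTPError):  # check before its URLError base
        return exc.code >= 500
    return isinstance(exc, (urllib.error.URLError, ConnectionError, TimeoutError))


def run_with_fallback(
    targets: Iterable[str],
    call: Callable[[str, str], object],
    on_attempt: Callable[[FallbackAttempt], None] | None = None,
) -> FallbackResult:
    result = FallbackResult()
    last_exc: BaseException | None = None
    for target in targets:
        if not target.strip():
            continue  # blank entries are skipped, not counted as attempts
        provider, model = parse_target(target)
        try:
            value = call(provider, model)
        except Exception as exc:
            if not is_transient(exc):
                raise  # non-transient errors bubble up immediately
            attempt = FallbackAttempt(target, ok=False, error=repr(exc))
            last_exc = exc
        else:
            attempt = FallbackAttempt(target, ok=True)
            result.value, result.winner = value, target
        result.attempts.append(attempt)
        if on_attempt is not None:
            try:
                on_attempt(attempt)
            except Exception:
                pass  # a broken observer must never abort the chain
        if attempt.ok:
            return result  # short-circuit on the first success
    raise RuntimeError("all fallback targets failed") from last_exc
```

Re-raising non-transient errors instead of swallowing them keeps configuration mistakes (a bad token, a typo in a model name) visible rather than silently masked by the next backend in the chain.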
* G1 `agents add` diversity guard - ProfileStore.diversity_warnings warns when the reviewer / architect shares a provider family with the coder, plus a new PROVIDER_FAMILIES table and provider_family() helper. The CLI prints yellow warnings (non-fatal); --json output now includes diversity_warnings.
* G2 capability filter - ProfileStore.filter_by_capability + a new `specsmith agents list --capability` flag.
* G3 phase next auto-routes - advancing the AEE phase now pins `phase:active` to the new phase's preferred profile (and seeds the canonical `phase:<key>` entry on first advance).
* G4 TraceVault seal on /agent - in-chat `/agent <id>` writes a decision seal chained into .specsmith/trace.jsonl so every per-turn profile pin is auditable. Best-effort: read-only fs etc. never breaks the chat loop.
* C1 token threading - each provider driver now returns (text, _UsageDelta) and surfaces real token counts: Ollama prompt_eval_count + eval_count, Anthropic final_message.usage, OpenAI stream_options.include_usage, Gemini usage_metadata. A 4-chars/token heuristic fills in when the SDK omits usage. Counts flow through ChatRunResult.tokens_in/out/cost_usd into AgentState.credit() and the per-profile by_profile bucket.
* H1 docs/site/agents.md - preset -> route -> per-session -> BYOE walkthrough.
* H3 README elevator pitch - multi-agent + BYOE up top.
* H4 docs/site/quickstart.md - reproduction script + GIF placeholder.
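G4 describes each `/agent <id>` pin being written as a decision seal "chained into" `.specsmith/trace.jsonl`. One common way to make such a JSONL log tamper-evident is a hash chain, where every record stores the SHA-256 of the record before it. The sketch below illustrates that idea only; the record fields (`kind`, `prev`, `hash`, `profile`), the all-zero genesis value, and the function names are hypothetical and do not describe TraceVault's actual format. It mirrors the best-effort rule by returning `False` instead of raising on I/O or parse errors.

```python
import hashlib
import json
from pathlib import Path

GENESIS = "0" * 64  # hypothetical "previous hash" for the first record


def _digest(record: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so the hash is stable.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def _last_hash(path: Path) -> str:
    if not path.exists():
        return GENESIS
    lines = path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1])["hash"] if lines else GENESIS


def append_seal(path: Path, kind: str, **data: str) -> bool:
    """Append a chained record; return False instead of raising on failure."""
    try:
        record = {"kind": kind, "prev": _last_hash(path), **data}
        record["hash"] = _digest(record)  # hash covers every field except itself
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
        return True
    except (OSError, ValueError, KeyError):
        # Best-effort: read-only or corrupt trace files never break the caller.
        return False


def verify_chain(path: Path) -> bool:
    """Recompute every hash and check each record points at its predecessor."""
    prev = GENESIS
    for line in path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        claimed = record.pop("hash")
        if record.get("prev") != prev or _digest(record) != claimed:
            return False
        prev = claimed
    return True
```

Because each hash depends on the previous one, editing any earlier pin invalidates every later record, which is what makes per-turn routing decisions auditable after the fact.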
```python
# excerpt; `store` is created earlier in the test
loaded = ProfileStore.load(store.path)
assert loaded.get("custom").model == "claude-sonnet-4-5"
assert loaded.default_profile_id == "custom"
assert loaded.remove("custom") is True
```

```python
# excerpt; the enclosing try block is not shown
self._emit_event(type="turn_done")
if self._hard_stop:
    self._hard_stop = False
except (KeyboardInterrupt, EOFError):
```
What
Brings `main` up to date with `develop` so the default-setup CodeQL scan re-runs against current code, the v0.10.0 tag is reachable from `main`, and `main` advertises 0.10.0 (it currently still says 0.7.0 in `pyproject.toml`).

This mirrors the develop → main sync we just did on BitConcepts/specsmith-vscode#50, so both repos follow the same release pattern (PRs land on `develop`, then a periodic merge PR brings `main` up to the latest released version).
Why
`main` has been sitting at `release: v0.7.0` (commit `4824957`) while the following landed on `develop`:

`pyproject.toml` on `develop` is at `0.10.0`, the `v0.10.0` tag points at `9f961a0` (= `develop` HEAD), and the GitHub release for `v0.10.0` was published from `develop`. Despite that, `main` still publishes 0.7.0 to anyone who clones the default branch, and the GitHub default-setup CodeQL scanner only looks at `main`, so it has no signal on the current code.
How
`develop` and `main` have diverged (3 ahead, 8 behind from `develop`'s perspective), but the 8 commits on `main` that are not on `develop` are all historical "Merge branch 'develop'" / `release: vX.Y.Z` commits whose content is already present in `develop` via squash. Doing a merge commit here picks up all the new content and keeps the older release commits in history.
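The ahead/behind counts above can be reproduced with `git rev-list --left-right --count develop...main`, which prints two tab-separated numbers: commits reachable only from the left ref and commits reachable only from the right ref. A small wrapper, assuming `git` is on `PATH` and both branches exist locally (the function names are illustrative, not part of specsmith):

```python
import subprocess


def parse_left_right(output: str) -> tuple[int, int]:
    """Parse `git rev-list --left-right --count A...B` output ("<left>\t<right>")."""
    left, right = output.split()
    return int(left), int(right)


def divergence(branch: str = "develop", base: str = "main", repo: str = ".") -> tuple[int, int]:
    """Return (ahead, behind) of `branch` relative to `base`."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{branch}...{base}"],
        cwd=repo,
        check=True,  # raise CalledProcessError if git fails (e.g. unknown ref)
        capture_output=True,
        text=True,
    ).stdout
    # Left side = commits only on `branch` (ahead); right side = only on `base` (behind).
    return parse_left_right(out)
```

On the repository state described above, `divergence()` should report `(3, 8)`; after this merge lands, the right-hand count should drop to 0.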
Verification
alerts: 0 across both repos (confirmed prior to this PR).
9f961a0).

Co-Authored-By: Oz <oz-agent@warp.dev>