Merged
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -44,13 +44,37 @@ remain inspectable both in concise chat output and the full trace store.
- **Planner instructions.** Prefer typed file/git capabilities over shell
mutation commands; shell remains the home of verification runs
(tests, builds, linters) and environment inspection.
- **Live turn status.** The inline live renderer now tracks explicit phases
and last-event age, so long turns distinguish planning, tool execution,
observation, approval/input waits, stale event periods, and terminal states.
- **Provider hardening.** Ollama adapter only sends `think=true` for Qwen 3.x
structured-output calls; other thinking models (e.g. `deepseek-v3.2:cloud`)
are no longer misrouted. Structured-JSON responses no longer fall back to
the `thinking` field, so reasoning narrative can't be parsed as a plan.
- **Ollama role mapping.** OpenAI-style `developer` role is mapped to `system`
before hitting Ollama's chat endpoint.

### Fixed

- **Command usage recovery.** Repeated command invocation errors such as
unsupported flags are no longer framed as capability gaps. FCLI now feeds the
concrete stderr back to the planner for one repair attempt and, if still
unrepaired, shows the failed command and stderr in the final message.
- **GitHub CLI planning.** Plans using `gh api ... -r` are rejected before
approval or execution because `gh api` does not support jq's standalone
raw-output flag (`-r`).
- **Recovered command errors.** An early invalid command no longer pollutes the
final assistant message after a later iteration recovers and finishes
successfully.
- **Deferred file writes.** Planner shape hints no longer advertise the
internal `_file_write_note` placeholder as a real tool-call field. If a model
still returns that older malformed shape, FCLI converts it to
`arguments.content_brief` instead of failing the turn during plan repair.
- **Read-only loop detection.** Repeated successful read/search actions with
identical arguments now stop as no-progress instead of growing the planner
prompt until the provider fails.
- **Static quality gates.** Formatting and strict mypy checks are clean again.
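
As an illustration of the `gh api ... -r` rejection described above, here is a minimal sketch of a plan-validation check. The function name `rejects_gh_api_raw_flag` is hypothetical and not taken from the FCLI codebase, and the check is deliberately simplified: it flags any standalone `-r` token after `gh api` and does not try to model every `gh` flag that takes a value.

```python
import shlex
from collections.abc import Sequence


def rejects_gh_api_raw_flag(argv: Sequence[str]) -> bool:
    """Return True when argv has the `gh api ... -r` shape (hypothetical helper)."""
    if len(argv) < 2 or argv[0] != "gh" or argv[1] != "api":
        return False
    # Simplification: any standalone `-r` token after `gh api` is flagged.
    return "-r" in argv[2:]


# The reference failure from the roadmap, split into argv form.
bad = shlex.split("gh api repos/anmolnoor/anmolnoor/readme --jq .content -r")
good = shlex.split("gh api repos/anmolnoor/anmolnoor/readme --jq .content")
print(rejects_gh_api_raw_flag(bad), rejects_gh_api_raw_flag(good))
```

A validator of this kind would run before approval, so the planner gets a chance to repair the argv without the command ever executing.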

### Migration

- **History database schema v4 → v5.** First open of an existing database
6 changes: 6 additions & 0 deletions docs/TECHNICAL.md
@@ -6,10 +6,16 @@ Foundation CLI is a local-first, shell-native coding agent that follows an expli

- **Agent entrypoint.** `foundation` starts the interactive shell; `foundation <request>` runs a one-shot turn; admin subcommands (`run`, `tools`, `history`, `trace`, `config`, `doctor`) keep precedence. `foundation chat` remains a strict alias.
- **Typed file capabilities.** `foundation.file.{read,read_chunk,write,edit,apply_diff}` — atomic writes with sha256 conflict detection, pure-Python unified-diff applier. Planner prefers these over `sed`/`echo`.
- **Deferred file bodies.** Large `foundation.file.write` plans use `content_brief`; the orchestrator materializes the literal file body through a separate text-generation call, and malformed `_file_write_note` planner output is normalized back to `content_brief`.
- **Typed git capabilities.** `foundation.git.{status,diff,show,log,stage,unstage,commit}` — workspace-confined, porcelain v2 parsing. Stage / unstage are auto-allowed; `commit` requires approval and never stages implicitly.
- **Bounded replan loop.** Max 32 planning iterations per user turn, 40 actions per iteration, and 200 actions total. Six stop reasons surface why a turn ended (`zero_action_plan`, `pending_approval`, `fatal_execution_failure`, `max_iterations`, `max_actions`, `no_progress`).
- **Iteration-aware trace.** Step ids are scoped `planning:{req}:{iter}` and `action:{req}:{iter}:{action_id}`; `REPLANNED_FROM` edges link iterations. Older v2 traces remain inspectable via schema v5 migration.
- **Concise notices.** Multi-iteration turns summarize with changed-files, commands-run, verification outcome, and approval-required notices. Verification reports PASSED / FAILED / UNAVAILABLE / NOT_ATTEMPTED distinctly so missing binaries aren't misreported as success.
- **Command error recovery.** Usage-shaped shell failures such as invalid flags are fed back to the planner as repairable command invocation errors, not capability gaps. If the loop still cannot recover, the final response keeps the failed command and stderr visible.
- **Known shell-shape validation.** `gh api ... -r` is rejected during plan validation because `gh api` does not support jq's standalone raw-output flag; the planner must repair before approval/execution.
- **Clean success messages after recovery.** If a later iteration recovers from an invalid command and finishes with a zero-action completion, the final assistant message comes from the successful terminal plan instead of appending stale stderr from the earlier failure.
- **Read-only loop guard.** Repeated no-change actions with the same arguments, including successful file reads/searches, are treated as no-progress so a turn cannot keep re-reading the same data until provider context is exhausted.
- **Live turn status.** The inline renderer tracks the current phase, last event, and stale event periods so long-running turns show whether FCLI is planning, running a tool, observing, waiting for approval/input, stale, or finished.
- **Approval boundaries visible.** `foundation doctor` prints risk class, trust tier, and declared side effects for every capability.
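
The read-only loop guard above can be pictured with a small sketch. Everything here is an assumption for illustration: the class name `NoProgressGuard`, the `max_repeats` threshold of 3, and the choice to fingerprint actions by hashing their JSON-serialized arguments are not confirmed by the FCLI source.

```python
import hashlib
import json
from collections import Counter
from typing import Any


def action_fingerprint(capability: str, arguments: dict[str, Any]) -> str:
    """Stable hash of a capability call; assumes JSON-serializable arguments."""
    payload = json.dumps(
        {"capability": capability, "arguments": arguments}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class NoProgressGuard:
    """Hypothetical guard: stop when identical no-change actions repeat."""

    def __init__(self, max_repeats: int = 3) -> None:  # threshold is assumed
        self._max_repeats = max_repeats
        self._seen: Counter[str] = Counter()

    def record(
        self, capability: str, arguments: dict[str, Any], changed_state: bool
    ) -> bool:
        """Return True when the turn should stop with `no_progress`."""
        if changed_state:
            # A real change counts as progress, so repeat counts start over.
            self._seen.clear()
            return False
        key = action_fingerprint(capability, arguments)
        self._seen[key] += 1
        return self._seen[key] >= self._max_repeats


guard = NoProgressGuard()
results = [
    guard.record("foundation.file.read", {"path": "README.md"}, changed_state=False)
    for _ in range(3)
]
print(results)
```

Counting successful reads as well as failures is the point of the guard: a read that succeeds with identical arguments adds nothing new to the planner prompt.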

## Requirements
245 changes: 245 additions & 0 deletions plans/fcli-fixes-roadmap.md
@@ -0,0 +1,245 @@
# FCLI Fixes Roadmap

## Purpose

Track the near-term fixes needed to get Foundation CLI back to a clean,
trustworthy baseline before any Beekeeper worker integration work starts.

This roadmap is intentionally scoped to FCLI only.

## Source Inputs

- `/Users/anmolnoor/Developer/future-things/fcli-command-error-recovery-plan.md`
- `/Users/anmolnoor/Developer/future-things/fcli-live/loading-and-pending-state-plan.md`
- Current local verification results from this branch.

## Current Baseline

Verified on 2026-06-01:

- `./scripts/uv run pytest` passes: 426 tests.
- `./scripts/uv run ruff check src tests` passes.
- `./scripts/uv run ruff format --check src tests` fails.
- `./scripts/uv run mypy` fails with 11 strict typing errors.
- `./scripts/uv run foundation doctor` passes.

Current worktree notes:

- `uv.lock` changed after bootstrap added `click` metadata.
- `gh-cli-cheatsheet.md` is untracked.
- `res/` is untracked.

## Out Of Scope

- Beekeeper Queen/worker architecture.
- Worker registry, task queue, sandbox backend, skill promotion, memory
curation, and external integrations.
- Product dashboard work outside the current FCLI terminal experience.

## Stage 1: Worktree And Quality Baseline

### Goal

Make the repo state explicit before fixing behavior.

### Tasks

1. Decide whether the `uv.lock` bootstrap change should be kept.
2. Decide whether `gh-cli-cheatsheet.md` and `res/` belong in the repo,
   should be ignored, or should remain local scratch files.
3. Run and record the baseline checks:
   - `./scripts/uv run pytest`
   - `./scripts/uv run ruff check src tests`
   - `./scripts/uv run ruff format --check src tests`
   - `./scripts/uv run mypy`
   - `./scripts/uv run foundation doctor`

### Done Criteria

- Worktree changes are understood and intentionally kept or left untouched.
- The failing gates are known before implementation starts.

## Stage 2: Command Error Recovery

### Goal

Stop turning ordinary command usage failures into capability-gap messages.

Reference failure:

```text
gh api repos/anmolnoor/anmolnoor/readme --jq .content -r
```

The command fails because `gh api` does not support `-r`, but FCLI can frame
the result as an external-access or capability gap. That diagnosis is wrong.
The final user-facing message should preserve the concrete stderr and classify
the problem as a command invocation error.
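
To make the intended user-facing outcome concrete, the following sketch shows one way a deterministic message could keep the failed command and stderr visible. The function name `format_command_error`, the wording of the message, the five-line excerpt limit, and the exit code used in the example are all assumptions for illustration, not FCLI's actual output.

```python
import shlex
from collections.abc import Sequence


def format_command_error(
    argv: Sequence[str], exit_code: int, stderr: str, max_lines: int = 5
) -> str:
    """Render a command invocation error without provider-backed phrasing."""
    # Keep only the first few stderr lines so the message stays readable.
    excerpt = "\n".join(stderr.strip().splitlines()[:max_lines])
    return (
        "Command invocation error (not a capability gap).\n"
        f"Command: {shlex.join(argv)}\n"
        f"Exit code: {exit_code}\n"
        f"stderr:\n{excerpt}"
    )


message = format_command_error(
    ["gh", "api", "repos/anmolnoor/anmolnoor/readme", "--jq", ".content", "-r"],
    1,  # illustrative exit code, not taken from a real run
    "unknown shorthand flag: 'r' in -r",
)
print(message)
```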

### Tasks

1. Add a small command-failure classifier near the orchestrator/gap-handoff
   boundary.
2. Classify usage-shaped failures as recoverable command errors:
   - `unknown shorthand flag`
   - `unknown option`
   - `unrecognized option`
   - `invalid option`
   - `Usage:`
   - `Try '--help'`
3. Feed the planner one repair observation when a shell argv is invalid:
   - include the failed command,
   - include exit code,
   - include stderr excerpt,
   - instruct the planner not to repeat the same argv.
4. Gate capability-gap handoff more narrowly so normal nonzero shell exits,
   tests, linters, and command usage errors do not become gap reports.
5. Keep provider-backed phrasing only for true structural gap kinds.
6. Add deterministic final presentation for unrepaired command usage errors.
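
Tasks 1 to 3 above could be sketched roughly as follows. The names `USAGE_ERROR_MARKERS`, `is_usage_error`, and `build_repair_observation` are hypothetical, the marker list simply mirrors the strings in task 2, and the case-insensitive substring matching and 500-character excerpt limit are assumptions rather than the implemented behavior.

```python
import shlex
from collections.abc import Sequence

# Lowercased copies of the usage-shaped markers listed in task 2.
USAGE_ERROR_MARKERS = (
    "unknown shorthand flag",
    "unknown option",
    "unrecognized option",
    "invalid option",
    "usage:",
    "try '--help'",
)


def is_usage_error(exit_code: int, stderr: str) -> bool:
    """Classify a failed command as a recoverable usage error."""
    if exit_code == 0:
        return False
    lowered = stderr.lower()
    return any(marker in lowered for marker in USAGE_ERROR_MARKERS)


def build_repair_observation(
    argv: Sequence[str], exit_code: int, stderr: str, max_chars: int = 500
) -> str:
    """Build the single repair observation fed back to the planner."""
    excerpt = stderr.strip()[:max_chars]
    return "\n".join(
        [
            "The previous shell command failed with a usage error.",
            f"Failed command: {shlex.join(argv)}",
            f"Exit code: {exit_code}",
            f"stderr excerpt: {excerpt}",
            "Do not repeat the same argv; correct the invocation instead.",
        ]
    )


argv = ["gh", "api", "repos/anmolnoor/anmolnoor/readme", "--jq", ".content", "-r"]
stderr = "unknown shorthand flag: 'r' in -r"
if is_usage_error(1, stderr):  # exit code 1 is illustrative
    print(build_repair_observation(argv, 1, stderr))
```

Plain substring matching keeps the classifier easy to test, at the cost of possible false positives when a tool prints `Usage:` for unrelated reasons; a stricter version might also require that the gap-handoff path saw no missing-binary signal.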

### Tests

Add tests before implementation:

1. `build_gap_handoff()` does not produce a missing-capability handoff for
`unknown shorthand flag: 'r' in -r`.
2. Repeating the invalid `gh api ... -r` command does not classify the turn as
a capability gap.
3. The second planner call receives the failed command, stderr, and a
do-not-repeat instruction.
4. True missing capability still gets a capability-gap handoff.
5. Missing binary / command unavailable still gets the appropriate handoff.
6. CLI presentation includes the failed command and stderr excerpt.

### Done Criteria

- The exact `gh api ... --jq .content -r` shape no longer becomes a
capability-gap/provider-access message.
- FCLI either repairs the command on the next iteration or stops with the real
stderr visible.
- True capability gaps still render handoff options.

## Stage 3: Static Quality Gates

### Goal

Restore the repo's expected static checks.

### Tasks

1. Run formatting on the affected files:
   - `src/foundation/cli.py`
   - `src/foundation/services/gap_handoff.py`
   - `src/foundation/services/guardrails.py`
   - `src/foundation/services/planner.py`
   - `tests/test_cli.py`
   - `tests/test_file_service.py`
   - `tests/test_orchestrator.py`
2. Fix strict mypy errors without weakening type checking.
3. Avoid behavior changes unless a type issue exposes a real bug.

### Known Mypy Areas

- Untyped `__exit__` parameters in monitor/live context managers.
- `_make_sse_handler()` missing return type.
- `_resolve_read_path()` has a raise-helper control-flow issue.
- `Any` leaks in JSON/result handling.
- One `PlannedAction | None` assignment mismatch in orchestrator code.
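
Two of the areas above have well-known typed shapes, sketched here on stand-in code. `LiveMonitor`, `_fail`, and `resolve_read_path` are hypothetical examples and not the actual FCLI classes or functions; the pattern, not the names, is what carries over.

```python
from __future__ import annotations

from types import TracebackType
from typing import NoReturn


class LiveMonitor:
    """Stand-in context manager with a fully annotated `__exit__`."""

    def __init__(self) -> None:
        self.active = False

    def __enter__(self) -> LiveMonitor:
        self.active = True
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> None:
        # Returning None means exceptions are never suppressed.
        self.active = False


def _fail(message: str) -> NoReturn:
    """Raise-helper; `NoReturn` tells the type checker control flow ends here."""
    raise ValueError(message)


def resolve_read_path(raw: str | None) -> str:
    if raw is None:
        _fail("path is required")
    # Because `_fail` is NoReturn, `raw` is narrowed to `str` here.
    return raw


with LiveMonitor() as monitor:
    print(monitor.active)
print(resolve_read_path("README.md"))
```

Annotating the raise-helper as `NoReturn` usually resolves "missing return statement" and narrowing complaints without adding casts or `# type: ignore` comments.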

### Done Criteria

- `./scripts/uv run ruff format --check src tests` passes.
- `./scripts/uv run ruff check src tests` passes.
- `./scripts/uv run mypy` passes.
- `./scripts/uv run pytest` still passes.

## Stage 4: Live Loading UX Polish

### Goal

Make the terminal live state truthful and less vague during long turns.

This is the first patch scope from the FCLI live plan only. It should not touch
provider internals, worker architecture, or add new dependencies.

### Tasks

1. Add an explicit live phase model in `src/foundation/live_turn.py`.
2. Track:
   - current phase,
   - phase start time,
   - last event time,
   - last event name.
3. Render clearer collapsed status text for:
   - starting,
   - thinking,
   - planning,
   - running tool,
   - observing,
   - waiting for approval,
   - waiting for user input,
   - stale,
   - completed,
   - failed.
4. Add stale/no-event display as a view-layer concern:
   - soft stale after 15 seconds,
   - hard stale after 60 seconds.
5. Improve expanded detail with phase, elapsed time, last event, current
   action, counters, and recent actions.
6. Keep non-TTY and disabled live UX behavior unchanged.
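
The phase model and stale thresholds in the tasks above might look something like the sketch below. The names `Phase`, `Staleness`, `LiveState`, and `collapsed_status` are assumptions, as is the choice to derive staleness from the last event time at render time instead of storing "stale" as a phase; only the 15-second and 60-second thresholds come from the task list.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

SOFT_STALE_SECONDS = 15.0
HARD_STALE_SECONDS = 60.0


class Phase(Enum):
    STARTING = "starting"
    THINKING = "thinking"
    PLANNING = "planning"
    RUNNING_TOOL = "running tool"
    OBSERVING = "observing"
    WAITING_APPROVAL = "waiting for approval"
    WAITING_INPUT = "waiting for user input"
    COMPLETED = "completed"
    FAILED = "failed"


TERMINAL_PHASES = frozenset({Phase.COMPLETED, Phase.FAILED})


class Staleness(Enum):
    FRESH = "fresh"
    SOFT = "soft"
    HARD = "hard"


@dataclass
class LiveState:
    phase: Phase
    phase_started_at: float
    last_event_at: float
    last_event_name: str

    def staleness(self, now: float) -> Staleness:
        """View-layer check: terminal phases never report stale."""
        if self.phase in TERMINAL_PHASES:
            return Staleness.FRESH
        age = now - self.last_event_at
        if age >= HARD_STALE_SECONDS:
            return Staleness.HARD
        if age >= SOFT_STALE_SECONDS:
            return Staleness.SOFT
        return Staleness.FRESH


def collapsed_status(state: LiveState, now: float) -> str:
    """One-line status text for the collapsed loader (wording is assumed)."""
    level = state.staleness(now)
    if level is Staleness.FRESH:
        return state.phase.value
    age = int(now - state.last_event_at)
    return f"{state.phase.value} ({level.value} stale, no events for {age}s)"


state = LiveState(Phase.RUNNING_TOOL, 0.0, 0.0, "tool_started")
print(collapsed_status(state, now=20.0))
```

Passing `now` in explicitly, instead of calling a clock inside the model, keeps the stale rendering tests deterministic.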

### Tests

Add or update `tests/test_live_turn.py` for:

- event-to-phase transitions,
- phase timing,
- last event tracking,
- stale rendering,
- collapsed status text,
- expanded detail rows,
- terminal states never showing stale.

### Done Criteria

- The collapsed loader never shows only a vague thinking counter.
- The user can tell whether FCLI is planning, running a tool, waiting for
approval/input, stale, done, or failed.
- Existing concise/verbose result rendering still happens after the live widget
exits.

## Stage 5: Docs And Changelog

### Goal

Document only behavior that actually changed.

### Tasks

1. Update `docs/TECHNICAL.md` if command recovery or live UX behavior changes
   user-visible semantics.
2. Update `CHANGELOG.md` with concise entries for:
   - command error recovery,
   - static quality cleanup,
   - live UX polish.
3. Do not add Beekeeper roadmap material to FCLI docs.

### Done Criteria

- Docs match runtime behavior.
- Changelog names the fixes without overpromising future work.

## Final Completion Gate

The FCLI fix batch is complete when all of these pass:

```bash
./scripts/uv run ruff check src tests
./scripts/uv run ruff format --check src tests
./scripts/uv run mypy
./scripts/uv run pytest
./scripts/uv run foundation doctor
```

The final worktree should contain only intentional changes tied to this
roadmap.