feat: add resume session feature (Claude Code-inspired append-only JSONL) by ahmedk20 · Pull Request #467 · usestrix/strix

ahmedk20 · 2026-04-26T18:54:00Z

Summary

This PR adds a resume session feature that lets users continue a past penetration test scan from where it left off — including the full LLM conversation history, iteration state, and
agent context.

Rather than a snapshot checkpoint (which loses progress on crashes), this uses an append-only conversation.jsonl file that writes every LLM message to disk in real-time, inspired by how
Claude Code persists sessions.

strix --list-sessions — list all past scans in a table
strix --continue / strix -c — resume the most recent session
strix --resume <run_name> — resume a specific session by name
strix --resume — open an interactive picker (TUI modal or CLI numbered list)
Resuming a completed scan reopens it and asks what to investigate next

Why not the checkpoint approach (from test-resume branch)?

The previous test-resume attempt saved a checkpoint.json snapshot periodically. The fatal flaw: a crash between saves loses all progress since the last checkpoint. It also blocked
resuming completed scans and only worked in --non-interactive mode.

Architecture

Two new files per run:
strix_runs/{run_name}/conversation.jsonl │ Full LLM conversation, appended in real-time
strix_runs/{run_name}/session_meta.json │ Lightweight sidecar for fast listing (no need to parse the full log)
conversation.jsonl entry types:
{"type": "session_start", "scan_config": {...}, "schema_version": 1, ...}
{"type": "message", "role": "user", "content": "...", "iteration": 1, ...}
{"type": "message", "role": "assistant", "content": [...], "iteration": 1, ...}
{"type": "iteration_end", "iteration": 1, "context": {...}, "completed": false, ...}
{"type": "session_end", "completed": true, ...}

On resume: replay message entries → AgentState.messages, latest iteration_end → iteration/context.

Crash safety: even a hard kill (kill -9) leaves all messages written. The worst case is a missing iteration_end marker — the iteration count is inferred from message count instead.

Changes

New files:

strix/telemetry/conversation_log.py — ConversationLog class: real-time JSONL append + replay() for reconstruction
strix/telemetry/session_meta.py — atomic read/write of session_meta.json sidecar
strix/sessions/listing.py — enumerate strix_runs/, return sorted SessionRow list with legacy fallback for old runs
strix/sessions/resume.py — load_resume_bundle() + apply_resume_to_args() + merge_into_agent_config()
strix/sessions/init.py — public API surface
strix/interface/session_picker_cli.py — rich.Table + Prompt.ask interactive picker for --non-interactive mode
strix/interface/session_picker_tui.py — Textual ModalScreen with live search, DataTable, Resume/Cancel buttons

Modified files:

strix/agents/state.py — added PrivateAttr _conversation_log; hooked add_message to append to JSONL in real-time
strix/agents/base_agent.py — wires ConversationLog at init (root agent only); calls append_iteration_end after each loop iteration
strix/telemetry/tracer.py — writes session_meta.json on set_scan_config and finalizes it in cleanup(); calls write_session_end on shutdown
strix/interface/main.py — adds --resume [RUN_NAME], -c/--continue, --list-sessions; makes --target optional when resume flags present
strix/interface/cli.py — injects restored AgentState into agent_config["state"]; skips startup banner on resume
strix/interface/tui.py — pushes SessionPickerScreen on mount when --resume (no arg); skips splash on direct resume

Test plan

19 unit tests added in tests/test_resume_feature.py — all passing
- ConversationLog: roundtrip, thinking_blocks, corrupt line skipping, crash-safe partial replay, empty file
- session_meta: write/read, merge preserves user fields (title/tags), atomic write, corrupt/missing file
- listing: empty root, returns rows, most_recent with/without conv_log, query filter, get by name

Manual QA checklist:

strix --list-sessions with 0 / 1 / many runs
strix -c resumes most recent; works after completed scan
strix --resume <run_name> resumes directly
strix --resume opens TUI picker; search filters, Enter resumes
Hard-kill mid-scan then resume — all messages recovered
Completed scan reopens and asks for follow-up
Old runs (no session_meta.json) appear via legacy fallback
Corrupt conversation.jsonl shows error, doesn't crash

greptile-apps · 2026-04-26T18:57:38Z

Greptile Summary

This PR introduces an append-only JSONL session-persistence layer (conversation.jsonl + session_meta.json) and the corresponding CLI/TUI commands (--resume, -c/--continue, --list-sessions) for resuming past penetration test scans. The architecture is well-thought-out and crash-safe by design, with good unit test coverage for the new modules.

P1 – _scan_mode_explicit guard is never set (strix/sessions/resume.py): the branch meant to respect an explicit --scan-mode override on resume is always taken, so the bundle's saved mode silently overwrites whatever the user passes at the command line.
Three additional issues carried over from prior review threads remain open: the "reopen" synthetic message not being persisted to JSONL, the broken iteration_count expression in _finalize_session_meta, and the non-atomic read-modify-write race in write_session_meta.

Confidence Score: 3/5

Not ready to merge — one new P1 plus three unresolved P1s from prior review rounds.

A new P1 (_scan_mode_explicit never set, so --scan-mode on resume is silently ignored) joins three previously identified P1s (broken iteration_count, 'reopen' message not persisted, non-atomic session-meta write) that are still open. Multiple P1s push the score below the 4/5 ceiling.

strix/sessions/resume.py (new P1), strix/telemetry/tracer.py (broken iteration_count), strix/telemetry/session_meta.py (race condition)

Important Files Changed

Filename	Overview
strix/sessions/resume.py	Core resume logic; `_scan_mode_explicit` guard is never set so the bundle's scan_mode always silently overwrites the user's CLI choice (new P1 finding).
strix/telemetry/tracer.py	Adds session-meta writing and conversation-log finalisation; `iteration_count` generator is broken (flagged in prior thread), otherwise additions look correct.
strix/telemetry/session_meta.py	Atomic metadata sidecar; the read→merge→write cycle has a known race condition under concurrent callers (flagged in prior review thread).
strix/telemetry/conversation_log.py	New append-only JSONL logger with thread-safe writes and crash-safe replay; implementation is clean and well-tested.
strix/interface/main.py	Adds resume/list-sessions CLI flags and bootstrap logic; `"__PICK__"` magic sentinel can collide with a real run name.
strix/interface/session_picker_tui.py	New Textual modal for session picking; contains dead code in compose() that should be removed.
strix/interface/session_picker_cli.py	Interactive CLI picker with rich table; straightforward and correct.
strix/agents/base_agent.py	Wires ConversationLog at init and appends iteration_end after each loop; correct for non-resume path, but the 'reopen' synthetic message ordering issue is noted in prior thread.
strix/agents/state.py	Adds PrivateAttr _conversation_log and hooks add_message to write to JSONL in real-time; clean implementation.
strix/interface/cli.py	Skips startup banner on resume and merges restored AgentState into agent_config; correct.
strix/interface/tui.py	TUI session picker integration; scan_config is rebuilt after picker resolves and set_run_name exists on Tracer.
strix/sessions/init.py	Public API surface re-export; correct.
strix/sessions/listing.py	Clean session enumeration with legacy fallback; filter, sort, and limit logic is correct.
tests/test_resume_feature.py	19 unit tests covering roundtrip, crash safety, corrupt-line skipping, and listing; good coverage of the new modules.

Comments Outside Diff (1)

strix/sessions/resume.py, line 933-935 (link)

_scan_mode_explicit is never set — user's --scan-mode override silently ignored

_scan_mode_explicit is never written anywhere in the codebase, so getattr(args, "_scan_mode_explicit", False) always returns False, making not False always True. The branch therefore always overwrites args.scan_mode with the bundle's saved value — even when the user explicitly passes --scan-mode quick on a resume command. The user's intent is silently discarded.

The fix is to set args._scan_mode_explicit = True in parse_args() when --scan-mode is present, or to drop the guard entirely if overriding with the bundle's mode is always the desired behaviour.

Prompt To Fix With AI

This is a comment left during a code review.
Path: strix/sessions/resume.py
Line: 933-935

Comment:
**`_scan_mode_explicit` is never set — user's `--scan-mode` override silently ignored**

`_scan_mode_explicit` is never written anywhere in the codebase, so `getattr(args, "_scan_mode_explicit", False)` always returns `False`, making `not False` always `True`. The branch therefore **always** overwrites `args.scan_mode` with the bundle's saved value — even when the user explicitly passes `--scan-mode quick` on a resume command. The user's intent is silently discarded.

The fix is to set `args._scan_mode_explicit = True` in `parse_args()` when `--scan-mode` is present, or to drop the guard entirely if overriding with the bundle's mode is always the desired behaviour.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: strix/sessions/resume.py
Line: 933-935

Comment:
**`_scan_mode_explicit` is never set — user's `--scan-mode` override silently ignored**

`_scan_mode_explicit` is never written anywhere in the codebase, so `getattr(args, "_scan_mode_explicit", False)` always returns `False`, making `not False` always `True`. The branch therefore **always** overwrites `args.scan_mode` with the bundle's saved value — even when the user explicitly passes `--scan-mode quick` on a resume command. The user's intent is silently discarded.

The fix is to set `args._scan_mode_explicit = True` in `parse_args()` when `--scan-mode` is present, or to drop the guard entirely if overriding with the bundle's mode is always the desired behaviour.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/interface/main.py
Line: 247-248

Comment:
**`"__PICK__"` sentinel blocks resuming a session literally named `__PICK__`**

`nargs="?"` with `const="__PICK__"` means `--resume __PICK__` sets `args.resume = "__PICK__"`, which is then treated as the "open picker" signal rather than a run name. Any user who happened to have a session folder named `__PICK__` cannot resume it by name and would get an unexpected interactive picker instead.

Consider using a dedicated flag (e.g. `--resume-pick` / `--resume-interactive`) instead of a magic sentinel to avoid the collision entirely.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/interface/session_picker_tui.py
Line: 507-508

Comment:
**Dead code block in `compose`**

The `with self.app.focused if False else self:` construct is always `with self:` (the condition is a compile-time constant `False`) and the body is `pass`, so this block is entirely inert. The `# noqa: SIM210` comment suppresses the linter warning that would otherwise flag the pointless ternary. The two lines should be removed.

```suggestion
    def compose(self) -> ComposeResult:
        from textual.containers import Container, Horizontal
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (2): Last reviewed commit: "feat: add Claude Code-inspired resume se..." | Re-trigger Greptile}

greptile-apps · 2026-04-26T18:57:42Z

+    if mode == "reopen":
+        state.add_message(
+            "user",
+            "The previous scan session has been reopened. Please summarize the key "
+            "findings so far and ask what to investigate or test next.",
+        )


"Reopen" synthetic message not persisted to JSONL

When mode == "reopen", state.add_message(...) is called on a freshly-constructed AgentState whose _conversation_log is still None — set_conversation_log isn't called until BaseAgent.__init__ later. The synthetic "user" message is added to the in-memory state.messages list but never written to conversation.jsonl.

The LLM then responds to it, and that assistant message IS written (the log is set by then). On a subsequent resume of this reopened session, replay() reconstructs a conversation that starts with an assistant message with no preceding user turn — an invalid message sequence that will likely cause an LLM API error or confuse the model.

Fix: write the synthetic message to the log explicitly after the log is wired up (e.g., in BaseAgent.__init__ after set_conversation_log, check for a pending synthetic message), or pass a flag so ConversationLog.append_message can be called immediately after the log is attached.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/sessions/resume.py Line: 58-63 Comment: **"Reopen" synthetic message not persisted to JSONL** When `mode == "reopen"`, `state.add_message(...)` is called on a freshly-constructed `AgentState` whose `_conversation_log` is still `None` — `set_conversation_log` isn't called until `BaseAgent.__init__` later. The synthetic `"user"` message is added to the in-memory `state.messages` list but never written to `conversation.jsonl`. The LLM then responds to it, and that assistant message IS written (the log is set by then). On a subsequent resume of this reopened session, `replay()` reconstructs a conversation that starts with an assistant message with no preceding user turn — an invalid message sequence that will likely cause an LLM API error or confuse the model. Fix: write the synthetic message to the log explicitly after the log is wired up (e.g., in `BaseAgent.__init__` after `set_conversation_log`, check for a pending synthetic message), or pass a flag so `ConversationLog.append_message` can be called immediately after the log is attached. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-26T18:57:43Z

+                    "status": "completed" if completed else "errored",
+                    "ended_at": datetime.now(UTC).isoformat(),
+                    "iteration_count": max(
+                        (a.get("iteration", 0) for ag in self.agents.values()
+                         for a in ag.get("tool_executions", [])), default=0
+                    ),
+                    "vulnerability_count": len(self.vulnerability_reports),
+                    "agent_count": len(self.agents),
+                },
+            )


iteration_count expression crashes silently — always writes 0

self.agents[agent_id]["tool_executions"] is a list[int] (execution IDs appended in log_tool_execution_start), not a list of dicts. The generator expression calls .get("iteration", 0) on each int, which raises AttributeError. The surrounding try/except Exception: pass swallows the error silently, so iteration_count is never written to session_meta.json and the listing table always shows "?" for every session.

# current (broken) — iterates integers, not dicts "iteration_count": max( (a.get("iteration", 0) for ag in self.agents.values() for a in ag.get("tool_executions", [])), default=0 ),

The tracer doesn't store AgentState.iteration directly, so a straightforward fix is to read the final iteration from the last iteration_end entry written to the conversation log, or store the iteration on the tracer when append_iteration_end is called in BaseAgent.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/telemetry/tracer.py Line: 896-905 Comment: **`iteration_count` expression crashes silently — always writes 0** `self.agents[agent_id]["tool_executions"]` is a `list[int]` (execution IDs appended in `log_tool_execution_start`), not a list of dicts. The generator expression calls `.get("iteration", 0)` on each `int`, which raises `AttributeError`. The surrounding `try/except Exception: pass` swallows the error silently, so `iteration_count` is never written to `session_meta.json` and the listing table always shows `"?"` for every session. ```python # current (broken) — iterates integers, not dicts "iteration_count": max( (a.get("iteration", 0) for ag in self.agents.values() for a in ag.get("tool_executions", [])), default=0 ), ``` The tracer doesn't store `AgentState.iteration` directly, so a straightforward fix is to read the final iteration from the last `iteration_end` entry written to the conversation log, or store the iteration on the tracer when `append_iteration_end` is called in `BaseAgent`. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-26T18:57:44Z

+def write_session_meta(run_dir: Path, meta: dict[str, Any]) -> None:
+    """Atomically merge *meta* into session_meta.json, preserving user fields."""
+    path = run_dir / "session_meta.json"
+    existing = _read_raw(path) or {}
+
+    merged = {**existing, **meta}
+    # Always preserve user-editable fields from existing file
+    merged["title"] = existing.get("title", merged.get("title"))
+    merged["tags"] = existing.get("tags", merged.get("tags", []))
+    merged["last_updated"] = datetime.now(UTC).isoformat()
+
+    tmp = path.with_suffix(".tmp")
+    tmp.write_text(json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8")
+    os.replace(tmp, path)


Non-atomic read-modify-write in write_session_meta

os.replace makes the final rename atomic, but the read→merge→write cycle is not protected by a lock. If two threads call write_session_meta concurrently (e.g., _write_initial_session_meta from set_scan_config and _finalize_session_meta from cleanup racing on shutdown), both can read the same old state and the later write silently discards the earlier one's updates. Consider adding a per-file lock (similar to the one used in ConversationLog._append) around the entire read-merge-write block.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/telemetry/session_meta.py Line: 10-23 Comment: **Non-atomic read-modify-write in `write_session_meta`** `os.replace` makes the final rename atomic, but the read→merge→write cycle is not protected by a lock. If two threads call `write_session_meta` concurrently (e.g., `_write_initial_session_meta` from `set_scan_config` and `_finalize_session_meta` from `cleanup` racing on shutdown), both can read the same old state and the later write silently discards the earlier one's updates. Consider adding a per-file lock (similar to the one used in `ConversationLog._append`) around the entire read-merge-write block. How can I resolve this? If you propose a fix, please make it concise.

Pulls usestrix#467 (feat: add resume session feature) ahead of upstream merge. Enables unattended scans to survive crashes or forced termination (e.g. AWS STS session expiry mid-scan) by appending every LLM message to strix_runs/<run>/conversation.jsonl and replaying on --resume. New CLI: strix --list-sessions # tabulate past scans strix --continue (or -c) # resume most recent strix --resume <run_name> # resume by name strix --resume # open interactive picker Motivates: https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499 — trade-api flag-aware scan exited at 71min with APIConnectionError ("security token included in the request is expired"). Session had produced 10 findings; all lost once the report upload also hit ExpiredToken. With this commit merged, the CI composite action can wrap strix in a re-assume + --continue loop so each sub-session fits inside the 60min STS budget and total scan duration is bounded only by the overall workflow timeout. Upstream integration: - Once usestrix#467 merges, drop from the seedcx-build recipe - Until then, include alongside usestrix#454 (memory compression), usestrix#460 (truncate retry), usestrix#468 (thinking retry), this one, and the adaptive-thinking commit on this branch Clean 3-way merge, no conflicts. Smoke tests: - uv sync succeeds - strix --help surfaces --resume / --continue / --list-sessions - imports (strix.sessions.resume, strix.telemetry.conversation_log) OK Follow-up (separate): strix-scan-workflow README recipe + composite action consumer changes to invoke resume on re-dispatch.

feat: add Claude Code-inspired resume session feature

f31f454

greptile-apps Bot reviewed Apr 26, 2026

View reviewed changes

ahmedk20 closed this Apr 28, 2026

ahmedk20 reopened this Apr 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add resume session feature (Claude Code-inspired append-only JSONL)#467

feat: add resume session feature (Claude Code-inspired append-only JSONL)#467
ahmedk20 wants to merge 1 commit into
usestrix:mainfrom
ahmedk20:feat/resume-session

ahmedk20 commented Apr 26, 2026

Uh oh!

greptile-apps Bot commented Apr 26, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 26, 2026

Uh oh!

greptile-apps Bot Apr 26, 2026

Uh oh!

greptile-apps Bot Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ahmedk20 commented Apr 26, 2026

Uh oh!

greptile-apps Bot commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Apr 26, 2026 •

edited

Loading