Skip to content

fix(llm): include system prompt tokens in memory compressor budget#381

Merged
0xallam merged 2 commits into
usestrix:mainfrom
0xhis:fix/memory-compressor-token-budget
May 3, 2026
Merged

fix(llm): include system prompt tokens in memory compressor budget#381
0xallam merged 2 commits into
usestrix:mainfrom
0xhis:fix/memory-compressor-token-budget

Conversation

@0xhis
Copy link
Copy Markdown
Contributor

@0xhis 0xhis commented Mar 21, 2026

Summary

The memory compressor was not accounting for system prompt and agent identity tokens when calculating the conversation budget. This caused premature history truncation on long scans with large system prompts.

Changes

  • Add reserved_tokens parameter to compress_history() that subtracts already-accounted tokens from the budget before applying limits
  • Calculate reserved_tokens from system prompt and agent identity messages in _prepare_messages()

Files Changed

  • strix/llm/llm.py (+7/-2)
  • strix/llm/memory_compressor.py (+8/-1)

Split from #328.

@0xhis 0xhis marked this pull request as ready for review March 21, 2026 08:02
Copilot AI review requested due to automatic review settings March 21, 2026 08:02
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 21, 2026

Greptile Summary

This PR fixes a budget-accounting gap in the memory compressor: the system prompt and agent-identity messages were not included in the token total, so compression could be triggered too late (or not at all) on long scans with large system prompts.

Changes:

  • compress_history() gains a reserved_tokens: int = 0 parameter that is added to the conversation-history token count before the 90 % threshold check. The default of 0 is fully backward-compatible.
  • _prepare_messages() computes reserved_tokens from the frame messages it builds (system prompt + optional agent-identity block) before calling compress_history(), correctly separating the two accounting domains.
  • _get_message_tokens is re-exported from memory_compressor.py and imported in llm.py to avoid duplicating the token-counting logic. The symbol carries a leading _, which conventionally marks it as a private/internal helper — consider renaming it to get_message_tokens or exposing it through the MemoryCompressor class to make the public-export intent clear.

Confidence Score: 5/5

  • Safe to merge; the fix is correct, backward-compatible, and well-scoped.
  • The root cause (missing frame-token accounting) is addressed directly and the approach is sound. The reserved_tokens default of 0 preserves existing behavior for all other callers. The only finding is a non-blocking style suggestion about exporting a private symbol across module boundaries, which does not affect correctness or safety.
  • No files require special attention.

Important Files Changed

Filename Overview
strix/llm/llm.py Calculates reserved_tokens from the system-prompt and agent-identity messages before compression, and forwards them to compress_history(). Logic is correct; minor style concern around importing the private _get_message_tokens symbol.
strix/llm/memory_compressor.py Adds reserved_tokens: int = 0 parameter to compress_history() and folds it into the total_tokens sum before the 90 % threshold check. Backward-compatible default and clear docstring. No issues found.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 13

Comment:
**Importing a private helper across module boundaries**

`_get_message_tokens` is prefixed with `_`, which conventionally signals it is an internal implementation detail of `memory_compressor.py`. Importing it directly into `llm.py` couples the two modules to an internal symbol that could be freely refactored or removed without a public-API change notice.

Consider either:
1. Renaming it to `get_message_tokens` (dropping the leading `_`) to make the public-export intent explicit, or
2. Exposing a small helper on `MemoryCompressor` itself (e.g. `MemoryCompressor.count_tokens(messages)`) so the compressor owns all token-counting logic.

```suggestion
from strix.llm.memory_compressor import MemoryCompressor, get_message_tokens
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix(llm): include sy..."

Comment thread strix/llm/llm.py Outdated
from strix.config import Config
from strix.llm.config import LLMConfig
from strix.llm.memory_compressor import MemoryCompressor
from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Importing a private helper across module boundaries

_get_message_tokens is prefixed with _, which conventionally signals it is an internal implementation detail of memory_compressor.py. Importing it directly into llm.py couples the two modules to an internal symbol that could be freely refactored or removed without a public-API change notice.

Consider either:

  1. Renaming it to get_message_tokens (dropping the leading _) to make the public-export intent explicit, or
  2. Exposing a small helper on MemoryCompressor itself (e.g. MemoryCompressor.count_tokens(messages)) so the compressor owns all token-counting logic.
Suggested change
from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
from strix.llm.memory_compressor import MemoryCompressor, get_message_tokens
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 13

Comment:
**Importing a private helper across module boundaries**

`_get_message_tokens` is prefixed with `_`, which conventionally signals it is an internal implementation detail of `memory_compressor.py`. Importing it directly into `llm.py` couples the two modules to an internal symbol that could be freely refactored or removed without a public-API change notice.

Consider either:
1. Renaming it to `get_message_tokens` (dropping the leading `_`) to make the public-export intent explicit, or
2. Exposing a small helper on `MemoryCompressor` itself (e.g. `MemoryCompressor.count_tokens(messages)`) so the compressor owns all token-counting logic.

```suggestion
from strix.llm.memory_compressor import MemoryCompressor, get_message_tokens
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes token budgeting in the LLM memory compressor so that system prompt and agent identity “framing” tokens are accounted for when deciding whether (and how much) to compress conversation history, preventing unexpected truncation behavior on long runs with large system prompts.

Changes:

  • Added a reserved_tokens parameter to MemoryCompressor.compress_history() to incorporate non-history prompt tokens into the budget calculation.
  • Calculated reserved_tokens in LLM._prepare_messages() from the system prompt + agent identity messages and passed it into the compressor.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
strix/llm/memory_compressor.py Extends history compression budgeting with a reserved_tokens parameter and uses it in token calculations.
strix/llm/llm.py Computes reserved/system framing tokens during message preparation and passes them to the memory compressor.
Comments suppressed due to low confidence (1)

strix/llm/memory_compressor.py:216

  • compress_history() can still return a history that exceeds the token budget when reserved_tokens (or the non-compressible system/recent messages) already consume most/all of the limit. Consider explicitly handling reserved_tokens >= MAX_TOTAL_TOKENS * 0.9 (and/or re-checking after summarization) by dropping more of the history (e.g., reduce recent messages, return only system messages, or return an empty history) so the final prompt stays within limits.
        total_tokens = reserved_tokens + sum(
            _get_message_tokens(msg, model_name) for msg in system_msgs + regular_msgs
        )

        if total_tokens <= MAX_TOTAL_TOKENS * 0.9:
            return messages

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +173 to +178
Args:
messages: Conversation history messages to compress.
reserved_tokens: Tokens already reserved for system prompt and
other framing messages outside the conversation history.
Subtracted from the budget before checking limits.

Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says reserved_tokens is “Subtracted from the budget before checking limits”, but the implementation adds it into total_tokens and compares against the fixed budget. Either update the wording to match the behavior (reserved tokens are included in the total prompt token count) or change the logic to compute an available_budget = MAX_TOTAL_TOKENS * 0.9 - reserved_tokens and compare history tokens against that.

Copilot uses AI. Check for mistakes.
Comment thread strix/llm/llm.py Outdated
from strix.config import Config
from strix.llm.config import LLMConfig
from strix.llm.memory_compressor import MemoryCompressor
from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llm.py imports _get_message_tokens from memory_compressor.py, but the leading underscore indicates this is intended as a private helper. To reduce coupling, consider exposing a public token-counting helper (e.g., get_message_tokens) or adding a MemoryCompressor.get_message_tokens() method, and have LLM call that instead of importing a private symbol.

Suggested change
from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
from strix.llm.memory_compressor import MemoryCompressor

Copilot uses AI. Check for mistakes.
@0xhis 0xhis force-pushed the fix/memory-compressor-token-budget branch from 2502a59 to 22d978e Compare March 21, 2026 08:28
0xhis added 2 commits March 21, 2026 01:49
The memory compressor was not accounting for system prompt and agent
identity tokens when calculating the conversation budget. This caused
premature history truncation on long scans with large system prompts.

Adds a reserved_tokens parameter to compress_history() that subtracts
already-accounted tokens from the budget before applying limits.
@0xhis 0xhis force-pushed the fix/memory-compressor-token-budget branch from 22d978e to 63035db Compare March 21, 2026 08:52
@0xallam 0xallam merged commit 67050d9 into usestrix:main May 3, 2026
0xallam added a commit to b4l4-sec/strix that referenced this pull request May 4, 2026
- state.py + base_agent.py: the wake-event work is one logical unit;
  reverting state forces reverting base_agent (its only change calls
  state.wait_for_wake). The Pydantic asyncio.Event integration is
  fragile and the 500ms polling it replaces hasn't been a measured
  problem.
- memory_compressor.py: lowering the threshold from 100K to 60K is a
  workload-dependent change shipped without measurement; the token
  cache is unbounded; PR usestrix#381 has since changed this file in a
  different direction (reserved_tokens parameter), guaranteeing
  conflicts.
- executor.py: connection pooling never wires close_sandbox_client
  into sandbox teardown (resource leak). The parallel-tool execution
  has correctness risk — the sequential denylist is incomplete (file
  edits, notes, python, terminal can race) and the partition reorders
  calls, breaking LLM-intended ordering.

PR is now effectively empty; should be closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants