context mgmt #1
Conversation
minpeter commented on Jan 6, 2026
- feat: add context management with compaction and middleware support
  - Add context tracker for monitoring token usage
  - Implement auto-compaction when context threshold reached
  - Add middleware for trimming leading newlines in AI responses (see the sketch after this list)
  - Add /help command and model switching capabilities
  - Add includeUsage flag to friendliai client
  - Add debug logging for context usage
- Update default model from LGAI-EXAONE/K-EXAONE-236B-A23B to zai-org/GLM-4.6 and add support for aborting ongoing conversations via ESC key
  - Replace default model in documentation and code
  - Add abort functionality to Agent class with AbortController
  - Modify chat method to return aborted status
  - Update command handler to support abort signals in streaming
  - Add ESC key interrupt support in input handling
  - Implement /context and /compact commands for monitoring and managing context usage
- feat: context management follow-ups
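The newline-trimming middleware itself is not shown in the excerpts below, so here is a minimal, hypothetical sketch of the trimming idea as a standalone stream transform. The PR wires this through model middleware; the function name and the assumption that chunks arrive as plain strings are illustrative, not the PR's actual code.

// Hypothetical sketch: drop leading newlines from the first text chunks of a streamed response.
// Only the trimming logic is illustrated; the real PR implements this as AI-response middleware.
async function* trimLeadingNewlines(
  chunks: AsyncIterable<string>,
): AsyncGenerator<string> {
  let trimming = true;
  for await (const chunk of chunks) {
    if (trimming) {
      const trimmed = chunk.replace(/^\n+/, "");
      if (trimmed.length === 0) {
        continue; // chunk was only newlines; keep trimming on the next one
      }
      trimming = false;
      yield trimmed;
    } else {
      yield chunk;
    }
  }
}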
Summary of Changes
Hello @minpeter, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the agent's ability to manage conversation context, ensuring more stable and efficient long-running interactions. It introduces automatic and manual mechanisms for keeping the conversation within token limits, provides visibility into context usage, and improves user control by allowing the interruption of AI responses. These changes aim to make the agent more robust and user-friendly for extended coding sessions.
Code Review
This pull request introduces a comprehensive context management system for the agent, including token tracking, automatic context compaction, and manual controls via new commands. It also adds the ability to abort ongoing conversations with the ESC key and updates the default model. The implementation is robust, with good error handling and asynchronous control flow. I've identified a couple of areas for improvement: one to fix the logic for fallback context size estimation, and another to refactor a function for better readability. Overall, this is an excellent and significant feature addition.
function shouldContinueAfterTools(messages: ModelMessage[]): boolean {
  let lastToolIndex = -1;
  for (let i = 0; i < messages.length; i += 1) {
    if (messages[i]?.role === "tool") {
      lastToolIndex = i;
    }
  }
  if (lastToolIndex === -1) {
    return false;
  }
  for (let i = lastToolIndex + 1; i < messages.length; i += 1) {
    if (assistantMessageHasText(messages[i])) {
      return false;
    }
  }
  return true;
}
The shouldContinueAfterTools function can be simplified for better readability and maintainability by using modern array methods like findLastIndex and some. The current implementation with for loops is correct but more verbose than necessary.
function shouldContinueAfterTools(messages: ModelMessage[]): boolean {
  const lastToolIndex = messages.findLastIndex((msg) => msg.role === "tool");
  if (lastToolIndex === -1) {
    return false;
  }
  // Check if there is any assistant message with text after the last tool message.
  const subsequentMessages = messages.slice(lastToolIndex + 1);
  return !subsequentMessages.some(assistantMessageHasText);
}

export class ContextTracker {
  private readonly config: ContextConfig;
  private totalInputTokens = 0;
  private totalOutputTokens = 0;
  private stepCount = 0;
  private currentContextTokens: number | null = null;

  constructor(config: Partial<ContextConfig> = {}) {
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  setMaxContextTokens(tokens: number): void {
    this.config.maxContextTokens = tokens;
  }

  setCompactionThreshold(threshold: number): void {
    if (threshold < 0 || threshold > 1) {
      throw new Error("Compaction threshold must be between 0 and 1");
    }
    this.config.compactionThreshold = threshold;
  }

  updateUsage(usage: LanguageModelUsage): void {
    this.totalInputTokens += usage.inputTokens ?? 0;
    this.totalOutputTokens += usage.outputTokens ?? 0;
    this.stepCount++;
  }

  /**
   * Set the exact current context token count.
   */
  setContextTokens(tokens: number): void {
    this.currentContextTokens = Math.max(0, Math.round(tokens));
  }

  /**
   * Set total usage directly (useful after compaction or when loading state)
   */
  setTotalUsage(inputTokens: number, outputTokens: number): void {
    this.totalInputTokens = inputTokens;
    this.totalOutputTokens = outputTokens;
  }

  /**
   * Get estimated current context size
   * Note: This is an approximation based on accumulated usage
   */
  getEstimatedContextTokens(): number {
    // The input tokens from the last request roughly represents
    // the current context size (system prompt + conversation history)
    return this.totalInputTokens > 0
      ? Math.round(this.totalInputTokens / Math.max(this.stepCount, 1))
      : 0;
  }

  getStats(): ContextStats {
    const totalTokens =
      this.currentContextTokens ?? this.getEstimatedContextTokens();
    const usagePercentage = totalTokens / this.config.maxContextTokens;
    const shouldCompact = usagePercentage >= this.config.compactionThreshold;

    return {
      totalTokens,
      inputTokens: this.totalInputTokens,
      outputTokens: this.totalOutputTokens,
      maxContextTokens: this.config.maxContextTokens,
      usagePercentage,
      shouldCompact,
    };
  }

  shouldCompact(): boolean {
    return this.getStats().shouldCompact;
  }

  reset(): void {
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.stepCount = 0;
    this.currentContextTokens = 0;
  }

  /**
   * Called after compaction to adjust token counts
   * @param newInputTokens The token count of the compacted context
   */
  afterCompaction(newInputTokens: number): void {
    this.totalInputTokens = newInputTokens;
    this.totalOutputTokens = 0;
    this.stepCount = 1;
    this.currentContextTokens = Math.max(0, Math.round(newInputTokens));
  }

  getConfig(): ContextConfig {
    return { ...this.config };
  }
}
The current implementation of getEstimatedContextTokens calculates the average input tokens per step, which doesn't accurately reflect the current context size as the conversation grows. The comment for the function correctly states that the input tokens from the last request are a better estimate. This can be fixed by storing the last input token count in the tracker. This change improves the accuracy of the fallback context size estimation, making the tracker more robust when the precise token measurement API fails.
export class ContextTracker {
  private readonly config: ContextConfig;
  private totalInputTokens = 0;
  private totalOutputTokens = 0;
  private stepCount = 0;
  private currentContextTokens: number | null = null;
  private lastInputTokens = 0;

  constructor(config: Partial<ContextConfig> = {}) {
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  setMaxContextTokens(tokens: number): void {
    this.config.maxContextTokens = tokens;
  }

  setCompactionThreshold(threshold: number): void {
    if (threshold < 0 || threshold > 1) {
      throw new Error("Compaction threshold must be between 0 and 1");
    }
    this.config.compactionThreshold = threshold;
  }

  updateUsage(usage: LanguageModelUsage): void {
    this.totalInputTokens += usage.inputTokens ?? 0;
    this.totalOutputTokens += usage.outputTokens ?? 0;
    this.stepCount++;
    this.lastInputTokens = usage.inputTokens ?? 0;
  }

  /**
   * Set the exact current context token count.
   */
  setContextTokens(tokens: number): void {
    this.currentContextTokens = Math.max(0, Math.round(tokens));
  }

  /**
   * Set total usage directly (useful after compaction or when loading state)
   */
  setTotalUsage(inputTokens: number, outputTokens: number): void {
    this.totalInputTokens = inputTokens;
    this.totalOutputTokens = outputTokens;
  }

  /**
   * Get estimated current context size
   * Note: This is an approximation based on accumulated usage
   */
  getEstimatedContextTokens(): number {
    // The input tokens from the last request roughly represents
    // the current context size (system prompt + conversation history)
    return this.lastInputTokens;
  }

  getStats(): ContextStats {
    const totalTokens =
      this.currentContextTokens ?? this.getEstimatedContextTokens();
    const usagePercentage = totalTokens / this.config.maxContextTokens;
    const shouldCompact = usagePercentage >= this.config.compactionThreshold;

    return {
      totalTokens,
      inputTokens: this.totalInputTokens,
      outputTokens: this.totalOutputTokens,
      maxContextTokens: this.config.maxContextTokens,
      usagePercentage,
      shouldCompact,
    };
  }

  shouldCompact(): boolean {
    return this.getStats().shouldCompact;
  }

  reset(): void {
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.stepCount = 0;
    this.currentContextTokens = 0;
    this.lastInputTokens = 0;
  }

  /**
   * Called after compaction to adjust token counts
   * @param newInputTokens The token count of the compacted context
   */
  afterCompaction(newInputTokens: number): void {
    this.totalInputTokens = newInputTokens;
    this.totalOutputTokens = 0;
    this.stepCount = 1;
    this.currentContextTokens = Math.max(0, Math.round(newInputTokens));
    this.lastInputTokens = newInputTokens;
  }

  getConfig(): ContextConfig {
    return { ...this.config };
  }
}
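For reference, a minimal usage sketch of the tracker with this fix applied. The constructor values are illustrative, the usage field names are taken from the updateUsage implementation above, and the compaction step itself is only hinted at:

// Hypothetical usage sketch of the patched ContextTracker; config values are examples.
const tracker = new ContextTracker({ maxContextTokens: 128_000, compactionThreshold: 0.8 });

// After each model step, record the reported usage.
tracker.updateUsage({ inputTokens: 90_000, outputTokens: 1_500, totalTokens: 91_500 });
tracker.updateUsage({ inputTokens: 110_000, outputTokens: 2_000, totalTokens: 112_000 });

// With lastInputTokens tracked, the fallback estimate reflects the latest request
// (110_000 tokens) rather than the average across steps (~100_000 tokens).
const stats = tracker.getStats();
if (stats.shouldCompact) {
  // e.g. summarize the history, then reset the counters:
  // tracker.afterCompaction(compactedTokenCount);
}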
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6476bf824d
const onLine = (line: string) => {
  rl.removeListener("close", onClose);
  rl.pause();
  resolve(line);
Resume stdin before streaming to enable ESC abort
The new ESC abort handler depends on emitKeypressEvents(process.stdin), but readline.pause() also calls process.stdin.pause(), which stops data events (and thus keypress). Because stdin is paused after each line and never resumed before agent.chat, pressing ESC during streaming won't trigger agent.abort(), so the interrupt feature silently fails. Consider resuming stdin (or skipping the pause) before starting the streaming chat loop.
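A minimal sketch of the suggested fix, assuming the Agent exposes the abort() method described in the PR summary. The helper name, the Agent's method signatures, and the raw-mode handling are illustrative assumptions, not the PR's actual code:

import * as readline from "node:readline";

// Hypothetical helper: resume stdin and listen for ESC while the agent streams,
// then detach the listener afterwards.
async function streamWithEscAbort(
  agent: { chat: (prompt: string) => Promise<unknown>; abort: () => void },
  prompt: string,
): Promise<unknown> {
  readline.emitKeypressEvents(process.stdin);

  const onKeypress = (_chunk: string, key: { name?: string } | undefined) => {
    if (key?.name === "escape") {
      agent.abort(); // forwards to the AbortController inside the Agent
    }
  };

  // readline.pause() also paused stdin after the last line, so resume it here;
  // otherwise no "keypress" events fire and ESC is never seen during streaming.
  process.stdin.resume();
  if (process.stdin.isTTY) {
    process.stdin.setRawMode(true); // deliver ESC immediately instead of waiting for Enter
  }
  process.stdin.on("keypress", onKeypress);

  try {
    return await agent.chat(prompt);
  } finally {
    process.stdin.off("keypress", onKeypress);
    if (process.stdin.isTTY) {
      process.stdin.setRawMode(false);
    }
  }
}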