
Conversation

@minpeter (Owner) commented Jan 6, 2026

  • feat: add context management with compaction and middleware support
    - Add context tracker for monitoring token usage
    - Implement auto-compaction when context threshold reached
    - Add middleware for trimming leading newlines in AI responses
    - Add /help command and model switching capabilities
    - Add includeUsage flag to friendliai client
    - Add debug logging for context usage
  • Update default model from LGAI-EXAONE/K-EXAONE-236B-A23B to zai-org/GLM-4.6 and add support for aborting ongoing conversations via ESC key
    - Replace default model in documentation and code
    - Add abort functionality to Agent class with AbortController (see the sketch below)
    - Modify chat method to return aborted status
    - Update command handler to support abort signals in streaming
    - Add ESC key interrupt support in input handling
    - Implement /context and /compact commands for monitoring and managing context usage
  • feat: context management follow-ups
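For readers skimming the commit list, here is a minimal sketch of how the abort wiring could look. The Agent shape, the echo response, and the delay stand-in are illustrative assumptions, not the code in this PR; the real chat method would pass the signal through to the streaming model call.

import { emitKeypressEvents } from "node:readline";
import { setTimeout as delay } from "node:timers/promises";

// Hypothetical Agent shape: abort() cancels the in-flight chat via AbortController,
// and chat() resolves with an `aborted` flag instead of throwing on cancel.
class Agent {
  private abortController: AbortController | null = null;

  abort(): void {
    this.abortController?.abort();
  }

  async chat(prompt: string): Promise<{ text: string; aborted: boolean }> {
    this.abortController = new AbortController();
    const { signal } = this.abortController;
    try {
      // Stand-in for the streaming model call; the real code would forward `signal`.
      await delay(5_000, undefined, { signal });
      return { text: `echo: ${prompt}`, aborted: false };
    } catch (err) {
      if (signal.aborted) return { text: "", aborted: true };
      throw err;
    } finally {
      this.abortController = null;
    }
  }
}

const agent = new Agent();

// ESC interrupts the in-flight response; keypress events need stdin in raw mode.
emitKeypressEvents(process.stdin);
if (process.stdin.isTTY) process.stdin.setRawMode(true);
process.stdin.on("keypress", (_str: string, key: { name?: string } | undefined) => {
  if (key?.name === "escape") agent.abort();
});

const result = await agent.chat("hello");
console.log(result.aborted ? "(aborted)" : result.text);
process.exit(0);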
@gemini-code-assist

Summary of Changes

Hello @minpeter, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's ability to manage conversation context, ensuring more stable and efficient long-running interactions. It introduces automatic and manual mechanisms for keeping the conversation within token limits, provides visibility into context usage, and improves user control by allowing the interruption of AI responses. These changes aim to make the agent more robust and user-friendly for extended coding sessions.

Highlights

  • Context Management System: Introduced a comprehensive context management system with ContextTracker and ContextCompactor utilities. This system monitors token usage, automatically compacts conversation history when a defined threshold is met, and provides debug logging for context usage.
  • Auto-Compaction: The agent now automatically summarizes older parts of the conversation to stay within the model's context window, preventing context-overflow errors and improving efficiency for long interactions (a simplified loop sketch follows this list).
  • New Commands for Context Control: Added /context to display current token usage and compaction status, and /compact to manually trigger conversation summarization.
  • Conversation Abort Functionality: Users can now press the ESC key to interrupt an ongoing AI response, providing more control over the interaction.
  • Default Model Update: The default AI model has been updated from LGAI-EXAONE/K-EXAONE-236B-A23B to zai-org/GLM-4.6.
  • FriendliAI Client Enhancements: The FriendliAI client now includes an includeUsage flag, enabling more accurate token usage tracking directly from the API.
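To make the auto-compaction flow above concrete, here is a simplified, self-contained sketch of the loop. MiniTracker, callModel, and summarizeHistory are invented stand-ins for this PR's ContextTracker, model call, and summarizer, and the threshold values are examples only.

type Msg = { role: "system" | "user" | "assistant"; content: string };

// Trimmed stand-in for the tracker: the last request's input tokens approximate
// the current context size; compaction triggers past a fractional threshold.
class MiniTracker {
  inputTokens = 0;
  constructor(
    private maxContextTokens = 128_000,
    private compactionThreshold = 0.8,
  ) {}
  update(inputTokens: number): void {
    this.inputTokens = inputTokens;
  }
  shouldCompact(): boolean {
    return this.inputTokens / this.maxContextTokens >= this.compactionThreshold;
  }
}

// Placeholder "model": reports a token count roughly proportional to prompt length.
async function callModel(messages: Msg[]): Promise<{ text: string; inputTokens: number }> {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return { text: "ok", inputTokens: Math.ceil(chars / 4) };
}

// Placeholder compaction: keep the system prompt (assumed to be messages[0])
// and collapse everything else into one summary message.
async function summarizeHistory(messages: Msg[]): Promise<Msg[]> {
  const system = messages[0] ?? { role: "system", content: "" };
  return [system, { role: "assistant", content: `Summary of ${messages.length - 1} earlier messages.` }];
}

async function chatTurn(tracker: MiniTracker, history: Msg[], user: string): Promise<Msg[]> {
  const userMsg: Msg = { role: "user", content: user };
  let messages = [...history, userMsg];
  const { text, inputTokens } = await callModel(messages);
  tracker.update(inputTokens);
  messages.push({ role: "assistant", content: text });
  if (tracker.shouldCompact()) {
    messages = await summarizeHistory(messages); // auto-compaction kicks in past the threshold
  }
  return messages;
}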
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • /gemini review (Code Review): Performs a code review for the current pull request in its current state.
  • /gemini summary (Pull Request Summary): Provides a summary of the current pull request in its current state.
  • @gemini-code-assist (Comment): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • /gemini help (Help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive context management system for the agent, including token tracking, automatic context compaction, and manual controls via new commands. It also adds the ability to abort ongoing conversations with the ESC key and updates the default model. The implementation is robust, with good error handling and asynchronous control flow. I've identified a couple of areas for improvement: one to fix the logic for fallback context size estimation, and another to refactor a function for better readability. Overall, this is an excellent and significant feature addition.

Comment on lines +181 to +197
function shouldContinueAfterTools(messages: ModelMessage[]): boolean {
  let lastToolIndex = -1;
  for (let i = 0; i < messages.length; i += 1) {
    if (messages[i]?.role === "tool") {
      lastToolIndex = i;
    }
  }
  if (lastToolIndex === -1) {
    return false;
  }
  for (let i = lastToolIndex + 1; i < messages.length; i += 1) {
    if (assistantMessageHasText(messages[i])) {
      return false;
    }
  }
  return true;
}


medium

The shouldContinueAfterTools function can be simplified for better readability and maintainability by using modern array methods like findLastIndex and some. The current implementation with for loops is correct but more verbose than necessary.

function shouldContinueAfterTools(messages: ModelMessage[]): boolean {
  const lastToolIndex = messages.findLastIndex((msg) => msg.role === "tool");

  if (lastToolIndex === -1) {
    return false;
  }

  // Check if there is any assistant message with text after the last tool message.
  const subsequentMessages = messages.slice(lastToolIndex + 1);
  return !subsequentMessages.some(assistantMessageHasText);
}
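One caveat with this suggestion: Array.prototype.findLastIndex is an ES2023 addition, so it assumes a Node.js 18+ runtime and a TypeScript lib/target that includes it (e.g. ES2023); on older targets the loop-based original still works.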

Comment on lines +22 to +118
export class ContextTracker {
  private readonly config: ContextConfig;
  private totalInputTokens = 0;
  private totalOutputTokens = 0;
  private stepCount = 0;
  private currentContextTokens: number | null = null;

  constructor(config: Partial<ContextConfig> = {}) {
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  setMaxContextTokens(tokens: number): void {
    this.config.maxContextTokens = tokens;
  }

  setCompactionThreshold(threshold: number): void {
    if (threshold < 0 || threshold > 1) {
      throw new Error("Compaction threshold must be between 0 and 1");
    }
    this.config.compactionThreshold = threshold;
  }

  updateUsage(usage: LanguageModelUsage): void {
    this.totalInputTokens += usage.inputTokens ?? 0;
    this.totalOutputTokens += usage.outputTokens ?? 0;
    this.stepCount++;
  }

  /**
   * Set the exact current context token count.
   */
  setContextTokens(tokens: number): void {
    this.currentContextTokens = Math.max(0, Math.round(tokens));
  }

  /**
   * Set total usage directly (useful after compaction or when loading state)
   */
  setTotalUsage(inputTokens: number, outputTokens: number): void {
    this.totalInputTokens = inputTokens;
    this.totalOutputTokens = outputTokens;
  }

  /**
   * Get estimated current context size
   * Note: This is an approximation based on accumulated usage
   */
  getEstimatedContextTokens(): number {
    // The input tokens from the last request roughly represents
    // the current context size (system prompt + conversation history)
    return this.totalInputTokens > 0
      ? Math.round(this.totalInputTokens / Math.max(this.stepCount, 1))
      : 0;
  }

  getStats(): ContextStats {
    const totalTokens =
      this.currentContextTokens ?? this.getEstimatedContextTokens();
    const usagePercentage = totalTokens / this.config.maxContextTokens;
    const shouldCompact = usagePercentage >= this.config.compactionThreshold;

    return {
      totalTokens,
      inputTokens: this.totalInputTokens,
      outputTokens: this.totalOutputTokens,
      maxContextTokens: this.config.maxContextTokens,
      usagePercentage,
      shouldCompact,
    };
  }

  shouldCompact(): boolean {
    return this.getStats().shouldCompact;
  }

  reset(): void {
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.stepCount = 0;
    this.currentContextTokens = 0;
  }

  /**
   * Called after compaction to adjust token counts
   * @param newInputTokens The token count of the compacted context
   */
  afterCompaction(newInputTokens: number): void {
    this.totalInputTokens = newInputTokens;
    this.totalOutputTokens = 0;
    this.stepCount = 1;
    this.currentContextTokens = Math.max(0, Math.round(newInputTokens));
  }

  getConfig(): ContextConfig {
    return { ...this.config };
  }
}


medium

The current implementation of getEstimatedContextTokens calculates the average input tokens per step, which doesn't accurately reflect the current context size as the conversation grows. The comment for the function correctly states that the input tokens from the last request are a better estimate. This can be fixed by storing the last input token count in the tracker. This change improves the accuracy of the fallback context size estimation, making the tracker more robust when the precise token measurement API fails.

export class ContextTracker {
  private readonly config: ContextConfig;
  private totalInputTokens = 0;
  private totalOutputTokens = 0;
  private stepCount = 0;
  private currentContextTokens: number | null = null;
  private lastInputTokens = 0;

  constructor(config: Partial<ContextConfig> = {}) {
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  setMaxContextTokens(tokens: number): void {
    this.config.maxContextTokens = tokens;
  }

  setCompactionThreshold(threshold: number): void {
    if (threshold < 0 || threshold > 1) {
      throw new Error("Compaction threshold must be between 0 and 1");
    }
    this.config.compactionThreshold = threshold;
  }

  updateUsage(usage: LanguageModelUsage): void {
    this.totalInputTokens += usage.inputTokens ?? 0;
    this.totalOutputTokens += usage.outputTokens ?? 0;
    this.stepCount++;
    this.lastInputTokens = usage.inputTokens ?? 0;
  }

  /**
   * Set the exact current context token count.
   */
  setContextTokens(tokens: number): void {
    this.currentContextTokens = Math.max(0, Math.round(tokens));
  }

  /**
   * Set total usage directly (useful after compaction or when loading state)
   */
  setTotalUsage(inputTokens: number, outputTokens: number): void {
    this.totalInputTokens = inputTokens;
    this.totalOutputTokens = outputTokens;
  }

  /**
   * Get estimated current context size
   * Note: This is an approximation based on accumulated usage
   */
  getEstimatedContextTokens(): number {
    // The input tokens from the last request roughly represents
    // the current context size (system prompt + conversation history)
    return this.lastInputTokens;
  }

  getStats(): ContextStats {
    const totalTokens =
      this.currentContextTokens ?? this.getEstimatedContextTokens();
    const usagePercentage = totalTokens / this.config.maxContextTokens;
    const shouldCompact = usagePercentage >= this.config.compactionThreshold;

    return {
      totalTokens,
      inputTokens: this.totalInputTokens,
      outputTokens: this.totalOutputTokens,
      maxContextTokens: this.config.maxContextTokens,
      usagePercentage,
      shouldCompact,
    };
  }

  shouldCompact(): boolean {
    return this.getStats().shouldCompact;
  }

  reset(): void {
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.stepCount = 0;
    this.currentContextTokens = 0;
    this.lastInputTokens = 0;
  }

  /**
   * Called after compaction to adjust token counts
   * @param newInputTokens The token count of the compacted context
   */
  afterCompaction(newInputTokens: number): void {
    this.totalInputTokens = newInputTokens;
    this.totalOutputTokens = 0;
    this.stepCount = 1;
    this.currentContextTokens = Math.max(0, Math.round(newInputTokens));
    this.lastInputTokens = newInputTokens;
  }

  getConfig(): ContextConfig {
    return { ...this.config };
  }
}
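For reference, a minimal usage sketch of the tracker, assuming the class above is in scope; the config values and token counts are made up for illustration, and the usage object follows AI SDK v5's LanguageModelUsage shape (inputTokens/outputTokens/totalTokens).

// Hypothetical config values, not the repository's DEFAULT_CONFIG.
const tracker = new ContextTracker({
  maxContextTokens: 128_000,
  compactionThreshold: 0.8,
});

// After each model step, feed the reported usage in.
tracker.updateUsage({ inputTokens: 9_500, outputTokens: 350, totalTokens: 9_850 });

// Prefer the exact context size when the provider reports it.
tracker.setContextTokens(9_850);

const stats = tracker.getStats();
console.log(`${Math.round(stats.usagePercentage * 100)}% of the context window used`);

if (tracker.shouldCompact()) {
  // ...summarize the history, then record the compacted size:
  tracker.afterCompaction(2_000);
}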

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6476bf824d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +53 to 56
const onLine = (line: string) => {
  rl.removeListener("close", onClose);
  rl.pause();
  resolve(line);


P2: Resume stdin before streaming to enable ESC abort

The new ESC abort handler depends on emitKeypressEvents(process.stdin), but readline.pause() also calls process.stdin.pause(), which stops data events (and thus keypress). Because stdin is paused after each line and never resumed before agent.chat, pressing ESC during streaming won't trigger agent.abort(), so the interrupt feature silently fails. Consider resuming stdin (or skipping the pause) before starting the streaming chat loop.
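One possible shape for that fix, sketched under the assumption that the input loop owns stdin; runTurn and the agent parameter are illustrative names, not the PR's code.

import { emitKeypressEvents } from "node:readline";

emitKeypressEvents(process.stdin);

async function runTurn(
  agent: { chat(prompt: string): Promise<unknown>; abort(): void },
  prompt: string,
): Promise<void> {
  // rl.pause() also paused process.stdin after the previous line was read;
  // resume it so "keypress" events (and therefore ESC) flow during streaming.
  if (process.stdin.isPaused()) process.stdin.resume();
  if (process.stdin.isTTY) process.stdin.setRawMode(true);

  const onKeypress = (_str: string, key: { name?: string } | undefined) => {
    if (key?.name === "escape") agent.abort();
  };
  process.stdin.on("keypress", onKeypress);
  try {
    await agent.chat(prompt);
  } finally {
    process.stdin.off("keypress", onKeypress);
  }
}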

Useful? React with 👍 / 👎.
