
Fix infinite loop in _chunk_text when overlap >= chunk_size #2124

Open
Jay-ju wants to merge 1 commit into volcengine:main from Jay-ju:fix/chunk-text-infinite-loop
@Jay-ju (Contributor) commented May 19, 2026

The _chunk_text method in SessionCompressor enters an infinite loop when overlap >= chunk_size. The update start = end - overlap then yields a start that is no greater than the previous start, so the loop never advances.

Reproduction:
_chunk_text('A' * 5000, chunk_size=2000, overlap=2000)
-> hangs forever (CPU spins at 100%)
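
To make the stall concrete, here is a minimal sketch that traces the successive start values of a naive overlap loop. It is not the project's code: trace_starts, the max_steps cap, and the early exit once end reaches the end of the text are assumptions made for illustration, based only on the update start = end - overlap described above.

```python
def trace_starts(text_len, chunk_size, overlap, max_steps=5):
    """Return the successive `start` values of a naive overlap loop.

    Hypothetical helper: the loop shape is assumed from the PR description,
    and max_steps exists only so the trace terminates in the stalled case.
    """
    starts = []
    start = 0
    while start < text_len and len(starts) < max_steps:
        starts.append(start)
        end = min(start + chunk_size, text_len)
        if end >= text_len:
            break  # assumed exit once the final chunk has been produced
        start = end - overlap  # the update that stalls when overlap >= chunk_size
    return starts


# Sane configuration: start advances by chunk_size - overlap each iteration.
print(trace_starts(5000, chunk_size=2000, overlap=200))   # [0, 1800, 3600]

# Reported configuration: start is reset to 0 every time, so only the cap stops it.
print(trace_starts(5000, chunk_size=2000, overlap=2000))  # [0, 0, 0, 0, 0]
```

With chunk_size=2000 and overlap=2000, the first chunk ends at 2000 and the next start is 2000 - 2000 = 0, which is where the loop began.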

This can be triggered by misconfiguration (e.g. setting memory_chunk_overlap >= memory_chunk_chars in semantic config).

Fix: After computing next_start = end - overlap, check whether it advances past the current start. If it does not, fall back to next_start = end (no overlap for this iteration), so the loop always makes forward progress.
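
The guard described above can be sketched as a standalone function. This is an illustration under assumptions, not the diff in this PR: chunk_text is a hypothetical free function rather than the SessionCompressor method, the default values are made up, and the chunk_size check is an extra safeguard that the description does not mention.

```python
def chunk_text(text, chunk_size=2000, overlap=200):
    """Split text into overlapping chunks, always making forward progress.

    Hypothetical stand-in for _chunk_text; defaults are illustrative only.
    """
    if chunk_size <= 0:
        # Assumption, not part of the described fix: a non-positive
        # chunk_size could never advance, so reject it up front.
        raise ValueError("chunk_size must be positive")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end >= len(text):
            break  # final chunk emitted

        next_start = end - overlap
        if next_start <= start:
            # overlap >= chunk_size: drop the overlap for this iteration
            # so that start strictly increases.
            next_start = end
        start = next_start
    return chunks


# The reproduction case now terminates with non-overlapping chunks.
print([len(c) for c in chunk_text("A" * 5000, chunk_size=2000, overlap=2000)])
# [2000, 2000, 1000]
```

In this sketch a valid configuration is unaffected: with overlap=200 the chunks still start at 0, 1800 and 3600. Only the misconfigured case degrades to chunking without overlap.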

@github-actions
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🏅 Score: 90
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
PR Code Suggestions ✨

No code suggestions found for the PR.

Projects

Status: Backlog
