server: preserve context checkpoint coverage #22826

Open
jacekpoplawski wants to merge 1 commit into ggml-org:master from jacekpoplawski:checkpoint-coverage
@jacekpoplawski
Contributor

Instead of always removing the oldest context checkpoint when the checkpoint limit is reached, remove the checkpoint that appears most redundant based on the distance between its neighbors.

Overview

This is my attempt to fix the "forcing full prompt re-processing due to lack of cache data" fallback, which discards the cached prompt and reprocesses it from the start.

This changes the checkpoint removal policy: when the limit is reached, it removes an interior checkpoint whose neighboring checkpoints are closest together.
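The selection rule described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `checkpoint` struct and the `pick_most_redundant` name are hypothetical, the checkpoints are assumed to be sorted by ascending position, and ties are assumed to resolve to the earliest candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a context checkpoint: only the token position matters here.
struct checkpoint {
    int pos;
};

// Returns the index of the interior checkpoint whose two neighbors are closest together.
// Assumes `cps` is sorted by ascending position and holds at least 3 entries,
// so the first and last checkpoints are never candidates for removal.
static std::size_t pick_most_redundant(const std::vector<checkpoint> & cps) {
    assert(cps.size() >= 3);
    std::size_t best = 1;
    int best_gap = cps[2].pos - cps[0].pos;
    for (std::size_t i = 2; i + 1 < cps.size(); ++i) {
        // Distance that would remain uncovered if checkpoint i were removed.
        const int gap = cps[i + 1].pos - cps[i - 1].pos;
        if (gap < best_gap) {
            best_gap = gap;
            best     = i;
        }
    }
    return best;
}
```

Removing the checkpoint with the smallest neighbor-to-neighbor distance leaves the smallest new gap in coverage, which is why a checkpoint inside a dense cluster is picked before an isolated older one.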

Additional information

I use the following arguments: --ctx-checkpoints 24 --checkpoint-every-n-tokens 8192 --cache-ram 65536

After just a few prompts in a pi coding agent, I see:

slot launch_slot_: id  0 | task 6130 | processing task, is_child = 0
slot update_slots: id  0 | task 6130 | new prompt, n_ctx_slot = 200192, n_keep = 4096, task.n_tokens = 31544
slot update_slots: id  0 | task 6130 | n_past = 3579, slot.prompt.tokens.size() = 33081, seq_id = 0, pos_min = 33080, n_swa = 0
slot update_slots: id  0 | task 6130 | Checking checkpoint with [32656, 32656] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [32531, 32531] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [32436, 32436] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [32361, 32361] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [31837, 31837] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [31325, 31325] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [30750, 30750] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [30660, 30660] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [30473, 30473] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [30371, 30371] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [30008, 30008] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [29496, 29496] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [29027, 29027] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [28942, 28942] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [28830, 28830] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [28278, 28278] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [27757, 27757] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [27188, 27188] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [26676, 26676] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [23367, 23367] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [23187, 23187] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [21341, 21341] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [20918, 20918] against 3579...
slot update_slots: id  0 | task 6130 | Checking checkpoint with [20479, 20479] against 3579...
slot update_slots: id  0 | task 6130 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)

The server needed a checkpoint around n_past = 3579, but all available checkpoints were much later (from 20479 to 32656), which caused full prompt re-processing.
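The lookup failure in the log can be sketched like this. It is a simplified model under stated assumptions, not the server's actual code: `checkpoint_range` and `find_usable_checkpoint` are hypothetical names, and the condition "a checkpoint is usable when its `pos_max` is at or before `n_past`" is an assumption; the real comparison in llama.cpp may differ (for example with SWA thresholds).

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the [pos_min, pos_max] pair printed in the log.
struct checkpoint_range {
    int pos_min;
    int pos_max;
};

// Scans from newest to oldest, as the log output suggests, and returns the index
// of the first checkpoint that lies at or before n_past, or -1 when none qualifies.
// ASSUMPTION: usability is modeled as pos_max <= n_past.
static int find_usable_checkpoint(const std::vector<checkpoint_range> & cps, int n_past) {
    for (int i = static_cast<int>(cps.size()) - 1; i >= 0; --i) {
        if (cps[i].pos_max <= n_past) {
            return i;
        }
    }
    return -1; // the caller would have to fall back to full prompt re-processing
}
```

With every stored checkpoint at position 20479 or later and `n_past = 3579`, no entry qualifies in this model, which matches the fallback seen in the log.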

The root cause seems to be that checkpoints are not only created at the --checkpoint-every-n-tokens interval. Additional checkpoints can be created near prompt/request boundaries, and with the previous FIFO removal policy these dense recent checkpoints can erase older checkpoints.
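A small simulation makes the difference between the two policies concrete. This is a sketch under assumptions, not the PR's implementation: `simulate`, `most_redundant`, and `eviction_policy` are hypothetical names, and eviction is modeled as "append the new checkpoint, then drop one if the limit is exceeded", which may not match the server's exact ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class eviction_policy { fifo, coverage };

// Index of the interior entry whose neighbors are closest together.
// Assumes `pos` is sorted ascending and holds at least 3 entries.
static std::size_t most_redundant(const std::vector<int> & pos) {
    assert(pos.size() >= 3);
    std::size_t best = 1;
    for (std::size_t i = 2; i + 1 < pos.size(); ++i) {
        if (pos[i + 1] - pos[i - 1] < pos[best + 1] - pos[best - 1]) {
            best = i;
        }
    }
    return best;
}

// Feeds checkpoint positions in creation order (ascending) and returns the ones
// that survive. `limit` plays the role of --ctx-checkpoints and must be >= 2.
static std::vector<int> simulate(const std::vector<int> & created,
                                 std::size_t              limit,
                                 eviction_policy          policy) {
    assert(limit >= 2);
    std::vector<int> kept;
    for (int p : created) {
        kept.push_back(p);
        if (kept.size() > limit) {
            // FIFO always drops the oldest entry; coverage drops the most redundant interior one.
            const std::size_t victim =
                policy == eviction_policy::fifo ? 0 : most_redundant(kept);
            kept.erase(kept.begin() + static_cast<std::ptrdiff_t>(victim));
        }
    }
    return kept;
}
```

As an example, take three invented early positions (2048, 8192, 16384) followed by five dense positions from the log above (20479, 20918, 21341, 23187, 23367) and a limit of 4. In this model FIFO ends with 20918, 21341, 23187, 23367, so nothing before 20918 survives, while the coverage policy ends with 2048, 8192, 16384, 23367 and still covers the early part of the prompt.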

I first tried disabling the additional checkpoint creation, but that did not work well.

I tested this change with --ctx-checkpoints 8 to trigger checkpoint removal sooner, and I could no longer reproduce the "forcing full prompt re-processing due to lack of cache data" message.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - initial research and final code polish

Instead of always removing the oldest context checkpoint,
remove the one that appears most redundant based on the distance between its neighbors.
@jacekpoplawski jacekpoplawski requested a review from a team as a code owner May 8, 2026 00:53
@ggml-gh-bot

ggml-gh-bot Bot commented May 8, 2026

Hi @jacekpoplawski, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 2 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@ggerganov
Member

The idea is OK, but it is still a "poor-man" solution. The most optimal way to do the checkpoints is to leverage the changes in #21885 and take into account the structure of the conversation.

@jacekpoplawski
Contributor Author

> The idea is OK, but it is still a "poor-man" solution. The most optimal way to do the checkpoints is to leverage the changes in #21885 and take into account the structure of the conversation.

If I understand correctly, #21885 would tell us where the important positions are, and checkpoint removal should prefer keeping checkpoints around those positions. Or do you mean that this information should be used when creating checkpoints instead?

@ggerganov
Member

> Or do you mean that this information should be used when creating checkpoints instead?

Yes, the information should be used for creating the checkpoints right before user inputs.

@jacekpoplawski
Contributor Author

> > Or do you mean that this information should be used when creating checkpoints instead?
>
> Yes, the information should be used for creating the checkpoints right before user inputs.

#22929
