
Fix multi-sequence embeddings #2058

Open
iamlemec wants to merge 1 commit into abetlen:main from iamlemec:fix-batch-embed

Conversation

@iamlemec
Contributor

Fixes multi-sequence (batch) embeddings by handling n_seq_max and kv_unified flags. See discussion in #2051.
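
For context, a minimal sketch of what setting these two flags looks like on the low-level API, assuming llama.cpp's llama_context_params fields n_seq_max, kv_unified, and embeddings (the exact wiring in this PR's diff may differ):

```python
import llama_cpp

# Sketch only: configure a context so one llama_decode call can carry
# several embedding sequences. Values below are illustrative assumptions.
ctx_params = llama_cpp.llama_context_default_params()
ctx_params.n_seq_max = 8        # allow up to 8 distinct sequence ids per batch
ctx_params.kv_unified = True    # single KV buffer shared across sequences
ctx_params.embeddings = True    # ask llama_decode to produce embeddings
```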

@LimePencil

@abetlen any updates yet?

@freckletonj

Confirming this is still an issue.

@mlisovyi

mlisovyi commented May 5, 2026

Shouldn't n_seq_max also be used in Llama.embed()? One should add or p_batch == n_seq_max to the batch-evaluation condition in the loop here. Otherwise one risks collecting a batch that consists of more sequences than the configured maximum (if individual inputs are short or the configured n_seq_max is small), which also leads to the same llama_decode returned -1 error. See the sketch below.
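
A minimal, self-contained sketch of the accumulation loop with the proposed condition (the loop shape, the toy inputs, and the names t_batch / p_batch are assumptions based on this comment, not a copy of the actual Llama.embed() source):

```python
# Hypothetical sketch of the batch-accumulation loop in Llama.embed().
# Names (t_batch, p_batch, n_batch, n_seq_max) follow the comment above;
# the real loop in llama-cpp-python may differ.

n_batch = 32     # token capacity of one decode call (assumed value)
n_seq_max = 4    # configured max sequences per decode call (assumed value)
inputs = [[1, 2], [3], [4, 5, 6], [7], [8, 9], [10]]  # toy tokenized texts

batches: list[list[list[int]]] = []  # sequences grouped per decode call
current: list[list[int]] = []
t_batch = 0  # tokens accumulated in the in-progress batch
for tokens in inputs:
    p_batch = len(current)  # sequences accumulated in the in-progress batch
    # Flush when the next input would overflow the token budget, OR when
    # p_batch hit n_seq_max -- the extra condition proposed above. Without
    # it, many short inputs can pile more sequences into one batch than the
    # context allows, and llama_decode returns -1.
    if current and (t_batch + len(tokens) > n_batch or p_batch == n_seq_max):
        batches.append(current)
        current, t_batch = [], 0
    current.append(tokens)
    t_batch += len(tokens)
if current:
    batches.append(current)

for i, batch in enumerate(batches):
    print(f"decode call {i}: {len(batch)} sequences, {sum(map(len, batch))} tokens")
```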

@mlisovyi

mlisovyi commented May 5, 2026

Also, would it make sense to expose these parameters in ModelSettings with some meaningful defaults, so they can be set when running the server?
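
Roughly, that could look like the following (a sketch only; the field names, defaults, and their placement in ModelSettings are assumptions, not an existing patch):

```python
from pydantic import Field
from pydantic_settings import BaseSettings

# Hypothetical sketch: exposing the two flags from this PR as server
# settings. Defaults are placeholders, not recommendations.
class ModelSettings(BaseSettings):
    model: str = Field(description="Path to the model file.")
    # ... existing fields elided ...
    n_seq_max: int = Field(
        default=1,
        ge=1,
        description="Max number of sequences decoded in one batch.",
    )
    kv_unified: bool = Field(
        default=False,
        description="Use a unified KV cache shared across sequences.",
    )
```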
