Implemented generic multimodal chat handler. by alcoftTAO · Pull Request #125 · JamePeng/llama-cpp-python

alcoftTAO · 2026-05-04T19:19:44Z

Implemented a generic/global multimodal chat handler.

What does it do?

It automatically uses the model's chat template and replaces all of the model's multimodal tags with the media_marker tag.

This allows a much easier implementation for multimodal models, since the chat template doesn't need to be hard-coded for each model.

How to use it?

It is as simple as passing the clip_model_path parameter to the Llama class when created.

Note

Using the previous implementation (e.g. Qwen35ChatHandler) still works.

I'm also looking forward to implement more model architectures. Please, reply if you want me to implement any.

JamePeng · 2026-05-05T21:30:48Z

You can take a look at how to improve the injection process. #110

JamePeng · 2026-05-13T16:43:04Z

It seems there's no work on how to perform URL injection for multimedia; simply replacing it with a media marker isn't enough.

This code also needs to be removed:

if hasattr(llama, 'input_ids'):
    llama.input_ids.fill(0)

Architecture-based tag guessing should not default unknown models to Qwen-style tags. Prefer detecting media tags from the actual chat template, or better, avoid tag guessing by normalizing OpenAI content parts into placeholders before rendering.

KNOWN_MEDIA_TAGS = [
    "<|image_pad|>",
    "<|audio_pad|>",
    "<|video_pad|>",
    "<|image|>",
    "<|audio|>",
    "<|video|>",
    "[IMG]",
]

and

self._chat_format_parser_tags = [
    tag for tag in KNOWN_MEDIA_TAGS
    if tag in self.chat_format
]

In addition, a check is needed to ensure that the number of replacement markers matches the number of incoming media.

alcoftTAO · 2026-05-16T04:45:56Z

@JamePeng What do you think of this code?

JamePeng · 2026-05-16T12:34:21Z

You can test the multimodal usage of qwen3vl, qwen3.5/3.6, and gemma4.
In particular, check if the omni function of gemma4 is affected.

Signed-off-by: JamePeng <jame_peng@sina.com>

- Add a PowerShell step to the Windows CI workflow to locate and copy `libomp140.x86_64.dll` from the Visual Studio redistributables. - Place the runtime DLL into the `llama_cpp\lib` package directory. This ensures that the dynamically loaded `ggml-cpu-*.dll` variants (which are built with LLVM OpenMP on Windows) have their required dependencies packaged in the wheel. Without this, `ggml_backend_load_all_from_path()` can silently fail to load the CPU backends at runtime on end-user machines. Signed-off-by: JamePeng <jame_peng@sina.com>

JamePeng · 2026-05-27T14:54:50Z

@alcoftTAO Hello, could you resolve some file conflicts? It seems like adding unrelated files...

alcoftTAO · 2026-05-27T22:32:45Z

@alcoftTAO Hello, could you resolve some file conflicts? It seems like adding unrelated files...

I'm working on this.

alcoftTAO · 2026-05-27T23:02:28Z

I think it's fixed. Please let me know if anything is wrong.

alcoftTAO · 2026-05-27T23:55:13Z

You can test the multimodal usage of qwen3vl, qwen3.5/3.6, and gemma4. In particular, check if the omni function of gemma4 is affected.

Both image and audio capabilities work as expected.

JamePeng · 2026-05-28T00:55:37Z

It's suggested that clip_model_path: Optional[str] = None be uniformly changed to mmproj_path: Optional[str] = None, because it's no longer just mtmd that's used for simple clipping.

It's also worth considering adding the chat_template_override: Optional[str] = None feature. I've tested it with embedding or rerank models that don't have a chat template, allowing users to easily write their own based on their token ID and pass in a temporary chat template to achieve this functionality.

Alternatively, it's unnecessary to pass in the entire metadata. You can pass in the pre-processed template_choices at the end of llama.init to speed up template retrieval, or use the chat template wrapper I added to LlamaModel last night, with name=None to retrieve the default template. Both methods can speed up and reduce unnecessary memory transfers and throughput, thus improving performance.

alcoftTAO added 4 commits May 4, 2026 20:58

Implemented generic multimodal chat handler.

1f5226b

Used text.replace()

a8d19d3

Fixed some bugs.

3e031d5

Implemented 'chat_handler_kwargs'.

389d0d9

alcoftTAO marked this pull request as draft May 14, 2026 15:19

fix

9187910

JamePeng force-pushed the main branch 8 times, most recently from e1caafb to 628373c Compare May 16, 2026 12:09

JamePeng and others added 3 commits May 19, 2026 19:25

Update Submodule vendor/llama.cpp 39cf5d6..6db1304

b48d57a

Signed-off-by: JamePeng <jame_peng@sina.com>

Merge branch 'JamePeng:main' into mtmd

4d9af07

JamePeng force-pushed the main branch 2 times, most recently from 78ead7c to 615e45a Compare May 23, 2026 05:58

Resolve file conflicts.

677db7b

Fixed merge conflicts.

3208891

Added support when using the keyword 'audio' instead of 'audio_url'.

4794c8c

alcoftTAO marked this pull request as ready for review May 27, 2026 23:55

JamePeng force-pushed the main branch from 7410c5a to 103639c Compare May 29, 2026 13:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented generic multimodal chat handler.#125

Implemented generic multimodal chat handler.#125
alcoftTAO wants to merge 11 commits into
JamePeng:mainfrom
TAO71-AI:mtmd

alcoftTAO commented May 4, 2026 •

edited

Loading

Uh oh!

JamePeng commented May 5, 2026

Uh oh!

JamePeng commented May 13, 2026 •

edited

Loading

Uh oh!

alcoftTAO commented May 16, 2026

Uh oh!

JamePeng commented May 16, 2026

Uh oh!

JamePeng commented May 27, 2026

Uh oh!

alcoftTAO commented May 27, 2026

Uh oh!

alcoftTAO commented May 27, 2026

Uh oh!

alcoftTAO commented May 27, 2026

Uh oh!

JamePeng commented May 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alcoftTAO commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does it do?

How to use it?

Uh oh!

JamePeng commented May 5, 2026

Uh oh!

JamePeng commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alcoftTAO commented May 16, 2026

Uh oh!

JamePeng commented May 16, 2026

Uh oh!

JamePeng commented May 27, 2026

Uh oh!

alcoftTAO commented May 27, 2026

Uh oh!

alcoftTAO commented May 27, 2026

Uh oh!

alcoftTAO commented May 27, 2026

Uh oh!

JamePeng commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alcoftTAO commented May 4, 2026 •

edited

Loading

JamePeng commented May 13, 2026 •

edited

Loading

JamePeng commented May 28, 2026 •

edited

Loading