SignalRT commented Jan 19, 2026

Summary:
This PR delivers a full multimodal chat pipeline in LLama.Web: PDF and Word document ingestion with text extraction, image and audio uploads, native in‑browser audio recording (preview/attach/discard), and streaming response rendering with Markdown support.

Key Features:

  • Streaming chat responses rendered incrementally.
  • Markdown rendering in the UI (including code blocks, lists, etc.).
  • Multimodal inference pipeline with MTMD support wired into session execution.
  • PDF ingestion with text extraction and truncation safeguards.
  • Word (DOCX) ingestion with text extraction from document XML.
  • Image uploads supported end‑to‑end (validation, storage, rendering in chat).
  • Audio uploads supported end‑to‑end (validation, storage, playback in chat).
  • In‑browser audio recording (MediaRecorder) with preview + attach/discard workflow.
  • Capability‑aware UI (shows whether text/vision/audio are supported per model).
  • Automatic model download with progress display.
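The PR itself carries the implementation; as a rough illustration of the DOCX ingestion and truncation-safeguard bullets above, here is a minimal TypeScript sketch. It assumes the `word/document.xml` entry has already been unzipped to a string; the function name `extractDocxText` and the character cap are hypothetical, not taken from the PR.

```typescript
// Hypothetical sketch: pull plain text out of a DOCX document.xml string.
// WordprocessingML stores text in <w:t> runs grouped into <w:p> paragraphs.
function extractDocxText(xml: string, maxChars = 10_000): string {
  // Each </w:p> closes a paragraph; join its <w:t> runs into one line.
  const lines = xml
    .split(/<\/w:p>/)
    .map(p =>
      [...p.matchAll(/<w:t[^>]*>([^<]*)<\/w:t>/g)]
        .map(m => m[1])
        .join("")
    )
    .filter(line => line.length > 0);

  const text = lines.join("\n");

  // Truncation safeguard: cap the extracted text so a huge document
  // cannot overflow the model's context window (maxChars is illustrative).
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```

A real implementation also has to unzip the DOCX container and handle tables, hyperlinks, and escaped XML entities, which this sketch ignores.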

Implementation Highlights:

  • Attachment service handles file validation, storage, and extraction (PDF/DOCX).
  • Model session builds prompts with attached media and enforces capability checks.
  • Chat UI renders images/audio and guides users on supported inputs.
  • Audio capture converts recordings into a browser File so they reuse the existing upload flow.
  • Streaming tokens update the UI while Markdown is rendered on the fly.
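To make the "enforces capability checks" highlight concrete, here is a small TypeScript sketch of a capability-aware attachment gate. The shape of `ModelCapabilities` and the function name `canAttach` are assumptions for illustration, not the PR's actual API.

```typescript
// Hypothetical capability check run before an attachment is accepted.
interface ModelCapabilities {
  vision: boolean; // can the loaded model consume images?
  audio: boolean;  // can it consume audio clips?
}

function canAttach(file: { mimeType: string }, caps: ModelCapabilities): boolean {
  if (file.mimeType.startsWith("image/")) return caps.vision;
  if (file.mimeType.startsWith("audio/")) return caps.audio;
  // PDF/DOCX attachments are reduced to extracted text,
  // which every text model accepts.
  return true;
}
```

The UI can use the same check to grey out upload buttons per model, which matches the "capability-aware UI" feature listed above.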

Capability to upload images and ask about the images:
[screenshot]

Model auto-download plus capability to upload files and ask about the files:
[screenshot]
