feat(gateway): support images and audio for LINE/Telegram #690

@chaodu-agent

Description

Summary

The Custom Gateway currently only supports text messages. Add support for sending and receiving images/audio on webhook-based platforms (LINE, Telegram).

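| Direction | Text | Images | Audio/Voice |
| --- | --- | --- | --- |
| Inbound (user → bot) | supported | not supported yet | not supported yet |
| Outbound (bot → user) | supported | not supported yet | not supported yet |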

Use Case

Users on LINE and Telegram want to send photos/screenshots to the AI agent for analysis (e.g. "what's in this image?", "review this architecture diagram", "debug this error screenshot"). Currently the gateway silently drops non-text messages — the agent never sees them. This blocks image-understanding workflows that already work on Discord.

Recommended Approach: Gateway Media Proxy

The gateway downloads media from the platform APIs (which require auth) and serves it at a local HTTP endpoint. OAB core then fetches from that URL, the same pattern it already uses for Discord CDN images, so no core changes are needed.

LINE/Telegram webhook (image message)
  → gateway downloads via platform API (auth required)
  → gateway stores in memory with UUID key + 2-min TTL
  → gateway serves at: GET http://gateway:8080/media/<uuid>
  → WS message to OAB: { attachments: [{ type: "image", url: "http://gateway:8080/media/<uuid>" }] }
  → OAB core downloads from gateway URL (no auth, internal network)
  → same flow as Discord CDN images through media.rs
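
The URL handling in the flow above can be sketched as three small helpers. This is a minimal illustration, not the gateway's actual code: the function names, the `MEDIA_PREFIX` constant, and the hand-built JSON are assumptions made for the example (a real implementation would likely use a JSON library and its HTTP framework's router).

```rust
/// Path prefix of the proposed media route (assumed constant name).
const MEDIA_PREFIX: &str = "/media/";

/// Builds the URL placed in the WS attachment for a stored media key.
fn media_url(base: &str, key: &str) -> String {
    format!("{}{}{}", base.trim_end_matches('/'), MEDIA_PREFIX, key)
}

/// Extracts the key from a request path such as "/media/<uuid>".
/// Returns None for other routes, empty keys, or keys with extra segments.
fn media_key_from_path(path: &str) -> Option<&str> {
    let key = path.strip_prefix(MEDIA_PREFIX)?;
    if key.is_empty() || key.contains('/') {
        None
    } else {
        Some(key)
    }
}

/// Renders the attachment part of the WS message as JSON text.
/// Assumes `url` contains no characters that need JSON escaping.
fn image_attachment_json(url: &str) -> String {
    format!(r#"{{"attachments":[{{"type":"image","url":"{}"}}]}}"#, url)
}
```

Rejecting keys that contain `/` keeps the handler from treating nested paths as lookups, which matters if the store is ever backed by files.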

Why this approach

  • Gateway already listens on :8080 — adding /media/<uuid> route is trivial
  • OAB core media.rs already downloads images from URLs — zero code change
  • No shared volumes, no S3, no external dependencies
  • Works for all gateway platforms (LINE, Telegram, Teams)
Implementation sketch (~50 lines in gateway)
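// Assumes std's RwLock; an async gateway may prefer its runtime's lock type.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::Instant;

struct MediaEntry {
    data: Vec<u8>,        // raw bytes downloaded from the platform
    content_type: String, // e.g. "image/jpeg", returned by the GET handler
    created_at: Instant,  // insertion time, used for TTL eviction
}
type MediaStore = Arc<RwLock<HashMap<String, MediaEntry>>>;

// On inbound image:
//   1. Download from platform API (LINE Content API, Telegram getFile)
//   2. Store: media_store.write().unwrap().insert(uuid,
//        MediaEntry { data, content_type, created_at: Instant::now() })
//   3. Send WS message with url: "http://gateway:8080/media/<uuid>"

// GET /media/<uuid> handler:
//   → return bytes + content-type
//   → 404 if expired/not found

// Background eviction: every 30s, remove entries older than 2 minutes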
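
To make the sketch above concrete, here is one way the store, the lookup, and the eviction pass could fit together using only the standard library. It is a sketch under assumptions: the wrapper struct, the method names (`insert`, `get`, `evict_expired`), and `spawn_evictor` are hypothetical, the key is supplied by the caller rather than generated as a UUID, and a thread stands in for whatever task scheduler the gateway actually uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

struct MediaEntry {
    data: Vec<u8>,
    content_type: String,
    created_at: Instant,
}

/// Cloneable handle to the shared in-memory store (hypothetical wrapper).
#[derive(Clone)]
struct MediaStore {
    inner: Arc<RwLock<HashMap<String, MediaEntry>>>,
    ttl: Duration,
}

impl MediaStore {
    fn new(ttl: Duration) -> Self {
        Self { inner: Arc::new(RwLock::new(HashMap::new())), ttl }
    }

    /// Stores downloaded media under `key` (a UUID in the proposal).
    fn insert(&self, key: String, data: Vec<u8>, content_type: String) {
        let entry = MediaEntry { data, content_type, created_at: Instant::now() };
        self.inner.write().unwrap().insert(key, entry);
    }

    /// Returns (bytes, content type); None means the handler answers 404.
    /// Expired entries are treated as missing even before eviction runs.
    fn get(&self, key: &str) -> Option<(Vec<u8>, String)> {
        let map = self.inner.read().unwrap();
        let entry = map.get(key)?;
        if entry.created_at.elapsed() >= self.ttl {
            return None;
        }
        Some((entry.data.clone(), entry.content_type.clone()))
    }

    /// Removes entries older than the TTL and returns how many were dropped.
    fn evict_expired(&self) -> usize {
        let mut map = self.inner.write().unwrap();
        let before = map.len();
        let ttl = self.ttl;
        map.retain(|_, entry| entry.created_at.elapsed() < ttl);
        before - map.len()
    }
}

/// Runs the eviction pass on a fixed interval in a background thread.
fn spawn_evictor(store: MediaStore, every: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        thread::sleep(every);
        store.evict_expired();
    })
}
```

With the values from the proposal this would be constructed as `MediaStore::new(Duration::from_secs(120))` and paired with `spawn_evictor(store.clone(), Duration::from_secs(30))`. Checking the TTL inside `get` means a lookup never returns stale media even if the eviction pass is late.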
Why not base64 over WebSocket?
  • 33% size overhead (3MB photo → 4MB payload)
  • Large WS frames cause backpressure issues
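
The 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. A short sketch of the arithmetic, with hypothetical helper names and assuming standard padded base64:

```rust
/// Length in bytes of the padded base64 encoding of `n` raw bytes:
/// each group of 3 input bytes (rounded up) becomes 4 output characters.
fn base64_encoded_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Size overhead of base64 relative to the raw payload, in percent.
/// Only meaningful for n > 0.
fn base64_overhead_percent(n: usize) -> f64 {
    (base64_encoded_len(n) as f64 / n as f64 - 1.0) * 100.0
}
```

For a 3,000,000-byte photo, `base64_encoded_len` gives 4,000,000 bytes, which matches the 3MB to 4MB example above, before any JSON framing is added.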
Why not pass platform URL directly to core?
  • LINE: no public URL — requires Authorization: Bearer header with channel access token
  • Telegram: download URL contains bot token (secret leak concern)
  • Discord CDN works because URLs are public (no auth needed)
Prior art: OpenClaw media store

OpenClaw uses a local filesystem store (~/.openclaw/media/inbound/<uuid>) with 2-min TTL and media:// URI scheme. Their approach assumes co-located processes (same pod/shared volume). Our separated gateway architecture makes the HTTP proxy more appropriate, but the TTL cleanup and size-limit patterns are worth adopting.
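
If the size-limit pattern is adopted, the check could run before media is inserted into the store. The sketch below is an assumption about how such a check might look; the `Limits` struct, the `Admission` enum, and any concrete limit values are hypothetical and not taken from OpenClaw or the gateway.

```rust
/// Outcome of the size check for an incoming media item.
#[derive(Debug, PartialEq)]
enum Admission {
    Accept,
    /// The single item exceeds the per-entry cap.
    TooLarge,
    /// The item fits on its own but the store has no room left.
    StoreFull,
}

/// Hypothetical caps; real values would come from gateway config.
struct Limits {
    max_entry_bytes: usize,
    max_total_bytes: usize,
}

/// Decides whether `incoming_bytes` may be stored, given that the
/// store currently holds `stored_bytes` in total.
fn admit(limits: &Limits, stored_bytes: usize, incoming_bytes: usize) -> Admission {
    if incoming_bytes > limits.max_entry_bytes {
        Admission::TooLarge
    } else if stored_bytes.saturating_add(incoming_bytes) > limits.max_total_bytes {
        Admission::StoreFull
    } else {
        Admission::Accept
    }
}
```

Together with the TTL, a total cap bounds the memory the in-memory store can hold at any moment.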

Platform-Specific Details

Inbound

| Platform | Image source | Auth required |
| --- | --- | --- |
| LINE | GET https://api-data.line.me/v2/bot/message/{id}/content | Authorization: Bearer {token} |
| Telegram | GET https://api.telegram.org/file/bot<token>/<path> | Token in URL |
| Teams | Attachment URL in activity | Bearer token |
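
The LINE and Telegram rows above differ in where the credential goes, which is why neither URL can be handed to core as-is. A sketch of building the download request for each, using the endpoints from the table; the `Platform` enum, `DownloadRequest` struct, and `download_request` function are hypothetical names, and Teams is left out because its attachment URL arrives in the activity itself.

```rust
/// Inbound media source. For Telegram, `file_path` is the path
/// returned by a prior getFile call.
enum Platform<'a> {
    Line { message_id: &'a str },
    Telegram { file_path: &'a str },
}

/// What the gateway needs in order to fetch the media.
struct DownloadRequest {
    url: String,
    /// Optional (header name, header value) pair.
    auth_header: Option<(String, String)>,
}

/// Builds the platform download request from the bot/channel token.
fn download_request(platform: &Platform, token: &str) -> DownloadRequest {
    match platform {
        // LINE: token travels in the Authorization header.
        Platform::Line { message_id } => DownloadRequest {
            url: format!(
                "https://api-data.line.me/v2/bot/message/{}/content",
                message_id
            ),
            auth_header: Some(("Authorization".to_string(), format!("Bearer {}", token))),
        },
        // Telegram: token is embedded in the URL, so the URL itself is a secret.
        Platform::Telegram { file_path } => DownloadRequest {
            url: format!("https://api.telegram.org/file/bot{}/{}", token, file_path),
            auth_header: None,
        },
    }
}
```

In both cases the request stays inside the gateway; only the `/media/<uuid>` URL, which carries no credential, is sent over the WebSocket.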

Outbound (future)

| Platform | Method | Requirement |
| --- | --- | --- |
| LINE | Image message type | Public HTTPS URL for image |
| Telegram | sendPhoto API | Supports direct file upload |
| Teams | Adaptive Card with image | Public URL or hosted attachment |

Scaling Path

Start with an in-memory HashMap (sufficient for low-volume usage). If memory pressure becomes an issue, swap to temp files: the API surface stays the same and the protocol does not change.
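
One way to keep the API surface stable across that swap is to put both backends behind a small trait. The sketch below is illustrative only: the `MediaBackend` trait and both implementations are hypothetical, TTL handling is omitted, and keys are assumed to be gateway-generated UUIDs (so they are safe to use as file names).

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Storage interface shared by both backends (hypothetical).
trait MediaBackend {
    fn put(&mut self, key: &str, data: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
    fn remove(&mut self, key: &str) -> io::Result<()>;
}

/// Starting point: everything lives in a HashMap.
#[derive(Default)]
struct MemoryBackend {
    entries: HashMap<String, Vec<u8>>,
}

impl MediaBackend for MemoryBackend {
    fn put(&mut self, key: &str, data: &[u8]) -> io::Result<()> {
        self.entries.insert(key.to_string(), data.to_vec());
        Ok(())
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.entries.get(key).cloned())
    }
    fn remove(&mut self, key: &str) -> io::Result<()> {
        self.entries.remove(key);
        Ok(())
    }
}

/// Later option: one file per key inside a dedicated directory.
struct TempFileBackend {
    dir: PathBuf,
}

impl TempFileBackend {
    fn new(dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }
    /// Keys must not contain path separators (assumed to be UUIDs).
    fn path_for(&self, key: &str) -> PathBuf {
        self.dir.join(key)
    }
}

impl MediaBackend for TempFileBackend {
    fn put(&mut self, key: &str, data: &[u8]) -> io::Result<()> {
        fs::write(self.path_for(key), data)
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.path_for(key)) {
            Ok(bytes) => Ok(Some(bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
    fn remove(&mut self, key: &str) -> io::Result<()> {
        match fs::remove_file(self.path_for(key)) {
            Err(e) if e.kind() != io::ErrorKind::NotFound => Err(e),
            _ => Ok(()), // removed, or already gone
        }
    }
}
```

The GET handler and the WS message format only depend on the trait, so switching backends would not be visible to OAB core.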

Suggested Implementation Order

  1. Inbound images — highest value (user sends photo → agent sees it)
  2. Inbound audio/voice — STT pipeline (similar to Discord voice messages)
  3. Outbound images — agent sends image back to user
