Problem Statement
Mux users can ask agents to design or generate visual assets, but Mux does not currently provide a first-class Image Generation Tool. Agents can discuss image prompts or improvise external workflows, but there is no integrated way to use a settings-configured Image Generation Model, save generated artifacts in the active runtime, preview them in the transcript, account for provider usage, or guide agents through safe artifact handling.
The user wants image generation to be explicitly experimental, configurable in Settings, and robust enough that generated images are visible as first-class transcript artifacts rather than generic tool JSON. The feature should avoid polluting a project workspace with every generated preview while still making it easy for agents to copy selected final assets into the workspace.
Solution
Add an experimental, Mux-executed image_generate tool backed by OpenAI image models through the AI SDK image generation API. The experiment is default-off and configurable from Settings. It uses a nested Image Generation Configuration with a default Image Generation Model of openai:gpt-image-2 and a configurable maximum image count per call.
When an Exec-mode agent successfully calls image_generate, Mux saves full-resolution Image Generation Artifacts under the active runtime's temporary directory, stores bounded thumbnails in the tool result, and renders a first-class Generated Image Display Message in the transcript. The normal persisted transcript source of truth remains the tool call/result, but successful image generations are displayed as generated-image messages instead of generic tool rows.
Ship a richer built-in /imagegen Agent Skill that teaches when and how to use the tool, prompt construction patterns, iteration guidance, and artifact policy. Add concise user documentation for the experimental Image Generation Model setting. Preserve policy enforcement, usage reporting, and structured failure behavior so image generation is not a provider-policy or cost-visibility escape hatch.
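At its core, the provider side of the tool reduces to a single AI SDK call. Below is a minimal sketch, assuming the Vercel AI SDK's experimental image API and the @ai-sdk/openai model factory; the function shape, the model-id parsing, and the quality pass-through are illustrative rather than the final adapter interface:

```ts
// A minimal sketch of the core generation call, assuming the Vercel
// AI SDK's experimental image API and the @ai-sdk/openai factory.
// Names and option handling here are illustrative; the real adapter
// sits behind Mux's provider/policy validation and artifact writer.
import { experimental_generateImage as generateImage } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

export async function generateImageBytes(opts: {
  apiKey: string;
  modelId: string; // e.g. "gpt-image-2", parsed from the configured "openai:gpt-image-2"
  prompt: string;
  n: number;
  quality?: "low" | "medium" | "high"; // assumed OpenAI quality levels
}): Promise<Uint8Array[]> {
  const openai = createOpenAI({ apiKey: opts.apiKey });
  const { images } = await generateImage({
    model: openai.image(opts.modelId),
    prompt: opts.prompt,
    n: opts.n,
    providerOptions: opts.quality ? { openai: { quality: opts.quality } } : undefined,
  });
  // Full-resolution bytes; the artifact writer persists these under the
  // runtime temp directory and the thumbnail generator derives previews.
  return images.map((image) => image.uint8Array);
}
```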
User Stories
- As a Mux user, I want to opt into image generation explicitly, so that image generation does not appear or spend money until I enable the experiment.
- As a Mux user, I want to configure the Image Generation Model in Settings, so that generated images use the model I expect.
- As a Mux user, I want openai:gpt-image-2 to be the default Image Generation Model, so that the experiment starts with OpenAI's current GPT Image 2 model.
- As a Mux user, I want to pin openai:gpt-image-2-2026-04-21 if needed, so that I can get stable image-model behavior over time.
- As a Mux user, I want to configure the maximum number of images per call, so that I can balance cost safety against power-user experimentation.
- As a Mux user, I want the default image-count limit to be conservative, so that accidental multi-image calls do not create surprising cost.
- As a power user, I want to raise the image-count limit up to the provider-supported range, so that I can generate more variants in one call.
- As a Mux user, I want requests above my configured image-count limit to fail clearly, so that the agent does not silently produce fewer images than requested.
- As a Mux user, I want image generation available to Exec-mode agents by default, so that artifact-producing work happens in the mode intended for doing work.
- As a Mux user, I want Plan and Explore agents not to generate images by default, so that planning and read-only exploration do not perform costly side effects.
- As a Mux user, I want the tool to use its own configured Image Generation Model rather than the current chat model, so that image generation works even when I am chatting with a non-OpenAI model.
- As a Mux user, I want the generated image to appear as a first-class transcript item, so that I can see and inspect outputs without opening raw JSON.
- As a Mux user, I want generated image thumbnails to render in the transcript, so that I can quickly judge which outputs are useful.
- As a Mux user, I want full-resolution generated files saved somewhere accessible to tools, so that agents can copy final images into the workspace.
- As a Mux user, I want generated previews not to automatically dirty my git workspace, so that discarded variants do not pollute source control.
- As a Mux user, I want agents to copy selected final images into the workspace explicitly, so that project-bound assets are intentional.
- As a Mux user, I want the transcript to remain visually useful even if runtime-temp artifacts later disappear, so that historical chats still show bounded previews.
- As a Mux user, I want setup problems to appear as structured tool failures, so that the agent can explain how to fix missing credentials or invalid settings.
- As a Mux user, I want missing OpenAI credentials to fail inside the image tool rather than blocking the entire chat send, so that optional image generation does not break unrelated conversation.
- As a Mux user, I want disabled providers to produce actionable tool failures, so that I understand why image generation cannot run.
- As a Mux user, I want unsupported non-OpenAI Image Generation Models to fail clearly in v1, so that I know the current provider scope.
- As a Mux user, I want OpenAI API quota, moderation, or model errors to be surfaced clearly, so that I can correct the request or provider setup.
- As an enterprise admin, I want provider/model policy enforced for image generation, so that image generation cannot bypass organization policy.
- As an enterprise admin, I want image generation usage to be reported through existing usage infrastructure when available, so that image-related provider usage is visible.
- As a Mux user, I want image generation not to fail solely because usage reporting failed, so that auxiliary accounting problems do not discard successful outputs.
- As a Mux user, I want prompt, image count, quality, and output format controls, so that common generation needs are supported without exposing too many provider-specific knobs.
- As a Mux user, I want quality to support OpenAI's image quality levels, so that I can trade off speed/cost/quality when needed.
- As a Mux user, I want output format to support common raster formats, so that generated assets fit common project needs.
- As a Mux user, I want unsupported controls such as seed, aspect ratio, style, background, moderation overrides, compression, edits, masks, and batch generation deferred, so that v1 remains reliable and scoped.
- As an agent, I want a built-in /imagegen skill, so that I know when and how to use the Image Generation Tool.
- As an agent, I want the image generation skill to include prompting recipes, so that I can produce better prompts without over-authoring user requests.
- As an agent, I want the image generation skill to tell me the v1 scope, so that I do not claim to edit images, use masks, or create transparent-background workflows before those capabilities exist.
- As an agent, I want artifact-handling policy in the skill, so that I copy selected final assets into the workspace only when appropriate.
- As a Mux user, I want generated images to show their saved paths, so that I can find and reuse the full-resolution files.
- As a Mux user, I want a generated-image display row to replace the generic successful tool row, so that the transcript emphasizes the artifact rather than internal tool mechanics.
- As a Mux user, I want failed, pending, executing, interrupted, or redacted image-generation calls to keep using normal tool rows, so that status and errors stay consistent with other tools.
- As a Mux user, I want image-generation settings documented, so that I can discover how to enable and configure the experiment.
- As a maintainer, I want the persisted transcript source of truth to stay as normal tool parts for v1, so that replay, retries, and downgrade behavior remain compatible.
- As a maintainer, I want a derived Generated Image Display Message rather than a new persisted message type in v1, so that the UX can be first-class without a schema migration.
- As a maintainer, I want thumbnail generation failures to degrade gracefully, so that a previewing problem does not discard valid full-resolution images.
- As a maintainer, I want artifact paths to be sanitized and deterministic enough for tool-call outputs, so that generated files are saved safely.
- As a maintainer, I want image-generation configuration validation to be centralized and testable, so that UI, tool execution, and future providers share the same rules.
- As a maintainer, I want provider/model setup validation to be encapsulated, so that image generation does not duplicate chat-model creation logic unsafely.
- As a reviewer, I want dogfooding steps with screenshots and a recording, so that I can verify the visual behavior that automated tests cannot fully cover.
- As a future implementer, I want the architecture captured in an ADR and PRD, so that I can verify the final implementation against the intended scope.
Implementation Decisions
- Build an Image Generation Tool as a Mux-executed model-callable tool named image_generate.
- Use OpenAI-only provider scope for v1 while preserving the standard provider:model-id shape for Image Generation Model configuration.
- Use the AI SDK image generation API and OpenAI image model factory rather than hand-written HTTP calls or the hosted Responses image-generation tool.
- Add a default-off, visible Image Generation Tool experiment with inline experiment configuration.
- Store Image Generation Configuration as an app-level nested object containing the configured Image Generation Model and maximum images per call.
- Default the Image Generation Model to openai:gpt-image-2 and support the pinned snapshot openai:gpt-image-2-2026-04-21 as a user-entered override.
- Default the maximum image count per call to 4 with an allowed range of 1 through 10.
- Reject requests exceeding the configured count limit with a structured tool failure; do not silently clamp.
- Expose only prompt, image count, quality, and output format in the first tool schema (a schema sketch follows this list).
- Defer seed, aspect ratio, style, background, moderation overrides, compression, edits, masks, batch generation, and transparent-background workflows.
- Save full-resolution generated images under the active runtime's temporary directory so local, SSH, Docker, and other runtime tools can access them consistently.
- Treat runtime-temp generated artifacts as best-effort session artifacts. Durable or project-bound images must be copied into the workspace explicitly.
- Persist bounded thumbnails in successful tool results for transcript preview. Do not persist full-resolution image bytes in chat history.
- Generate thumbnails in a deep, isolated artifact/thumbnail module with a simple interface that accepts image bytes and returns saved artifact metadata plus an optional bounded thumbnail.
- If thumbnail generation fails, return successful generation output with saved paths and a warning/omitted thumbnail rather than failing the whole tool call.
- Add a Generated Image Display Message as a first-class frontend display type derived from successful image_generate tool results.
- Keep normal persisted tool call/result parts as the transcript source of truth. Do not add a new persisted chat part or stream protocol event in v1.
- Replace successful image_generate tool rows with Generated Image Display Messages in the displayed transcript.
- Continue rendering pending, executing, failed, interrupted, or redacted image_generate calls as normal tool rows.
- Use existing tool-side model usage reporting for image-generation usage when the provider returns usage metadata.
- Wrap usage reporting failures so they never fail otherwise successful image generation.
- Enforce Mux provider/model policy against the configured Image Generation Model before making provider calls.
- Keep the tool registered when the experiment is enabled, even if credentials or setup are invalid; return structured tool failures at execution time.
- Remove image_generate from the built-in Plan and Explore agents' tool policies; Exec-mode agents may use it by default when the experiment is enabled.
- Add a richer single-file built-in /imagegen Agent Skill with prompting principles, use-case recipes, iteration guidance, scope boundaries, and artifact policy.
- Do not bundle fallback CLI scripts, direct Image API reference files, chroma-key helpers, or executable workflows for deferred features in v1.
- Add lightweight user documentation for the experimental Image Generation Model setting, default model, pinned snapshot option, OpenAI credential requirement, runtime artifact behavior, and generate-only scope.
- Use the existing ADR to preserve architectural rationale: Mux-executed tool, direct Images API via AI SDK, runtime-temp artifacts, derived display messages, bounded thumbnails, experiment gating, Exec-only default availability, and policy enforcement.
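As referenced in the schema decision above, here is a hedged sketch of the v1 input shape and the count-limit check. It assumes zod is used for tool schemas; the names and enum values are illustrative, mirroring the decisions above (OpenAI quality levels, common raster formats):

```ts
// Hypothetical v1 input schema and count-limit check. zod usage and
// all names are assumptions for illustration.
import { z } from "zod";

export const DEFAULT_MAX_IMAGES_PER_CALL = 4; // configurable range: 1-10

export const imageGenerateInput = z.object({
  prompt: z.string().min(1),
  // The upper bound is config-driven, not baked into the schema.
  count: z.number().int().min(1).default(1),
  quality: z.enum(["low", "medium", "high"]).optional(),
  outputFormat: z.enum(["png", "jpeg", "webp"]).optional(),
});

// Reject, never clamp: the agent must see a structured failure when it
// asks for more images than the configured limit allows.
export function checkCountLimit(
  requested: number,
  configuredMax: number,
): { ok: true } | { ok: false; error: string } {
  if (requested > configuredMax) {
    return {
      ok: false,
      error: `Requested ${requested} images but the configured maximum is ${configuredMax}. Lower the count or raise the limit in Settings.`,
    };
  }
  return { ok: true };
}
```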
Deep module opportunities (an interface sketch for the artifact writer and thumbnail generator follows this list):
- Image generation configuration resolver: returns normalized model and count-limit settings with defaults and validation.
- Image generation provider adapter: creates/validates the OpenAI image model and calls AI SDK generation behind a small interface.
- Image artifact writer: saves generated bytes to the active runtime temp directory and returns safe artifact metadata.
- Thumbnail generator: creates bounded thumbnail previews independently from tool execution.
- Image tool result parser: validates successful tool results and maps them into Generated Image Display Message data.
- Image setup/policy validator: checks provider scope, credentials, provider enabled state, and Mux policy before provider calls.
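A minimal interface sketch for the artifact writer and thumbnail generator named above; all type, field, and method names are assumptions:

```ts
// Illustrative interfaces for the artifact writer and thumbnail
// generator modules above; names and fields are assumptions.
export interface SavedImageArtifact {
  path: string; // sanitized, deterministic path under the runtime temp directory
  mediaType: string; // e.g. "image/png"
  byteLength: number;
}

export interface ImageArtifactWriter {
  save(bytes: Uint8Array, mediaType: string): Promise<SavedImageArtifact>;
}

export interface ThumbnailGenerator {
  // Returns a bounded preview, or undefined when thumbnailing fails;
  // callers treat a missing thumbnail as a warning, never a tool failure.
  thumbnail(bytes: Uint8Array, maxEdgePx: number): Promise<Uint8Array | undefined>;
}
```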
Testing Decisions
Good tests for this feature should assert externally visible behavior: tool input validation, settings persistence, generated artifact metadata, structured failure outputs, policy enforcement, usage-report calls, display-message derivation, and rendered transcript behavior. Tests should avoid tautological assertions against prompt prose or copied skill text unless the behavior depends on a specific scope rule.
Modules to test:
- Image generation configuration resolver: default model, default max count, range validation, malformed values, and persistence compatibility.
- Experiment settings UI: toggle visibility, model setting, max-count setting, save/load behavior, and validation feedback.
- Tool schema and handler: prompt validation, count-limit rejection, provider/model validation, quality/output-format pass-through, structured setup failures, successful result shape, artifact saving, thumbnail fallback, and usage reporting.
- Provider/policy validation: OpenAI-only v1 behavior, policy-denied provider, policy-denied model, disabled provider, missing credentials, and invalid model strings.
- Artifact writer and thumbnail generator: safe paths, runtime temp layout, correct media types/extensions, thumbnail bounds, and graceful thumbnail failures.
- Tool assembly and agent policy: experiment-gated registration, Exec availability, Plan/Explore removal, and policy filtering behavior.
- Message aggregation: successful image tool output becomes a Generated Image Display Message; non-success states remain tool rows (see the test sketch after this list); generated rows preserve ordering, history identity, stream sequence, timestamps, and interruption behavior.
- Message rendering: generated image cards display thumbnails, saved paths, prompt/model metadata, and multiple images without exposing full base64 in raw details.
- Built-in skill packaging: /imagegen is discoverable as a built-in skill and contains the intended operating guidance without bundling scripts.
- Documentation generation/checks: user docs render cleanly and built-in skill generation remains up to date.
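For the aggregation item referenced above, a hedged test sketch; the runner, module path, and tool-part shape are hypothetical stand-ins for the real aggregation module and persisted part types:

```ts
// A hedged sketch of an aggregation test. `aggregateMessages` and the
// tool-part shape are hypothetical; real names in the codebase differ.
import { describe, expect, it } from "vitest";
import { aggregateMessages } from "../messages/aggregate"; // hypothetical path

const imageToolPart = (state: "output-available" | "output-error") => ({
  type: "tool-image_generate" as const,
  state,
  output:
    state === "output-available"
      ? { images: [{ path: "/tmp/mux-images/img-1.png", thumbnail: "<base64>" }] }
      : undefined,
  errorText: state === "output-error" ? "missing OpenAI credentials" : undefined,
});

describe("image_generate display derivation", () => {
  it("derives a Generated Image Display Message from a successful result", () => {
    const rows = aggregateMessages([imageToolPart("output-available")]);
    expect(rows[0].type).toBe("generated-image");
  });

  it("keeps non-success states as normal tool rows", () => {
    const rows = aggregateMessages([imageToolPart("output-error")]);
    expect(rows[0].type).toBe("tool");
  });
});
```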
Prior art in the codebase:
- Existing tool tests cover normal tool handler behavior and structured failures.
- Advisor Tool tests demonstrate reporting model usage from a model-using tool without failing the tool on usage-report problems.
- Attachment and desktop screenshot tests demonstrate media-style tool results and image preview behavior.
- Streaming message aggregation tests cover deriving display rows from persisted message/tool parts.
- Message renderer tests cover routing display message types to specialized components.
- Existing experiment/settings tests show inline experiment configuration patterns.
- Config schema tests cover app config persistence and validation.
- Built-in skill tests and generation scripts cover embedded skill packaging.
Dogfooding and quality gates:
- Run targeted unit tests for config, tool handler, artifact/thumbnail helpers, tool assembly, aggregation, rendering, and built-in skill packaging.
- Run typecheck and lint/static checks for touched code.
- In a local runtime workspace, enable the Image Generation Tool experiment, set model openai:gpt-image-2, set max images to 2, and ask Exec via /imagegen to generate two simple abstract test images.
- Capture screenshots showing the Settings configuration and the Generated Image Display Message with thumbnails.
- Record a short video of enabling the experiment, generating images, and inspecting the generated-image transcript row.
- Verify full-resolution artifacts exist under the runtime temp directory.
- Ask for more images than the configured max and verify the structured failure appears.
- Copy one selected generated image into the workspace and verify it appears as an intentional git change.
Out of Scope
- Image editing.
- Masks or reference-image editing workflows.
- Batch generation or JSONL job processing.
- Transparent-background workflows, chroma-key removal, and native transparency fallback logic.
- Bundled fallback CLI scripts.
- Provider adapters beyond OpenAI.
- Hosted OpenAI Responses image-generation tool integration.
- New persisted generated-image chat parts or new stream protocol events.
- Artifact indexing or explicit cleanup/TTL management.
- Per-call user confirmation UX.
- Automatic copying of generated images into project workspaces.
- Full image bytes stored in chat history.
- Advanced provider knobs such as seed, aspect ratio, style, background, moderation overrides, and output compression.
- Guaranteed dollar-cost pricing for image generation when provider usage metadata is incomplete.
Further Notes
The domain glossary and ADR should remain the source of truth for terminology: Image Generation Tool, Image Generation Model, Image Generation Configuration, Image Generation Artifact, Image Generation Preview, Generated Image Display Message, and Image Generation Policy Enforcement.
The implementation should preserve startup resilience and avoid making image-generation setup failures crash or block unrelated sends. The tool should fail fast and clearly when assumptions are invalid.
Because this is visual product work, dogfooding evidence is part of the acceptance criteria. Automated tests should prove behavior and wiring; screenshots and a short recording should prove the end-to-end visual experience.
Generated with mux • Model: openai:gpt-5.5 • Thinking: high • Cost: $97.31