Merged
28 changes: 28 additions & 0 deletions docs/features/openai-image-generation-settings/plan.md
@@ -0,0 +1,28 @@
# OpenAI Image Generation Settings Plan

## Architecture

- Keep the shared `ImageGenerationOptions` type and contracts.
- Rename gpt-image-2-specific helpers, constants, validators, and UI component names to OpenAI image generation settings names.
- Store session image settings as JSON in `deepchat_sessions`; keep the existing v27 migration.
- Keep model-level settings in the existing model config JSON store.

## Data Flow

- Settings dialog writes model-level `imageGeneration` when `supportsOpenAIImageGenerationSettings(...)` is true.
- Chat status bar writes session-level `imageGeneration` under the same capability check.
- Agent runtime merges effective session settings into `ModelConfig`.
- AI SDK runtime passes `size` at the `generateImage()` top level and OpenAI provider options through `providerOptions`.
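The last bullet can be sketched as follows. `toRequestOptions` is an illustrative stand-in for the `buildImageGenerationRequestOptions` helper added later in this PR, and `'openai'` is assumed to be the provider-options key for the official provider:

```typescript
type ProviderPayload = Record<string, string | number>
type RequestOptions = {
  size?: string
  providerOptions?: Record<string, ProviderPayload>
}

// Sketch: `size` rides at the generateImage() top level, while the
// remaining OpenAI-specific knobs travel under providerOptions.
function toRequestOptions(size: string | undefined, payload: ProviderPayload): RequestOptions {
  const options: RequestOptions = {}
  if (size) {
    options.size = size
  }
  if (Object.keys(payload).length > 0) {
    options.providerOptions = { openai: payload }
  }
  return options
}

// e.g. await generateImage({ model, prompt, ...toRequestOptions('1024x1536', { quality: 'high' }) })
```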

## Compatibility

- Existing sessions without image settings behave exactly as before.
- Empty or invalid stored image settings are treated as unset.
- Existing chat-model settings remain unchanged for models outside the OpenAI image settings capability.
- The `deepchat_sessions` migration stays at version 27 because the global schema version is already 26.
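A minimal sketch of the "empty or invalid means unset" rule above; the real implementation is `normalizeImageGenerationOptions` in `@shared/imageGenerationSettings`, and the fields shown here are only an illustrative subset:

```typescript
type StoredImageOptions = { size?: string; quality?: string }

// Sketch: anything that is not an object with recognized non-empty string
// fields collapses to undefined, so OpenAI defaults apply downstream.
function normalizeStoredImageOptions(raw: unknown): StoredImageOptions | undefined {
  if (typeof raw !== 'object' || raw === null) return undefined
  const record = raw as Record<string, unknown>
  const result: StoredImageOptions = {}
  if (typeof record.size === 'string' && record.size.length > 0) result.size = record.size
  if (typeof record.quality === 'string' && record.quality.length > 0) result.quality = record.quality
  return Object.keys(result).length > 0 ? result : undefined
}
```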

## Tests

- Runtime tests use `gpt-image-2` for empty options and option forwarding.
- Contract and SQLite tests verify config round trips.
- Renderer component tests use `gpt-image-2` as the positive image settings fixture and existing chat model ids as generic fallbacks.
27 changes: 27 additions & 0 deletions docs/features/openai-image-generation-settings/spec.md
@@ -0,0 +1,27 @@
# OpenAI Image Generation Settings Spec

## User Story

Users who select an OpenAI or OpenAI-compatible image generation model can configure image generation parameters without seeing chat-only settings that do not affect image generation.

## Acceptance Criteria

- OpenAI image-generation routes, image endpoints, imageGeneration model types, and the current `gpt-image-2` fallback use the image-specific settings UI.
- Default UI choices do not persist or send image generation parameters.
- Model-level image settings define defaults for new sessions.
- Session-level image settings can override model-level settings.
- Runtime forwards only valid OpenAI image options to AI SDK image generation.
- Invalid custom sizes cannot be saved from the UI.

## Non-goals

- Do not add support for `n`, `partial_images`, streaming partial images, `input_fidelity`, `style`, or `user`.
- Do not add transparent background support.
- Do not test future or unconfirmed model ids.

## Constraints

- Public config fields remain under `imageGeneration`.
- Unset options must remain `undefined` so OpenAI defaults apply.
- Supported stored fields are `size`, `quality`, `outputFormat`, `outputCompression`, `background`, and `moderation`.
- Custom sizes must use `{width}x{height}`, both dimensions must be multiples of 16, each side must be at most 3840, aspect ratio must be at most 3:1, and total pixels must be between 655360 and 8294400.
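The custom-size rule above can be expressed directly as a predicate; this is a hypothetical sketch (the shipped validator may differ in name and error reporting):

```typescript
// Sketch of the spec's custom-size constraints: `{width}x{height}`, both
// dimensions multiples of 16, each side at most 3840, aspect ratio at most
// 3:1, and total pixels within [655360, 8294400].
function isValidCustomImageSize(value: string): boolean {
  const match = /^(\d+)x(\d+)$/.exec(value)
  if (!match) return false
  const width = Number(match[1])
  const height = Number(match[2])
  if (width % 16 !== 0 || height % 16 !== 0) return false
  if (width > 3840 || height > 3840) return false
  if (Math.max(width, height) / Math.min(width, height) > 3) return false
  const pixels = width * height
  return pixels >= 655360 && pixels <= 8294400
}
```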
10 changes: 10 additions & 0 deletions docs/features/openai-image-generation-settings/tasks.md
@@ -0,0 +1,10 @@
# OpenAI Image Generation Settings Tasks

- [x] Define SDD artifacts.
- [x] Add shared image option types, validation, and contracts.
- [x] Persist session image settings.
- [x] Forward image settings in AI SDK runtime.
- [x] Add image-generation-specific settings UI.
- [x] Generalize naming from gpt-image-2 to OpenAI image generation settings.
- [x] Keep tests focused on `gpt-image-2` and existing non-image models.
- [x] Run format, i18n, lint, typecheck, and focused tests.
60 changes: 59 additions & 1 deletion src/main/presenter/agentRuntimePresenter/index.ts
@@ -53,6 +53,10 @@ import {
MODEL_TIMEOUT_MAX_MS,
MODEL_TIMEOUT_MIN_MS
} from '@shared/modelConfigDefaults'
import {
normalizeImageGenerationOptions,
supportsOpenAIImageGenerationSettings
} from '@shared/imageGenerationSettings'
import { isDeepSeekSeriesModelId } from '@shared/model'
import { nanoid } from 'nanoid'
import type { SQLitePresenter } from '../sqlitePresenter'
@@ -1755,6 +1759,7 @@ export class AgentRuntimePresenter implements IAgentImplementation {
reasoningEffort: generationSettings.reasoningEffort,
reasoningVisibility: generationSettings.reasoningVisibility,
verbosity: generationSettings.verbosity,
imageGeneration: generationSettings.imageGeneration,
reasoning: getReasoningEffectiveEnabledForProvider(capabilityProviderId, reasoningPortrait, {
reasoning: baseModelConfig.reasoning,
reasoningEffort: generationSettings.reasoningEffort ?? baseModelConfig.reasoningEffort
@@ -3029,6 +3034,9 @@ export class AgentRuntimePresenter implements IAgentImplementation {
if (Object.prototype.hasOwnProperty.call(requestedPatch, 'forceInterleavedThinkingCompat')) {
patch.forceInterleavedThinkingCompat = sanitized.forceInterleavedThinkingCompat
}
if (Object.prototype.hasOwnProperty.call(requestedPatch, 'imageGeneration')) {
patch.imageGeneration = sanitized.imageGeneration
}
Comment on lines +3037 to +3039
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

imageGeneration persistence is write-only right now.

Lines 3037 through 3039 write the field, but the session load mapper (`mapPersistedGenerationPatch`) does not restore persisted image options, so session overrides can be lost after a process restart.

Suggested fix (read-path symmetry)
 type PersistedSessionGenerationRow = {
   provider_id: string
   model_id: string
   permission_mode: PermissionMode
   system_prompt: string | null
   temperature: number | null
   context_length: number | null
   max_tokens: number | null
   timeout_ms: number | null
   thinking_budget: number | null
   reasoning_effort: SessionGenerationSettings['reasoningEffort'] | null
   reasoning_visibility: SessionGenerationSettings['reasoningVisibility'] | null
   verbosity: SessionGenerationSettings['verbosity'] | null
   force_interleaved_thinking_compat: number | null
+  image_generation_options_json: string | null
 }

 // inside mapPersistedGenerationPatch(...)
 if (typeof sessionRow.force_interleaved_thinking_compat === 'number') {
   patch.forceInterleavedThinkingCompat = sessionRow.force_interleaved_thinking_compat === 1
 }

+if (typeof sessionRow.image_generation_options_json === 'string') {
+  try {
+    const parsed = JSON.parse(sessionRow.image_generation_options_json) as unknown
+    const imageGeneration = normalizeImageGenerationOptions(parsed)
+    if (imageGeneration) {
+      patch.imageGeneration = imageGeneration
+    }
+  } catch {
+    // ignore invalid persisted payload
+  }
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/presenter/agentRuntimePresenter/index.ts` around lines 3037 - 3039,
The write path sets patch.imageGeneration when requestedPatch has
imageGeneration, but the read path (mapPersistedGenerationPatch) never restores
it, causing session image options to be lost after restart; update
mapPersistedGenerationPatch to mirror the write-path logic by reading the
persisted patch's imageGeneration and mapping it into the returned session
override (i.e., ensure the mapper copies persisted.imageGeneration into the
session's imageGeneration field), and add any necessary null/undefined checks
consistent with how requestedPatch, patch, and sanitized are handled.


return patch
}
@@ -3046,10 +3054,15 @@
reasoningEffort: settings.reasoningEffort,
reasoningVisibility: settings.reasoningVisibility,
verbosity: settings.verbosity,
-      forceInterleavedThinkingCompat: settings.forceInterleavedThinkingCompat
+      forceInterleavedThinkingCompat: settings.forceInterleavedThinkingCompat,
+      imageGeneration: settings.imageGeneration
}
}

private resolveProviderApiType(providerId: string): string | undefined {
return this.configPresenter.getProviderById?.(providerId)?.apiType
}

private async buildDefaultGenerationSettings(
providerId: string,
modelId: string
Expand Down Expand Up @@ -3109,6 +3122,22 @@ export class AgentRuntimePresenter implements IAgentImplementation {
defaults.forceInterleavedThinkingCompat = interleavedThinkingDefault
}

if (
supportsOpenAIImageGenerationSettings({
providerId,
providerApiType: this.resolveProviderApiType(providerId),
modelId,
apiEndpoint: modelConfig.apiEndpoint,
endpointType: modelConfig.endpointType,
type: modelConfig.type
})
) {
const imageGeneration = normalizeImageGenerationOptions(modelConfig.imageGeneration)
if (imageGeneration) {
defaults.imageGeneration = imageGeneration
}
}

const supportsReasoning =
this.configPresenter.supportsReasoningCapability?.(providerId, modelId) === true
if (supportsReasoning) {
Expand Down Expand Up @@ -3324,6 +3353,35 @@ export class AgentRuntimePresenter implements IAgentImplementation {
delete next.forceInterleavedThinkingCompat
}

if (
supportsOpenAIImageGenerationSettings({
providerId,
providerApiType: this.resolveProviderApiType(providerId),
modelId,
apiEndpoint: modelConfig.apiEndpoint,
endpointType: modelConfig.endpointType,
type: modelConfig.type
})
) {
if (Object.prototype.hasOwnProperty.call(patch, 'imageGeneration')) {
const imageGeneration = normalizeImageGenerationOptions(patch.imageGeneration)
if (imageGeneration) {
next.imageGeneration = imageGeneration
} else {
delete next.imageGeneration
}
} else {
const imageGeneration = normalizeImageGenerationOptions(next.imageGeneration)
if (imageGeneration) {
next.imageGeneration = imageGeneration
} else {
delete next.imageGeneration
}
}
} else {
delete next.imageGeneration
}

if (fixedTemperatureKimi) {
next.temperature = fixedTemperatureKimi.temperature
}
95 changes: 94 additions & 1 deletion src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
@@ -1,4 +1,5 @@
import { embedMany, generateId, generateImage, generateText, streamText } from 'ai'
import type { JSONValue } from 'ai'
import type {
ChatMessage,
IConfigPresenter,
@@ -13,6 +14,11 @@ import {
applyMoonshotKimiReasoningTemperaturePolicy,
resolveMoonshotKimiTemperaturePolicy
} from '@shared/moonshotKimiPolicy'
import {
normalizeImageGenerationOptions,
supportsOpenAIImageGenerationSettings,
type ImageGenerationOptions
} from '@shared/imageGenerationSettings'
import { presenter } from '@/presenter'
import { EMBEDDING_TEST_KEY, isNormalized } from '@/utils/vector'
import type { LLMCoreStreamEvent } from '@shared/types/core/llm-events'
@@ -22,6 +28,12 @@ import { buildProviderOptions } from './providerOptionsMapper'
import { type AiSdkProviderKind, createAiSdkProviderContext } from './providerFactory'
import { adaptAiSdkStream } from './streamAdapter'

type ImageGenerationProviderPayload = Record<string, JSONValue>
type ImageGenerationRequestOptions = {
size?: `${number}x${number}`
providerOptions?: Record<string, ImageGenerationProviderPayload>
}

export interface AiSdkRuntimeContext {
providerKind: AiSdkProviderKind
provider: LLM_PROVIDER
@@ -154,6 +166,78 @@ function resolveRequestTimeout(modelConfig: ModelConfig): number | undefined {
return Math.round(timeout)
}

function buildImageGenerationProviderPayload(
providerOptionsKey: string,
options: ImageGenerationOptions
): ImageGenerationProviderPayload {
const officialOpenAI = providerOptionsKey === 'openai'
const payload: ImageGenerationProviderPayload = {}

if (options.quality) {
payload.quality = options.quality
}
if (options.background) {
payload.background = options.background
}
if (options.moderation) {
payload.moderation = options.moderation
}
if (options.outputFormat) {
payload[officialOpenAI ? 'outputFormat' : 'output_format'] = options.outputFormat
}
if (options.outputCompression !== undefined) {
payload[officialOpenAI ? 'outputCompression' : 'output_compression'] = options.outputCompression
}

return payload
}

function buildImageGenerationRequestOptions(
context: AiSdkRuntimeContext,
providerOptionsKey: string,
modelId: string,
modelConfig: ModelConfig
): ImageGenerationRequestOptions {
if (
!supportsOpenAIImageGenerationSettings({
providerId: context.provider.id,
providerApiType: context.provider.apiType,
providerKind: context.providerKind,
providerOptionsKey,
modelId,
apiEndpoint: modelConfig.apiEndpoint,
endpointType: modelConfig.endpointType,
type: modelConfig.type
})
) {
return {}
}

const imageGeneration = normalizeImageGenerationOptions(modelConfig.imageGeneration)
if (!imageGeneration) {
return {}
}

const { size, ...providerImageOptions } = imageGeneration
const providerPayload = buildImageGenerationProviderPayload(
providerOptionsKey,
providerImageOptions
)
const requestOptions: ImageGenerationRequestOptions = {}

if (size) {
requestOptions.size = size as `${number}x${number}`
}

if (Object.keys(providerPayload).length > 0) {
requestOptions.providerOptions = {
[providerOptionsKey]: providerPayload
}
}

return requestOptions
}

function normalizeRuntimeModelConfig(
context: AiSdkRuntimeContext,
modelId: string,
@@ -329,18 +413,27 @@ export async function* runAiSdkCoreStream(
throw new Error(`Image generation is not supported by provider ${context.provider.id}`)
}

const imageGenerationRequestOptions = buildImageGenerationRequestOptions(
context,
providerContext.providerOptionsKey,
modelId,
normalizedModelConfig
)

await context.emitRequestTrace?.(modelConfig, {
endpoint: providerContext.imageEndpoint ?? providerContext.endpoint,
headers: context.buildTraceHeaders?.() ?? context.defaultHeaders,
body: {
model: providerContext.resolvedModelId ?? modelId,
-        prompt
+        prompt,
+        ...imageGenerationRequestOptions
}
})

const result = await generateImage({
model: providerContext.imageModel,
prompt,
...imageGenerationRequestOptions,
...(timeout ? { abortSignal: AbortSignal.timeout(timeout) } : {})
})

17 changes: 12 additions & 5 deletions src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts
@@ -71,6 +71,11 @@ const isOpenAIImageGenerationModel = (modelId: string): boolean =>
OPENAI_IMAGE_GENERATION_MODELS.includes(modelId) ||
OPENAI_IMAGE_GENERATION_MODEL_PREFIXES.some((prefix) => modelId.startsWith(prefix))

const shouldUseOpenAIImageGenerationRoute = (modelId: string, modelConfig: ModelConfig): boolean =>
isOpenAIImageGenerationModel(modelId) ||
modelConfig.apiEndpoint === ApiEndpointType.Image ||
modelConfig.type === ModelType.ImageGeneration

export function normalizeExtractedImageText(content: string): string {
const normalized = content
.replace(/\r\n/g, '\n')
@@ -537,11 +542,13 @@ export class AiSdkProvider extends BaseLLMProvider {
runtimeModelConfig.apiEndpoint === ApiEndpointType.Image
: decision.providerKind === 'openai-responses'
? (runtimeModelId: string, runtimeModelConfig: ModelConfig) =>
-                  isOpenAIImageGenerationModel(runtimeModelId) ||
-                  runtimeModelConfig.apiEndpoint === ApiEndpointType.Image
-              : (runtimeModelId: string, runtimeModelConfig: ModelConfig) =>
-                  isOpenAIImageGenerationModel(runtimeModelId) ||
-                  runtimeModelConfig.apiEndpoint === ApiEndpointType.Image
+                  shouldUseOpenAIImageGenerationRoute(runtimeModelId, runtimeModelConfig)
+            : decision.providerKind === 'openai-compatible'
+              ? (runtimeModelId: string, runtimeModelConfig: ModelConfig) =>
+                  shouldUseOpenAIImageGenerationRoute(runtimeModelId, runtimeModelConfig)
+              : (runtimeModelId: string, runtimeModelConfig: ModelConfig) =>
+                  isOpenAIImageGenerationModel(runtimeModelId) ||
+                  runtimeModelConfig.apiEndpoint === ApiEndpointType.Image

return {
decision,
4 changes: 3 additions & 1 deletion src/main/presenter/sqlitePresenter/schemaCatalog.ts
@@ -122,7 +122,9 @@ const CATALOG_DEFINITIONS: CatalogDefinition[] = [
timeout_ms: 'ALTER TABLE deepchat_sessions ADD COLUMN timeout_ms INTEGER;',
force_interleaved_thinking_compat:
'ALTER TABLE deepchat_sessions ADD COLUMN force_interleaved_thinking_compat INTEGER;',
-      reasoning_visibility: 'ALTER TABLE deepchat_sessions ADD COLUMN reasoning_visibility TEXT;'
+      reasoning_visibility: 'ALTER TABLE deepchat_sessions ADD COLUMN reasoning_visibility TEXT;',
+      image_generation_options_json:
+        'ALTER TABLE deepchat_sessions ADD COLUMN image_generation_options_json TEXT;'
},
typeCheckedColumns: [
'summary_cursor_order_seq',