
feat(dotAI): Dot AI LangChain4J integration (WIP - First provider)#35150

Open
ihoffmann-dot wants to merge 14 commits into main from dot-ai-langchain-integration

Conversation

@ihoffmann-dot
Member

Summary

Replaces the direct OpenAI HTTP client (OpenAIClient) with a LangChain4J abstraction layer, enabling multi-provider AI support in dotCMS. This PR covers Phase 1: OpenAI via LangChain4J.

Changes

  • LangChain4jAIClient.java: New AIProxiedClient implementation that delegates chat, embeddings, and image requests to LangChain4J models.
  • LangChain4jModelFactory.java: Factory that builds ChatModel, EmbeddingModel, and ImageModel instances from a ProviderConfig. Only place with provider-specific builder logic.
  • ProviderConfig.java: Deserializable POJO for the providerConfig JSON secret (per provider section: model, apiKey, endpoint, maxTokens, maxCompletionTokens, etc.).
  • AppConfig.java: Replaced legacy individual-field secrets (apiKey, model, etc.) with a single providerConfig JSON string. isEnabled() now only checks this field.
  • AIAppValidator.java: Removed the OpenAI /v1/models validation call, which is incompatible with the multi-provider architecture.
  • CompletionsResource.java: Updated /api/v1/ai/completions/config to derive model names and config values from AppConfig getters instead of iterating raw AppKeys.
  • dotAI.yml: Removed legacy hidden fields; added providerConfig as the single configuration entry point.
  • Tests: Added unit tests for ProviderConfig, LangChain4jModelFactory, and LangChain4jAIClient; updated AIProxyClientTest integration test to use providerConfig-based setup.

Motivation

The previous implementation was tightly coupled to OpenAI's API contract (hardcoded HTTP calls, OpenAI-specific parameters, model validation via /v1/models). LangChain4J provides a provider-agnostic model interface, allowing future phases to add Azure OpenAI, AWS Bedrock, and Vertex AI without touching the core request/response flow.

The providerConfig JSON secret replaces multiple individual secrets with a single structured configuration, supporting per-section (chat/embeddings/image) provider and model settings.
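
For illustration, a hypothetical providerConfig payload with per-section settings (section and field names are inferred from this PR's description; the real schema may differ):

```json
{
  "chat": {
    "provider": "openai",
    "model": "gpt-4o",
    "apiKey": "sk-placeholder",
    "endpoint": "https://api.openai.com/v1",
    "maxTokens": 4096
  },
  "embeddings": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "apiKey": "sk-placeholder"
  },
  "image": {
    "provider": "openai",
    "model": "dall-e-3",
    "apiKey": "sk-placeholder"
  }
}
```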

Related Issue

EPIC: dotAI Multi-Provider Support #33970

@github-actions

❌ Issue Linking Required

This PR could not be linked to an issue. All PRs must be linked to an issue for tracking purposes.

How to fix this:

Option 1: Add keyword to PR body (Recommended - auto-removes this comment)
Edit this PR description and add one of these lines:

  • This PR fixes #123 or Fixes: #123

  • This PR closes #123 or Closes: #123

  • This PR resolves #123 or Resolves: #123

  • Other supported keywords: fix, fixed, close, closed, resolve, resolved

Option 2: Link via GitHub UI (Note: won't clear the failed check)

  1. Go to the PR → Development section (right sidebar)

  2. Click "Link issue" and select an existing issue

  3. Push a new commit or re-run the workflow to clear the failed check

Option 3: Use branch naming
Create a new branch with one of these patterns:

  • 123-feature-description (number at start)

  • issue-123-feature-description (issue-number at start)

  • feature-issue-123 (issue-number anywhere)

Why is this required?

Issue linking ensures proper tracking and documentation and helps maintain project history. It connects your code changes to the problem they solve.

This comment was automatically generated by the issue linking workflow

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Mar 27, 2026
@claude
Contributor

claude bot commented Mar 27, 2026

Claude finished @ihoffmann-dot's task in 5m 5s


Code Review

  • Read key changed files
  • Review LangChain4jAIClient, LangChain4jModelFactory, ProviderConfig
  • Review AppConfig, AppKeys, dotAI.yml changes
  • Review CompletionsResource, AIAppValidator, AIModels changes
  • Review tests
  • Post findings

Findings

1. Security: API keys exposed in UI — dotAI.yml

providerConfig is declared hidden: false, so the entire JSON blob (including apiKey, and future secretAccessKey/accessKeyId for Bedrock) is displayed in plain text in the dotCMS apps UI. The old apiKey field was hidden: true. This is a regression; the field should be hidden: true.

2. Security: Incomplete credential redaction in /config endpoint — CompletionsResource.java:175

providerConfig.replaceAll("\"apiKey\"\\s*:\\s*\"[^\"]+\"", "\"apiKey\":\"*****\"")

This only redacts the apiKey field. For Bedrock configs (secretAccessKey, accessKeyId) or Vertex AI, those credentials will be returned in plaintext from the /api/v1/ai/completions/config GET endpoint. The redaction must cover all credential fields, not just apiKey.
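
One way to broaden the redaction, sketched with an alternation over known credential field names. The field-name set here is an assumption; a deny-list like this still needs updating per provider, so an allow-list of safe fields may ultimately be more robust.

```java
import java.util.regex.Pattern;

public class CredentialRedactor {
    // Assumed credential field names; extend when Bedrock/Vertex configs land.
    private static final Pattern CREDENTIAL_FIELDS = Pattern.compile(
            "\"(apiKey|secretAccessKey|accessKeyId|serviceAccountKey)\"\\s*:\\s*\"[^\"]*\"");

    public static String redact(final String providerConfigJson) {
        return CREDENTIAL_FIELDS.matcher(providerConfigJson)
                .replaceAll("\"$1\":\"*****\"");
    }

    public static void main(final String[] args) {
        final String json = "{\"chat\":{\"provider\":\"bedrock\","
                + "\"accessKeyId\":\"AKIAEXAMPLE\",\"secretAccessKey\":\"s3cr3t\"}}";
        System.out.println(redact(json));
        // → {"chat":{"provider":"bedrock","accessKeyId":"*****","secretAccessKey":"*****"}}
    }
}
```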

3. NPE crash: computeIfAbsent with null-returning factory — LangChain4jAIClient.java:118-152

LangChain4jModelFactory.build* returns null when config.getProvider() == null. ConcurrentHashMap.computeIfAbsent treats a null result from the mapping function as "no mapping": nothing is cached and the call itself returns null, so the client dereferences a null model immediately afterward. A request where the provider field is omitted from the JSON will crash with a NullPointerException rather than giving a useful error.

// In LangChain4jModelFactory.buildChatModel():
if (config == null || config.getProvider() == null) {
    return null;   // ← computeIfAbsent passes this null straight back; the caller then NPEs
}

The factory should throw IllegalArgumentException instead of returning null, consistent with the default branch in the switch.
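
For reference, a minimal sketch of the JDK behavior with no dotCMS classes involved: the map stores nothing when the mapping function yields null and hands null back, so the failure only surfaces when the "model" is used.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullFactoryDemo {
    public static void main(final String[] args) {
        final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

        // Mapping function returns null: no entry is stored, null is returned.
        final String model = cache.computeIfAbsent("providerConfigJson", key -> null);

        System.out.println("cached entries: " + cache.size()); // → cached entries: 0
        System.out.println(model == null);                     // → true

        // The NPE happens at the first use of the "model", far from the real
        // cause: model.length() would throw NullPointerException here.
    }
}
```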

4. Unbounded memory leak in model caches — LangChain4jAIClient.java:59-61

The three ConcurrentHashMap caches are keyed on the full providerConfigJson string (including the plaintext API key). There is no size limit, TTL, or eviction. In a multi-site installation where each site has its own providerConfig, or whenever providerConfig is updated, stale model instances accumulate permanently. Consider using a size-bounded cache (e.g. Caffeine, or a simple LRU wrapper).
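
A minimal size-bounded LRU sketch over LinkedHashMap; Caffeine would add TTL and better concurrency, and the cap and synchronization strategy here are illustrative assumptions rather than tuned values.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public final class BoundedModelCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    private BoundedModelCache(final int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry past the cap
    }

    public static <K, V> Map<K, V> create(final int maxEntries) {
        // Coarse synchronization; fine for a handful of model instances per node.
        return Collections.synchronizedMap(new BoundedModelCache<>(maxEntries));
    }
}
```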

5. getProvider() hardcoded to OPEN_AI — LangChain4jAIClient.java:71

public AIProvider getProvider() {
    return AIProvider.OPEN_AI;
}

The class is intended to be the single multi-provider client. Returning OPEN_AI for Bedrock or Vertex requests will mislead any code that branches on provider identity. The method should either be removed from the interface contract, or resolved dynamically from ProviderConfig.

6. isEnabled() Javadoc says dual-check, implementation is single-check — AppConfig.java:311

/**
 * Returns true when either a {@code providerConfig} JSON is present (new path)
 * or all legacy URL + API key fields are populated (old path).   ← Javadoc
 */
public boolean isEnabled() {
    return StringUtils.isNotBlank(providerConfig);  // ← only new path
}

The Javadoc describes the safe dual-check that would protect rollback, but the implementation only checks providerConfig. The previous review comment already flagged the rollback risk; the Javadoc just makes the gap more visible. Either fix the implementation to match the docs, or fix the docs to be honest that legacy keys are no longer checked.
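
The dual-check the Javadoc describes could look like the sketch below. The legacy field names are assumptions based on this PR's description, not the real AppConfig.

```java
public class AppConfigSketch {
    private final String providerConfig; // new-path JSON secret
    private final String legacyApiUrl;   // hypothetical legacy field names
    private final String legacyApiKey;

    public AppConfigSketch(final String providerConfig,
                           final String legacyApiUrl,
                           final String legacyApiKey) {
        this.providerConfig = providerConfig;
        this.legacyApiUrl = legacyApiUrl;
        this.legacyApiKey = legacyApiKey;
    }

    private static boolean notBlank(final String s) {
        return s != null && !s.isBlank();
    }

    /**
     * True when the new providerConfig is present OR the legacy fields are,
     * so un-migrated sites keep working on N and migrated sites can leave the
     * legacy fields populated for rollback safety.
     */
    public boolean isEnabled() {
        return notBlank(providerConfig)
                || (notBlank(legacyApiUrl) && notBlank(legacyApiKey));
    }
}
```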

7. validateAIConfig is a complete no-op — AIAppValidator.java:49-51

public void validateAIConfig(final AppConfig appConfig, final String userId) {
    appConfig.debugLogger(getClass(), () -> "AI configuration validation delegated to LangChain4J");
}

LangChain4J validates credentials at request time, not eagerly. This means a misconfigured providerConfig (wrong API key, bad model name) is invisible until the first real request fails at runtime. The old /v1/models call was removed for good reason, but there's now zero upfront feedback on a broken config. At a minimum, AIAppValidator should attempt to parse and validate the ProviderConfig structure without making a network call.
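
A structural check with no network call might look like this sketch, assuming Jackson is available (it is widely used in dotCMS) and assuming the section and field names from this PR's description; the real ProviderConfig schema may differ.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class ProviderConfigSanityCheck {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Structural validation only — no network call, no credential check. */
    public static void validate(final String providerConfigJson) {
        final JsonNode root;
        try {
            root = MAPPER.readTree(providerConfigJson);
        } catch (final Exception e) {
            throw new IllegalArgumentException("providerConfig is not valid JSON", e);
        }
        for (final String section : new String[] {"chat", "embeddings", "image"}) {
            final JsonNode node = root.get(section);
            if (node == null) {
                continue; // sections are optional; only validate what is present
            }
            if (!node.hasNonNull("provider") || !node.hasNonNull("model")) {
                throw new IllegalArgumentException(
                        "providerConfig section '" + section
                                + "' must define 'provider' and 'model'");
            }
        }
    }
}
```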

8. providerConfig JSON parsed redundantly on AppConfig construction — AppConfig.java:62-70

The JSON is parsed three separate times (once for "chat", "image", and "embeddings") via buildModelFromProviderConfig. One parse + three root.get(section) calls would be equivalent.
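
The single-parse variant could be sketched as follows (Jackson assumed; the JSON shape is hypothetical):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class SingleParseDemo {
    public static void main(final String[] args) throws Exception {
        final String providerConfigJson =
                "{\"chat\":{\"model\":\"gpt-4o\"},"
                + "\"embeddings\":{\"model\":\"text-embedding-3-small\"}}";

        // Parse once...
        final JsonNode root = new ObjectMapper().readTree(providerConfigJson);

        // ...then cheap tree lookups per section instead of three full parses.
        final JsonNode chat = root.get("chat");
        final JsonNode embeddings = root.get("embeddings");
        final JsonNode image = root.get("image"); // null: section not configured

        System.out.println(chat.get("model").asText());       // → gpt-4o
        System.out.println(embeddings.get("model").asText()); // → text-embedding-3-small
        System.out.println(image == null);                    // → true
    }
}
```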

9. Missing section in providerConfig throws unchecked — LangChain4jAIClient.java:241

If a site only configures chat but not embeddings, any embeddings request will throw IllegalArgumentException("Missing 'embeddings' section in providerConfig") which is caught somewhere up the stack as a 500. There should be a clear, user-facing error or a graceful "not configured" response for unconfigured sections, rather than an unchecked exception with an internal message leaking config structure.
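
One option is to surface unconfigured sections as an empty Optional and let the resource layer map that to a clear 4xx instead of letting an unchecked exception bubble up as a 500. The class and method names below are illustrative only.

```java
import java.util.Map;
import java.util.Optional;

public final class SectionLookup {
    private final Map<String, Object> sections; // stand-in for parsed providerConfig

    public SectionLookup(final Map<String, Object> sections) {
        this.sections = sections;
    }

    /**
     * Empty Optional instead of an unchecked throw: the REST layer can answer
     * with e.g. 400 "embeddings is not configured for this site".
     */
    public Optional<Object> section(final String name) {
        return Optional.ofNullable(sections.get(name));
    }
}
```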


Minor

  • LangChain4jModelFactory uses switch with a default that embeds "Supported in Phase 1: openai" in the error message. That string will be wrong in Phase 2 without touching call sites — consider removing it or using a computed list.
  • toMessages() silently drops the content of messages that are empty strings; LLMs may behave unexpectedly with empty message content.
  • AppKeys still contains TEXT_MODEL_NAMES, TEXT_MODEL_TOKENS_PER_MINUTE, etc. which are no longer used anywhere (the old model-fetch flow was removed). These are dead entries that will confuse future readers.

@ihoffmann-dot ihoffmann-dot self-assigned this Mar 27, 2026
@claude
Contributor

claude bot commented Mar 27, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-3 — REST / Headless API Contract Change

  • Risk Level: 🟡 MEDIUM

  • Why it's unsafe: The /api/v1/ai/completions/config GET endpoint response shape has changed: the new code explicitly builds a narrow response map (replacing the previous full AppKeys iteration), so fields such as textModelNames, textModelTokensPerMinute, textModelApiPerMinute, imageModelNames, embeddingsModelNames, and related model-config keys are no longer returned. A cached Angular frontend page or integration client built against N's response shape will not find the providerConfig-derived fields after a rollback to N-1, while clients still expecting N-1's full AppKeys shape receive none of those fields on N.

    Additionally, AppConfig.isEnabled() was changed to check only providerConfig (the new secret). If a user deploys N and configures only providerConfig (leaving the legacy apiKey empty), rolling back to N-1 causes isEnabled() to return false and the entire dotAI integration is silently disabled until the legacy apiKey field is manually re-populated.

  • Code that makes it unsafe:

    • dotCMS/src/main/java/com/dotcms/ai/rest/CompletionsResource.java — getConfig() now builds the response from explicit getters rather than iterating all AppKeys; many previously-returned keys are absent from the new response map.
    • dotCMS/src/main/java/com/dotcms/ai/app/AppConfig.java — isEnabled() now returns StringUtils.isNotBlank(providerConfig) only (line ~220), dropping the legacy apiKey + URL check that N-1 uses to decide whether the AI app is active.
    • dotCMS/src/main/resources/apps/dotAI.yml — the apiKey param (previously required: true, hidden: true) was removed from the schema; providerConfig added as the sole activation field.
  • Alternative (if possible): Keep AppConfig.isEnabled() dual-checking: return true when either providerConfig or the legacy apiKey is set, so N-1 continues to work for sites that have not yet migrated. For the /config endpoint, preserve the previously-returned model-config keys (or alias them) alongside the new providerConfig field, so clients built against N-1's response contract do not receive unexpected nulls after rollback. Follow the two-phase removal pattern: ship N with both old and new config paths active, remove the legacy path in N+1 once N-2 is outside the rollback window.

@ihoffmann-dot ihoffmann-dot marked this pull request as ready for review March 30, 2026 23:15