Skip to content

DO NOT MERGE!!!!#34938

Draft
fabrizzio-dotCMS wants to merge 44 commits intomainfrom
issue-34933-ESMappingAPIImpl
Draft

DO NOT MERGE!!!!#34938
fabrizzio-dotCMS wants to merge 44 commits intomainfrom
issue-34933-ESMappingAPIImpl

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS commented Mar 10, 2026

Summary

Migrates ESMappingAPIImpl and the contentlet indexing pipeline to a vendor-neutral architecture as part of the ES→OS migration. Introduces PhaseRouter to route index operations to Elasticsearch or OpenSearch based on the active MigrationPhase, replacing the ES-coupled IndexAPIImpl with a new IndexManagerAPIImpl that delegates through the router.

Changes

New domain layer (vendor-neutral types)

  • Added IndexBulkRequest, IndexBulkProcessor, IndexBulkListener, IndexBulkItemResult — replaces ES BulkRequest, BulkProcessor, ActionListener, BulkItemResponse in all public APIs
  • Added ContentletIndexOperations — vendor-neutral interface for per-document index operations
  • Added ContentIndexMappingAPI — vendor-neutral mapping contract replacing ContentMappingAPI
  • Added IndexMappingRestOperations — vendor-neutral interface for putMapping/getMapping/getFieldMappingAsMap

New routing and implementation classes

  • Added PhaseRouter — phase-aware router delegating to ES or OS based on MigrationPhase
  • Added IndexManagerAPIImpl — new top-level ContentletIndexAPI implementation using PhaseRouter
  • Added ContentletIndexOperationsES / ContentletIndexOperationsOS — per-vendor implementations of ContentletIndexOperations
  • Added MappingOperationsES / MappingOperationsOS — per-vendor implementations of IndexMappingRestOperations

Refactored existing classes

  • ContentletIndexAPI — all ES-specific types (BulkRequest, ActionListener) replaced with IndexBulk* domain types; timestampFormatter changed from SimpleDateFormat to DateTimeFormatter
  • ContentletIndexAPIImpl — large rework to delegate via PhaseRouter
  • ESMappingAPIImpl — converted to a phase-aware router implementing ContentIndexMappingAPI, delegating mapping ops to MappingOperationsES
  • IndexRouter annotation — enum values renamed READ_ONLY→READ, READ_WRITE→WRITE
  • ReindexThreadBULK_PROCESSOR_AWAIT_TIMEOUT made public for use by ContentletIndexOperationsES
  • BulkProcessorListener — adapted to use IndexBulk* domain types

Deletions / dead code removal

  • Deleted IndexAPIImpl (replaced by IndexManagerAPIImpl)
  • Deleted ContentMappingAPI (replaced by ContentIndexMappingAPI)
  • Deleted CacheUtil (dead code)
  • Removed dead code from PermissionBitFactoryImpl, VersionedIndicesImpl

Wiring and minor fixes

  • APILocator, CacheLocator — wired to new implementations
  • ESSiteSearchAPI — fixed DateLocalDateTime for DateTimeFormatter compatibility
  • InitServlet, ReindexThread, IndexAjaxAction, SiteSearchAjaxAction, ESIndexResource — import/wiring updates

Testing

  • Added ContentletIndexOperationsOSIntegrationTest (480 lines) — full integration coverage for OS indexing operations
  • Added OSMappingAPIImplIntegrationTest (282 lines) — integration coverage for OS mapping API
  • Added OpenSearchUpgradeSuite — test suite for OS upgrade path
  • Updated ContentletIndexAPIImplTest and ESMappingAPITest to compile against vendor-neutral API

To verify locally:

just test-integration-ide   # Start PostgreSQL + Elasticsearch + dotCMS
./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false \
  -Dit.test=ContentletIndexOperationsOSIntegrationTest,OSMappingAPIImplIntegrationTest
just test-integration-stop

Breaking Changes

  • ContentletIndexAPI method signatures changed: BulkRequestIndexBulkRequest, BulkProcessorIndexBulkProcessor, ActionListener<BulkResponse> removed. Any code calling putToIndex(BulkRequest, ActionListener) or createBulkProcessor(BulkProcessorListener) must be updated.
  • ContentletIndexAPI.timestampFormatter type changed from SimpleDateFormat to DateTimeFormatter — callers using .format(Date) must switch to .format(LocalDateTime).
  • IndexAccess enum values renamed: READ_ONLYREAD, READ_WRITEWRITE.
  • IndexAPIImpl deleted — any CDI injection of that specific class will fail.
  • CacheUtil deleted — any callers must be updated.

fabrizzio-dotCMS and others added 10 commits March 11, 2026 20:41
- Add CONNECTION_RETRY_SLEEP_SECONDS property to OSIndexProperty to fix conflicting semantics with CONNECTION_TIMEOUT (HTTP ms vs retry seconds)
- Use CONNECTION_RETRY_SLEEP_SECONDS in OSIndexAPIImpl.waitUtilIndexReady()
- Fix IndexAPIImpl class Javadoc to reference FEATURE_FLAG_OPEN_SEARCH_PHASE and migration-phase-based routing instead of stale FEATURE_FLAG_OPEN_SEARCH_READ/WRITE flags
- Replace non-thread-safe SimpleDateFormat with DateTimeFormatter in VersionedIndicesAPIImpl (@ApplicationScoped bean)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…4728

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Claude finished @fabrizzio-dotCMS's task in 4m 15s —— View job


PR Review: ESMappingAPIImpl / ES→OS Migration

  • Gather context and read all key files
  • Analysis complete

Critical

initIndexOS() creates indices without base content mappingContentletIndexAPIImpl:489

// initIndexOS() does:
indexAPI.createIndex(workingName, 0);   // raw index — no content mapping applied
indexAPI.createIndex(liveName, 0);
// ...
ESMappingUtilHelper.getInstance().addCustomMapping(workingName, liveName);  // only custom overrides

indexAPI.createIndex routes to raw OSIndexAPIImpl.createIndex, which only creates the index structure. It does NOT call ContentletIndexOperationsOS.createContentIndex, which is the only path that applies es-content-mapping.json. ESMappingUtilHelper.addCustomMapping adds site-search overrides on top, but the base schema is never set. This means OS indices initialized in phases 2 and 3 (the isReadEnabled() and isMigrationComplete() branches) have no content field mappings, breaking search entirely on those indices. Compare with initIndexES() which correctly calls createContentIndex(...).

Fix this →


High

PhaseRouter.writeChecked / writeReturningChecked fail-fast, skipping the second providerPhaseRouter.java:210

public void writeChecked(final ThrowingConsumer<T> action) throws Exception {
    for (final T impl : writeProviders()) {
        action.accept(impl);   // throws → second provider never called
    }
}

In dual-write phases (1 and 2), if ES throws a transient error, the OS write is silently skipped. This creates inconsistency between the two stores with no compensation path. The unchecked write() has the same problem. The current approach means one bad ES shard leaves OS missing that document permanently (assuming no full reindex is triggered). At minimum the behaviour should be documented as a known limitation; ideally writes to provider 2 should still be attempted and any failures logged separately.

Async bulk processor (ReindexThread) only writes to read providerContentletIndexAPIImpl:966

public IndexBulkProcessor createBulkProcessor(final IndexBulkListener bulkListener) {
    // Always bound to the read provider — dual-write via processor not yet supported
    return router.readProvider().createBulkProcessor(bulkListener);
}

The comment acknowledges this is deferred, but the consequence is severe: a full reindex triggered while in phase 1 or 2 will only populate the read provider. If the phase advances to 2 mid-reindex (ES → OS reads), all the async-indexed documents are only in ES; OS serves them missing. There is no guard that prevents a phase transition while a bulk processor is live. Even if "tracked separately," this state should either be blocked by the phase transition logic or trigger an automatic fallback reindex.


Medium

OSIndexBulkProcessor has no byte-size limit and no retry backoffContentletIndexOperationsOS.java:128

The homegrown OSIndexBulkProcessor accumulates operations only by count (maxActions). The ES processor also limits by byte size (setBulkSize) and applies exponential backoff on transient failures (setBackoffPolicy). Under large binary fields or file metadata, the in-memory list can grow into hundreds of megabytes before flushing. On transient OS errors, the OS path fails permanently while ES would retry; this creates an asymmetric durability gap between the two stores in dual-write phases.

Empty document ID causes StringIndexOutOfBoundsException in BulkProcessorListenerBulkProcessorListener.java:77

: result.id().substring(0, result.id().indexOf(StringPool.UNDERLINE));

ContentletIndexOperationsOS.flush() guards against a null OS item ID with item.id() != null ? item.id() : "". When the fallback empty string reaches this line, "".indexOf("_") returns -1 and substring(0, -1) throws StringIndexOutOfBoundsException. This will cause the entire afterBulk callback to crash, leaving workingRecords uncleared and the workingRecords entries stuck until the next reindex cycle.

Fix this →


Low / Design

ESIndexBulkRequest cast is unchecked; ESIndexBulkProcessor cast is also uncheckedContentletIndexOperationsES.java:341

private static BulkRequest asBulkRequest(final IndexBulkRequest req) {
    return ((ESIndexBulkRequest) req).delegate;  // ClassCastException, no instanceof guard
}

The OS counterpart (ContentletIndexOperationsOS.java:392) explicitly checks instanceof and throws a meaningful DotRuntimeException. Both should be consistent. The ES version will produce a confusing ClassCastException stack trace if a non-ES request is accidentally passed.

DualIndexBulkRequest.size() double-counts document operationsContentletIndexAPIImpl.java:284

public int size() {
    return esReq.size() + osReq.size();
}

The same logical document produces two entries (one per provider). Any caller that uses size() to check whether a batch is empty (isEmpty()) will still work correctly, but callers that use the count for metrics, logging, or threshold decisions see twice the real number.

Phase flag read is hot — no cachingIndexConfigHelper.java:88

MigrationPhase.current() calls Config.getIntProperty(FLAG_KEY, 0) on every call to any PhaseRouter method. During a full bulk reindex, every addIndexOp, setRefreshPolicy, and putToIndex call invokes writeProviders() or readProvider(), each of which calls MigrationPhase.current(). The flag value does not change within a reindex operation. Consider caching the resolved phase in the PhaseRouter instance (with an explicit invalidation method) to avoid thousands of property lookups per second.

Stale Javadoc on ContentletIndexOperationsOSContentletIndexOperationsOS.java:54

The class Javadoc says: "Methods that have no OpenSearch equivalent yet are stubbed with UnsupportedOperationException." No such stubs exist in the current implementation. The comment should be removed to avoid misleading future readers into thinking there are safe no-op paths.

initIndex() in phase 1 creates duplicate OS index setsContentletIndexAPIImpl.java:438

In phase 1, initIndex() calls initIndexOS() (creates OS-named indices via VersionedIndicesAPI) and then calls initIndexES()createContentIndex which iterates writeProviders() (both ES + OS in phase 1) and creates additional OS-named indices using the ES-format index name. Two parallel OS index sets exist with different naming schemes, only one of which (VersionedIndicesAPI-tracked) is used for subsequent OS reads.


Summary

The routing architecture (PhaseRouter, ContentletIndexOperations) is clean and well-documented. The main concerns are (1) the missing content mapping in initIndexOS() — likely to cause broken search on a fresh install in phases 2/3; (2) the async processor's single-provider limitation combined with no phase-transition guard, which creates a practical data-gap during full reindex; and (3) the empty-ID crash in BulkProcessorListener. The rest are correctness concerns that should be addressed before the dual-write phases are activated in production.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-4 — OSGi Plugin API Breakage
  • Risk Level: 🟡 MEDIUM
  • Why it's unsafe: The ContentletIndexAPI public interface has had multiple breaking signature changes: Elasticsearch-specific types (BulkRequest, BulkResponse, BulkProcessor, ActionListener) have been replaced with vendor-neutral abstractions (IndexBulkRequest, IndexBulkProcessor, IndexBulkListener, IndexBulkItemResult). The method checkAndInitialiazeIndex() was renamed to checkAndInitializeIndex(), getRidOfOldIndex() was removed, putToIndex(BulkRequest, ActionListener) was removed, and timestampFormatter changed type from SimpleDateFormat to DateTimeFormatter. In addition, ContentMappingAPI was deleted entirely and replaced by ContentIndexMappingAPI. Any OSGi plugin compiled against the old interface will fail to activate after rollback to N-1 if it was recompiled against N.
  • Code that makes it unsafe:
    • dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentletIndexAPI.java — method getRidOfOldIndex() removed; checkAndInitialiazeIndex() renamed; putToIndex(BulkRequest, ActionListener<BulkResponse>) removed; createBulkRequest(), appendBulkRequest(), appendBulkRemoveRequest(), createBulkProcessor(BulkProcessorListener), appendToBulkProcessor(BulkProcessor, ...) all changed to use vendor-neutral types; timestampFormatter type changed from SimpleDateFormat to DateTimeFormatter
    • dotCMS/src/main/java/com/dotcms/content/business/ContentMappingAPI.java — interface deleted and replaced by ContentIndexMappingAPI
    • dotCMS/src/main/java/com/dotmarketing/common/reindex/BulkProcessorListener.java — now implements IndexBulkListener instead of BulkProcessor.Listener; method signatures beforeBulk and afterBulk changed
  • Alternative (if possible): Keep the old interface methods as @Deprecated default delegators so OSGi plugins compiled against N-1 can still activate after rollback. Introduce the new vendor-neutral types alongside the old ones for one release cycle, then remove the deprecated methods in the following release (two-phase removal pattern).

fabrizzio-dotCMS and others added 15 commits March 17, 2026 14:47
…uter

Replace all ElasticsearchException usages in ContentletIndexAPIImpl with
DotRuntimeException to satisfy the CodingStandardsArchTest rule that forbids
@IndexLibraryIndependent classes from depending on vendor-specific libraries.

- Remove throws ElasticsearchException from createContentIndex(String),
  createContentIndex(String,int), initIndex(), and fullReindexStart()
- Replace throw new ElasticsearchException(...) with DotRuntimeException(...)
  in initIndexES(), fullReindexStart(), and fullReindexAbort()
- No caller was catching ElasticsearchException specifically; all callers
  use generic catch(Exception) or propagate IOException — no behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fabrizzio-dotCMS added a commit that referenced this pull request Mar 24, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 24, 2026

fabrizzio-dotCMS added a commit that referenced this pull request Mar 25, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2026
…34938  (#35104)

## Summary

Fixes a bug where `ContentFactoryIndexOperationsOS` was creating a new
`OSQueryCache` instance on every instantiation instead of using the
shared singleton managed by `CacheLocator`. This caused the OS query
cache to be bypassed in production, defeating its purpose.

## Changes

**Bug fix**
- `ContentFactoryIndexOperationsOS` — constructor now fetches the cache
via `CacheLocator.getOSQueryCache()` instead of `new OSQueryCache()`,
ensuring all instances share the same cache

**Cache infrastructure**
- `CacheLocator` — added `getOSQueryCache()` accessor and registered
`OSQueryCache` in the `CacheIndex` enum, mirroring the existing
`ESQueryCache` pattern

**Cleanup**
- `CacheUtil.java` — deleted dead utility class (`Moveh2dbDir`) that
referenced `contentletIndexAPI.timestampFormatter` and was unused

## Testing

No new tests added. The fix follows the same pattern as `ESQueryCache`
which is already in use. To verify:
1. Enable OS migration (phase 1+)
2. Confirm repeated identical queries hit the cache (cache hit metrics
in Hazelcast/JMX)
3. Confirm `CacheLocator.getOSQueryCache()` returns the same instance
across calls

## Breaking Changes

**None**.

This PR fixes: #34938
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2026
## Summary

Introduces the vendor-neutral write-side abstraction layer for the ES →
OpenSearch migration. Extracts all bulk-write and mapping operations
from `ContentletIndexAPIImpl` into provider-specific implementations
behind a generic `PhaseRouter<T>`, completing the dual-write
infrastructure needed for migration phases 1–3.

## Changes

**New abstractions (vendor-neutral contracts)**
- `ContentletIndexOperations` — write-side interface: bulk requests,
bulk processor, index lifecycle, delete-by-content-type
- `IndexBulkRequest`, `IndexBulkProcessor`, `IndexBulkListener`,
`IndexBulkItemResult` — DTOs replacing ES/OS vendor types in public APIs
- `IndexMappingRestOperations` — mapping REST contract (`putMapping`,
`getMapping`, `getFieldMappingAsMap`)
- `PhaseRouter<T>` — generic, phase-aware dual-write router with
`read()`, `write()`, `writeBoolean()`, `writeReturning()` and
checked-exception variants

**Elasticsearch implementation**
- `ContentletIndexOperationsES` — wraps `BulkRequest`/`BulkProcessor`
behind `ContentletIndexOperations`
- `MappingOperationsES` — wraps ES `RestHighLevelClient` mapping calls
behind `IndexMappingRestOperations`

**OpenSearch implementation**
- `ContentletIndexOperationsOS` — `opensearch-java` 3.x bulk write
implementation with manual `OSIndexBulkProcessor` (no built-in
BulkProcessor in that SDK)
- `MappingOperationsOS` — OpenSearch mapping REST calls

**Minor changes**
- `ContentMappingAPI` — expanded with `putMapping`, `getMapping`,
`toJson`, `toMap`, `toJsonString` methods
- `@IndexRouter.access` — changed from single enum value to array
(`{IndexAccess.READ}`, `{IndexAccess.READ, IndexAccess.WRITE}`)
- `IndexConfigHelper` — added `isMigrationStarted()` convenience
predicate

## Testing

- `ContentletIndexOperationsOSIntegrationTest` (480 lines) added to
`OpenSearchUpgradeSuite`
- Tests cover: index creation, bulk index/delete, bulk processor flush,
delete-by-content-type, document count
- To run locally: `./mvnw verify -pl :dotcms-integration
-Dcoreit.test.skip=false
-Dit.test=ContentletIndexOperationsOSIntegrationTest`

This PR fixes: #34938
@fabrizzio-dotCMS fabrizzio-dotCMS changed the title task(Migrate ESMappingAPIImpl) Refs:34933 DO NOT MERGE!!!! Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Not Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant