Skip to content

task(ESMappingApiImpl integration) Refs:34933 #35130

Draft
fabrizzio-dotCMS wants to merge 6 commits intomainfrom
issue-34933-ESMappingAPIImpl-integration-2
Draft

task(ESMappingApiImpl integration) Refs:34933 #35130
fabrizzio-dotCMS wants to merge 6 commits intomainfrom
issue-34933-ESMappingAPIImpl-integration-2

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS commented Mar 26, 2026

Summary

Refactors ContentletIndexAPI and ContentletIndexAPIImpl to be vendor-neutral, removing all Elasticsearch SDK types (BulkRequest, BulkProcessor, BulkResponse) from method signatures and replacing them with library-agnostic domain types (IndexBulkRequest, IndexBulkProcessor, IndexBulkListener). ContentletIndexAPIImpl is reimplemented as a phase-aware router that delegates to both ES and OS backends based on the current migration phase, enabling dual-write during the ES → OpenSearch migration.

Changes

Backend — ContentletIndexAPI interface

  • Removed all org.elasticsearch.* imports from the public interface
  • Replaced BulkRequest/BulkProcessor/BulkResponse/ActionListener with neutral domain types: IndexBulkRequest, IndexBulkProcessor, IndexBulkListener
  • Changed fullReindexStart() return type from String to InitIndexInfo (carries timestamp + index names)
  • Added thread-safe DateTimeFormatter threadSafeTimestampFormatter and deprecated SimpleDateFormat timestampFormatter (forRemoval = true)
  • Annotated with @IndexLibraryIndependent

Backend — ContentletIndexAPIImpl (major refactor)

  • Replaced 20+ ES-specific imports with vendor-neutral equivalents
  • Introduced PhaseRouter<ContentletIndexOperations> + operationsES / operationsOS fields for phase-aware routing:
    Phase Read Write
    0 — not started ES [ES]
    1 — dual-write, ES reads ES [ES, OS]
    2 — dual-write, OS reads OS [ES, OS]
    3 — OS only OS [OS]
  • Added package-private constructor accepting operationsES and operationsOS for unit testing
  • Fixed circular dependency (ContentletIndexAPIImpl → ESMappingAPIImpl → ... → ContentletIndexAPIImpl) via lazy AtomicReference<ContentMappingAPI>
  • Added loadProviderIndices() to resolve ES/OS physical index names (ES from IndiciesAPI, OS from VersionedIndicesAPI)
  • Added IndexStartupValidator — validates index state at startup

Backend — supporting classes

  • BulkProcessorListener → adapted to IndexBulkRequest/IndexBulkProcessor
  • ReindexThread → updated callers to neutral API
  • IndexAPIImpl → minor correctness fixes
  • PermissionBitFactoryImpl → removed 10 lines of dead code
  • PhaseRouter → minor additions; InitIndexInfo / ConfigurableOpenSearchProvider / OSIndexAPIImpl → minor fixes

Infrastructure

  • New docker/docker-compose-examples/single-node-os-migration/docker-compose.yml — runs ES 7.9.1 + OpenSearch 3.4.0 side-by-side with dotCMS for local migration validation

Tests

  • ContentletIndexAPIImplTest and ESSiteSearchAPITest updated to compile against the new neutral API signatures (no new test coverage added)

Testing

No new integration tests are added in this PR (they live in the companion PR #35133).

To verify locally:

# 1. Spin up the dual-backend stack
docker compose -f docker/docker-compose-examples/single-node-os-migration/docker-compose.yml up -d

# 2. Build
./mvnw install -pl :dotcms-core -DskipTests

# 3. Run existing integration tests that exercise the reindex path
./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false \
  -Dit.test=ContentletIndexAPIImplTest

Confirm that:

  • ContentletIndexAPIImpl instantiates without circular dependency errors
  • Reindex completes successfully against Elasticsearch (phase 0)
  • No ES SDK types appear in stack traces or logs from the ContentletIndexAPI layer

Breaking Changes

ContentletIndexAPI method signatures changed:

Method Before After
fullReindexStart() String InitIndexInfo
putToIndex(...) BulkRequest, ActionListener<BulkResponse> IndexBulkRequest
createBulkRequest(...) BulkRequest IndexBulkRequest
appendBulkRequest(...) BulkRequest IndexBulkRequest
appendBulkRemoveRequest(...) BulkRequest IndexBulkRequest
createBulkProcessor(...) BulkProcessor IndexBulkProcessor
appendToBulkProcessor(...) BulkProcessor IndexBulkProcessor

Any class directly implementing or calling these methods outside of ContentletIndexAPIImpl must be updated to use the neutral domain types.

This PR fixes: #34933

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 26, 2026

Claude finished @fabrizzio-dotCMS's task in 5m 26s —— View job


PR Review

  • Read changed files
  • Review ContentletIndexAPI interface changes
  • Review ContentletIndexAPIImpl implementation
  • Review BulkProcessorListener changes
  • Review other supporting files
  • Post final review

Critical Issues

1. OS never switches over after full reindex

fullReindexSwitchover (ContentletIndexAPIImpl.java:771) only updates IndiciesInfo (ES store). VersionedIndicesAPI is never updated with the reindex working/live slots during switchover. In phases 1 or 2, after fullReindexStart() writes both ES and OS reindex indices, the switchover at line 800–810 only points ES to the new data. OS still reads from the pre-reindex working/live forever — the reindex data written to the OS provider is orphaned.

The same pattern exists in fullReindexAbort (line 1778) and isInFullReindex (line 752–757) — both only check/clean up ES. If an OS-only reindex scenario arises in phase 3, isInFullReindex() will return false even if reindex is active because it only reads IndiciesInfo.getReindexWorking().

Fix this →


2. getIndexDocumentCount applies ES cluster prefix to OS index names

ContentletIndexAPIImpl.java:1875–1878:

return router.readProvider().getIndexDocumentCount(
        indexAPI.getNameWithClusterIDPrefix(indexName));

getNameWithClusterIDPrefix prepends cluster_{id}. — an ES-only concept. In phases 2/3 when router.readProvider() returns the OS provider, this passes a mangled name like cluster_abc.working_20240101 to OS instead of the plain working_20240101. This is called from isOsWorkingIndexEmpty() (line 432) with an OS index name from VersionedIndices, causing the doc-count check to silently fail and potentially trigger spurious full reindexes on every startup.

Fix this →


Significant Issues

3. Async reindex (ReindexThread/BulkProcessor) only writes to read provider

createBulkProcessor (line 1102) always uses router.readProvider(), not writeProviders(). In phase 1 (dual-write, ES reads), ReindexThread's async path via appendToBulkProcessor only indexes to ES. OS never receives this data through the async path. The comment says this is "intentionally deferred," but combined with issue #1 (no OS switchover), content reindexed through the async path won't appear in OS at all. This gap needs a tracking issue if not fixed here.


4. PermissionBitFactoryImpl drops timeout without replacement

The removed code (db3d2152) had:

BulkRequest bulkRequest = indexAPI.createBulkRequest();
bulkRequest.timeout(TimeValue.timeValueMillis(INDEX_OPERATIONS_TIMEOUT_IN_MS));

The replacement uses addContentToIndex(cont, false) per contentlet with no timeout. If ES/OS hangs on a bulk operation, the permission sync loop will block indefinitely with no timeout protection. INDEX_OPERATIONS_TIMEOUT_IN_MS was important there.


Medium Issues

5. DualIndexBulkRequest.size() double-counts

line 307: return esReq.size() + osReq.size(). A single document indexed to both ES and OS in a dual-write phase reports size() == 2. Any caller using size() for threshold decisions (e.g., "submit if size > N") will see inflated counts. Consider returning Math.max(esReq.size(), osReq.size()) if the semantic intent is "how many documents are in this batch."


6. OS version hardcoded to "3." in IndexStartupValidator

line 36: private static final String REQUIRED_OS_MAJOR = "3.". This is not configurable. OpenSearch 4.x would fail this check at startup, and OS 2.x (which some environments still run) is also blocked. This should be driven by a config property with 3. as a safe default, not a constant.


7. Silent default port change: 9200 → 9201

ConfigurableOpenSearchProvider.java: default port changed from 9200 to 9201. Any installation where OS runs on the default port 9200 and OS_PORT is not explicitly set will silently try the wrong port after upgrade. This needs a migration note or a more obvious config name to force explicit opt-in.


Minor Issues

8. Stale javadoc reference

ContentletIndexAPIImpl.java:125 references appendBulkRequest(IndexBulkProcessor, ReindexEntry) — this method doesn't exist in the interface or the class. The actual private method is appendBulkRequestToProcessor.

9. IndexStartupValidator instantiated on every checkAndInitialiazeIndex call

line 378: new IndexStartupValidator(CDIUtils.getBeanThrows(...)).validate() is called inside a synchronized method on every startup check. The validator opens a live connection to the OS cluster. Consider extracting this to a one-time startup check.

10. OSGi rollback risk remains

As noted in the prior comment — ContentletIndexAPI has 9 breaking signature changes. The two-phase deprecation approach (keep old signatures as @Deprecated delegates in N, remove in N+1) was suggested and remains valid. If any OSGi plugins in the wild implement or call these methods, N-1 rollback will produce NoSuchMethodError.


Summary

The vendor-neutral abstraction pattern (IndexBulkRequest, IndexBulkProcessor, IndexBulkListener) is well-designed, and the PhaseRouter / DualIndexBulkRequest routing logic is clean. The critical concern is that the full reindex switchover and getIndexDocumentCount don't account for OS, making dual-write phases (1 and 2) functionally incomplete for OS. Those two issues should block merge.

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Mar 26, 2026
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 27, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-4 — OSGi Plugin API Breakage
  • Risk Level: 🟡 MEDIUM
  • Why it's unsafe: The ContentletIndexAPI public interface has 9 breaking method signature changes that replace Elasticsearch-native types (BulkRequest, BulkResponse, ActionListener, BulkProcessor, BulkProcessorListener) with new vendor-neutral abstractions (IndexBulkRequest, IndexBulkProcessor, IndexBulkListener). One method was completely removed. Any OSGi plugin that implements or calls methods on ContentletIndexAPI using the old ES-specific types will encounter NoSuchMethodError or ClassCastException when running on N-1 after rollback, since N-1's interface contract no longer matches the compiled plugin.
  • Code that makes it unsafe:
    • dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentletIndexAPI.java
      • Removed entirely: void putToIndex(BulkRequest bulkRequest, ActionListener<BulkResponse> listener)
      • Param/return type changed: BulkRequest createBulkRequest()IndexBulkRequest createBulkRequest()
      • Param/return type changed: BulkRequest createBulkRequest(List<Contentlet>)IndexBulkRequest createBulkRequest(List<Contentlet>)
      • Param/return type changed: BulkRequest appendBulkRequest(BulkRequest, Collection<ReindexEntry>)IndexBulkRequest appendBulkRequest(IndexBulkRequest, Collection<ReindexEntry>)
      • Param/return type changed: BulkRequest appendBulkRequest(BulkRequest, ReindexEntry)IndexBulkRequest appendBulkRequest(IndexBulkRequest, ReindexEntry)
      • Param/return type changed: void putToIndex(BulkRequest)void putToIndex(IndexBulkRequest)
      • Param/return type changed: BulkRequest appendBulkRemoveRequest(BulkRequest, ReindexEntry)IndexBulkRequest appendBulkRemoveRequest(IndexBulkRequest, ReindexEntry)
      • Param/return type changed: BulkProcessor createBulkProcessor(BulkProcessorListener)IndexBulkProcessor createBulkProcessor(IndexBulkListener)
      • Param type changed: void appendToBulkProcessor(BulkProcessor, Collection<ReindexEntry>)void appendToBulkProcessor(IndexBulkProcessor, Collection<ReindexEntry>)
  • Alternative (if possible): Follow the two-phase interface evolution pattern: in Release N, keep the old method signatures as @Deprecated delegating to the new abstractions, so N-1 plugins still resolve them at runtime. Remove the deprecated signatures in Release N+1 once N-2 is outside the rollback window.

github-merge-queue bot pushed a commit that referenced this pull request Mar 30, 2026
## Summary

Registers `OpenSearchUpgradeSuite` as a dedicated CI test matrix entry
so it runs automatically with the OpenSearch 3.x upgrade container
activated. Also includes minor code-quality cleanup in
`ESContentletAPIImpl`. This is the CI companion to #35130,
which contains the actual test suite implementation.

## Changes

**CI — `.github/test-matrix.yml`**
- Added `Integration Tests - OpenSearch Upgrade Suite` suite entry under
`integration`
- Overrides `base_maven_args` with `-Dopensearch.upgrade.test=true`,
which activates the `opensearch-upgrade` Maven profile — that profile
starts the OpenSearch 3.x Docker container on port 9201 and restricts
Failsafe to run only `OpenSearchUpgradeSuite`
- Fixed a trailing whitespace in the existing `base_maven_args` block

**Backend — `ESContentletAPIImpl`**
- `categories.size() < 1` → `categories.isEmpty()`
- `StringBuffer` → `StringBuilder` (non-synchronized context)
- String concatenation inside loop → `StringBuilder.append()` chain

## Testing

The suite itself (`OpenSearchUpgradeSuite.java` and its member tests)
lives in the companion PR #35130 and must be merged before
this CI entry is exercisable end-to-end.

To verify the matrix entry locally:

```bash
# Requires the opensearch-upgrade profile in dotcms-integration/pom.xml
./mvnw verify -pl :dotcms-integration \
  -Dcoreit.test.skip=false \
  -Dopensearch.upgrade.test=true
```

The profile starts an OpenSearch 3.x container on port 9201 and limits
Failsafe to `OpenSearchUpgradeSuite`.

## Breaking Changes

**None.**

This PR fixes: #35131
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Not Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[TASK] Migrate ESMappingAPIImpl

1 participant