Conversation
- Add CONNECTION_RETRY_SLEEP_SECONDS property to OSIndexProperty to fix conflicting semantics with CONNECTION_TIMEOUT (HTTP ms vs retry seconds) - Use CONNECTION_RETRY_SLEEP_SECONDS in OSIndexAPIImpl.waitUtilIndexReady() - Fix IndexAPIImpl class Javadoc to reference FEATURE_FLAG_OPEN_SEARCH_PHASE and migration-phase-based routing instead of stale FEATURE_FLAG_OPEN_SEARCH_READ/WRITE flags - Replace non-thread-safe SimpleDateFormat with DateTimeFormatter in VersionedIndicesAPIImpl (@ApplicationScoped bean) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…4728 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…CMS/core into issue-34933-ESMappingAPIImpl
|
Claude finished @fabrizzio-dotCMS's task in 4m 15s —— View job PR Review: ESMappingAPIImpl / ES→OS Migration
Critical
// initIndexOS() does:
indexAPI.createIndex(workingName, 0); // raw index — no content mapping applied
indexAPI.createIndex(liveName, 0);
// ...
ESMappingUtilHelper.getInstance().addCustomMapping(workingName, liveName); // only custom overrides
High
public void writeChecked(final ThrowingConsumer<T> action) throws Exception {
for (final T impl : writeProviders()) {
action.accept(impl); // throws → second provider never called
}
}In dual-write phases (1 and 2), if ES throws a transient error, the OS write is silently skipped. This creates inconsistency between the two stores with no compensation path. The unchecked Async bulk processor (ReindexThread) only writes to read provider — public IndexBulkProcessor createBulkProcessor(final IndexBulkListener bulkListener) {
// Always bound to the read provider — dual-write via processor not yet supported
return router.readProvider().createBulkProcessor(bulkListener);
}The comment acknowledges this is deferred, but the consequence is severe: a full reindex triggered while in phase 1 or 2 will only populate the read provider. If the phase advances to 2 mid-reindex (ES → OS reads), all the async-indexed documents are only in ES; OS serves them missing. There is no guard that prevents a phase transition while a bulk processor is live. Even if "tracked separately," this state should either be blocked by the phase transition logic or trigger an automatic fallback reindex. Medium
The homegrown Empty document ID causes : result.id().substring(0, result.id().indexOf(StringPool.UNDERLINE));
Low / Design
private static BulkRequest asBulkRequest(final IndexBulkRequest req) {
return ((ESIndexBulkRequest) req).delegate; // ClassCastException, no instanceof guard
}The OS counterpart (
public int size() {
return esReq.size() + osReq.size();
}The same logical document produces two entries (one per provider). Any caller that uses Phase flag read is hot — no caching —
Stale Javadoc on The class Javadoc says: "Methods that have no OpenSearch equivalent yet are stubbed with
In phase 1, SummaryThe routing architecture ( |
|
Pull Request Unsafe to Rollback!!!
|
…uter Replace all ElasticsearchException usages in ContentletIndexAPIImpl with DotRuntimeException to satisfy the CodingStandardsArchTest rule that forbids @IndexLibraryIndependent classes from depending on vendor-specific libraries. - Remove throws ElasticsearchException from createContentIndex(String), createContentIndex(String,int), initIndex(), and fullReindexStart() - Replace throw new ElasticsearchException(...) with DotRuntimeException(...) in initIndexES(), fullReindexStart(), and fullReindexAbort() - No caller was catching ElasticsearchException specifically; all callers use generic catch(Exception) or propagate IOException — no behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…CMS/core into issue-34933-ESMappingAPIImpl
PRs linked to this issuePRs linked to this issue |
…34938 (#35104) ## Summary Fixes a bug where `ContentFactoryIndexOperationsOS` was creating a new `OSQueryCache` instance on every instantiation instead of using the shared singleton managed by `CacheLocator`. This caused the OS query cache to be bypassed in production, defeating its purpose. ## Changes **Bug fix** - `ContentFactoryIndexOperationsOS` — constructor now fetches the cache via `CacheLocator.getOSQueryCache()` instead of `new OSQueryCache()`, ensuring all instances share the same cache **Cache infrastructure** - `CacheLocator` — added `getOSQueryCache()` accessor and registered `OSQueryCache` in the `CacheIndex` enum, mirroring the existing `ESQueryCache` pattern **Cleanup** - `CacheUtil.java` — deleted dead utility class (`Moveh2dbDir`) that referenced `contentletIndexAPI.timestampFormatter` and was unused ## Testing No new tests added. The fix follows the same pattern as `ESQueryCache` which is already in use. To verify: 1. Enable OS migration (phase 1+) 2. Confirm repeated identical queries hit the cache (cache hit metrics in Hazelcast/JMX) 3. Confirm `CacheLocator.getOSQueryCache()` returns the same instance across calls ## Breaking Changes **None**. This PR fixes: #34938
## Summary
Introduces the vendor-neutral write-side abstraction layer for the ES →
OpenSearch migration. Extracts all bulk-write and mapping operations
from `ContentletIndexAPIImpl` into provider-specific implementations
behind a generic `PhaseRouter<T>`, completing the dual-write
infrastructure needed for migration phases 1–3.
## Changes
**New abstractions (vendor-neutral contracts)**
- `ContentletIndexOperations` — write-side interface: bulk requests,
bulk processor, index lifecycle, delete-by-content-type
- `IndexBulkRequest`, `IndexBulkProcessor`, `IndexBulkListener`,
`IndexBulkItemResult` — DTOs replacing ES/OS vendor types in public APIs
- `IndexMappingRestOperations` — mapping REST contract (`putMapping`,
`getMapping`, `getFieldMappingAsMap`)
- `PhaseRouter<T>` — generic, phase-aware dual-write router with
`read()`, `write()`, `writeBoolean()`, `writeReturning()` and
checked-exception variants
**Elasticsearch implementation**
- `ContentletIndexOperationsES` — wraps `BulkRequest`/`BulkProcessor`
behind `ContentletIndexOperations`
- `MappingOperationsES` — wraps ES `RestHighLevelClient` mapping calls
behind `IndexMappingRestOperations`
**OpenSearch implementation**
- `ContentletIndexOperationsOS` — `opensearch-java` 3.x bulk write
implementation with manual `OSIndexBulkProcessor` (no built-in
BulkProcessor in that SDK)
- `MappingOperationsOS` — OpenSearch mapping REST calls
**Minor changes**
- `ContentMappingAPI` — expanded with `putMapping`, `getMapping`,
`toJson`, `toMap`, `toJsonString` methods
- `@IndexRouter.access` — changed from single enum value to array
(`{IndexAccess.READ}`, `{IndexAccess.READ, IndexAccess.WRITE}`)
- `IndexConfigHelper` — added `isMigrationStarted()` convenience
predicate
## Testing
- `ContentletIndexOperationsOSIntegrationTest` (480 lines) added to
`OpenSearchUpgradeSuite`
- Tests cover: index creation, bulk index/delete, bulk processor flush,
delete-by-content-type, document count
- To run locally: `./mvnw verify -pl :dotcms-integration
-Dcoreit.test.skip=false
-Dit.test=ContentletIndexOperationsOSIntegrationTest`
This PR fixes: #34938
Summary
Migrates
ESMappingAPIImpland the contentlet indexing pipeline to a vendor-neutral architecture as part of the ES→OS migration. IntroducesPhaseRouterto route index operations to Elasticsearch or OpenSearch based on the activeMigrationPhase, replacing the ES-coupledIndexAPIImplwith a newIndexManagerAPIImplthat delegates through the router.Changes
New domain layer (vendor-neutral types)
IndexBulkRequest,IndexBulkProcessor,IndexBulkListener,IndexBulkItemResult— replaces ESBulkRequest,BulkProcessor,ActionListener,BulkItemResponsein all public APIsContentletIndexOperations— vendor-neutral interface for per-document index operationsContentIndexMappingAPI— vendor-neutral mapping contract replacingContentMappingAPIIndexMappingRestOperations— vendor-neutral interface forputMapping/getMapping/getFieldMappingAsMapNew routing and implementation classes
PhaseRouter— phase-aware router delegating to ES or OS based onMigrationPhaseIndexManagerAPIImpl— new top-levelContentletIndexAPIimplementation usingPhaseRouterContentletIndexOperationsES/ContentletIndexOperationsOS— per-vendor implementations ofContentletIndexOperationsMappingOperationsES/MappingOperationsOS— per-vendor implementations ofIndexMappingRestOperationsRefactored existing classes
ContentletIndexAPI— all ES-specific types (BulkRequest,ActionListener) replaced withIndexBulk*domain types;timestampFormatterchanged fromSimpleDateFormattoDateTimeFormatterContentletIndexAPIImpl— large rework to delegate viaPhaseRouterESMappingAPIImpl— converted to a phase-aware router implementingContentIndexMappingAPI, delegating mapping ops toMappingOperationsESIndexRouterannotation — enum values renamedREAD_ONLY→READ,READ_WRITE→WRITEReindexThread—BULK_PROCESSOR_AWAIT_TIMEOUTmadepublicfor use byContentletIndexOperationsESBulkProcessorListener— adapted to useIndexBulk*domain typesDeletions / dead code removal
IndexAPIImpl(replaced byIndexManagerAPIImpl)ContentMappingAPI(replaced byContentIndexMappingAPI)CacheUtil(dead code)PermissionBitFactoryImpl,VersionedIndicesImplWiring and minor fixes
APILocator,CacheLocator— wired to new implementationsESSiteSearchAPI— fixedDate→LocalDateTimeforDateTimeFormattercompatibilityInitServlet,ReindexThread,IndexAjaxAction,SiteSearchAjaxAction,ESIndexResource— import/wiring updatesTesting
ContentletIndexOperationsOSIntegrationTest(480 lines) — full integration coverage for OS indexing operationsOSMappingAPIImplIntegrationTest(282 lines) — integration coverage for OS mapping APIOpenSearchUpgradeSuite— test suite for OS upgrade pathContentletIndexAPIImplTestandESMappingAPITestto compile against vendor-neutral APITo verify locally:
just test-integration-ide # Start PostgreSQL + Elasticsearch + dotCMS ./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false \ -Dit.test=ContentletIndexOperationsOSIntegrationTest,OSMappingAPIImplIntegrationTest just test-integration-stopBreaking Changes
ContentletIndexAPImethod signatures changed:BulkRequest→IndexBulkRequest,BulkProcessor→IndexBulkProcessor,ActionListener<BulkResponse>removed. Any code callingputToIndex(BulkRequest, ActionListener)orcreateBulkProcessor(BulkProcessorListener)must be updated.ContentletIndexAPI.timestampFormattertype changed fromSimpleDateFormattoDateTimeFormatter— callers using.format(Date)must switch to.format(LocalDateTime).IndexAccessenum values renamed:READ_ONLY→READ,READ_WRITE→WRITE.IndexAPIImpldeleted — any CDI injection of that specific class will fail.CacheUtildeleted — any callers must be updated.