Pull request overview
This PR makes the SearchIndexing application distributed-only and enforces staged index writes with alias promotion, removing legacy single-server classes and the recreateIndex / useDistributedIndexing configuration toggles across backend, UI schemas, docs, and scripts.
Changes:
- Remove legacy mode flags (`recreateIndex`, `useDistributedIndexing`) from schemas/config handling and sanitize persisted app/run configs.
- Delete single-server indexing pipeline/strategy code and related tests; route all reindexing through the distributed staged-index flow.
- Add helper utilities for entity-type normalization, distributed stats mapping, and staged finalization/promotion.
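As a rough illustration of the config sanitization mentioned in the list above, the sketch below strips the two removed flags from a config map. The class name `LegacyConfigSanitizerSketch` and the `sanitize` method are assumptions made for this example; the PR's actual `SearchIndexAppConfigSanitizer` may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a hypothetical stand-in, not the PR's SearchIndexAppConfigSanitizer. */
final class LegacyConfigSanitizerSketch {
  // The two legacy toggles named in the PR overview.
  static final Set<String> LEGACY_KEYS = Set.of("recreateIndex", "useDistributedIndexing");

  private LegacyConfigSanitizerSketch() {}

  /** Returns a copy of the config without the legacy keys; the input map is not modified. */
  static Map<String, Object> sanitize(Map<String, Object> config) {
    Map<String, Object> cleaned = new LinkedHashMap<>();
    if (config == null) {
      return cleaned;
    }
    for (Map.Entry<String, Object> entry : config.entrySet()) {
      if (!LEGACY_KEYS.contains(entry.getKey())) {
        cleaned.put(entry.getKey(), entry.getValue());
      }
    }
    return cleaned;
  }
}
```

Returning a copy rather than mutating the input is a design choice of this sketch, so that a persisted config object is only changed when the caller explicitly stores the sanitized result.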
Reviewed changes
Copilot reviewed 60 out of 70 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/utils/ApplicationSchemas/SearchIndexingApplication.json | Removes legacy mode toggles from UI schema and updates language field description. |
| openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/workflow.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/applicationPipeline.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/application.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/entity/services/ingestionPipelines/ingestionPipeline.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/marketplace/createAppMarketPlaceDefinitionReq.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/marketplace/appMarketPlaceDefinition.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/configuration/internal/searchIndexingAppConfig.ts | Regenerates SearchIndexing app config type to remove legacy options and update docs. |
| openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/app.ts | Regenerates types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/src/generated/api/services/ingestionPipelines/createIngestionPipeline.ts | Regenerates API types to drop legacy indexing config field. |
| openmetadata-ui/src/main/resources/ui/public/locales/en-US/Applications/SearchIndexingApplication.md | Removes legacy option docs and updates wording to staged promotion. |
| openmetadata-spec/src/main/resources/json/schema/entity/applications/configuration/internal/searchIndexingAppConfig.json | Removes legacy mode options from the SearchIndexing app JSON schema. |
| openmetadata-service/src/test/java/org/openmetadata/service/cache/EntityCacheBypassTest.java | Updates test docstring to reflect removal of single-server executor path. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/logging/AppRunLogAppenderTest.java | Updates logger name expectation after executor/orchestrator refactor. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SingleServerIndexingStrategyTest.java | Deletes tests for removed single-server strategy. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexStatsTest.java | Deletes tests tied to removed single-server stats/executor implementation. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexFailureScenarioTest.java | Deletes tests tied to removed single-server failure scenarios. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexEndToEndTest.java | Deletes end-to-end test targeting removed executor flow. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestratorTest.java | Updates orchestrator tests for distributed-only behavior + adds legacy-config sanitization assertion. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzOrchestratorContextTest.java | Updates tests for new createReindexingContext() signature. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzJobContextTest.java | Updates Quartz job context tests after removing distributed flag. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/SlackProgressListenerTest.java | Updates Slack listener config details to staged promotion wording. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/QuartzProgressListenerTest.java | Updates listener test config builder after removing legacy flags. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/LoggingProgressListenerTest.java | Updates logging listener to report staged promotion rather than recreate/distributed toggles. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/IndexingPipelineTest.java | Deletes tests for removed single-server pipeline. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReaderRetryTest.java | Deletes tests for removed single-server reader. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReaderLifecycleTest.java | Deletes tests for removed single-server reader lifecycle behavior. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/EntityBatchSizeEstimatorTest.java | Deletes tests for removed batch size estimator. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedIndexingStrategyTest.java | Updates distributed strategy tests for staged-index context and new executor signature. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorkerTest.java | Updates tests for staged index context being mandatory and always writing staged. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutorTest.java | Updates executor tests for required staged index context and promotion handler renames. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobParticipantTest.java | Updates participant tests to include staged index mapping on jobs. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobContextTest.java | Updates context tests after removing isDistributed(). |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/CompositeProgressListenerTest.java | Updates test context stub after removing isDistributed(). |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/AdaptiveBackoffTest.java | Deletes tests for removed adaptive backoff. |
| openmetadata-service/src/main/resources/json/data/appMarketPlaceDefinition/SearchIndexingApplication.json | Updates marketplace definition wording and removes legacy default config flag. |
| openmetadata-service/src/main/resources/json/data/app/SearchIndexingApplication.json | Removes legacy default config flag from installed app config. |
| openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/ReindexingUtil.java | Updates import to share time-series entity set via new helper. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SingleServerIndexingStrategy.java | Deletes removed single-server strategy implementation. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexEntityTypes.java | Adds centralized entity-type constants and normalization helpers. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexAppConfigSanitizer.java | Adds runtime config sanitization to strip removed legacy keys. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java | Sanitizes app config on init/validation and unifies distributed job status handling. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingProgressListener.java | Updates callback documentation to staged-index terminology. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestrator.java | Refactors orchestration to always run distributed strategy and sanitize configs. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingJobContext.java | Removes isDistributed() from job context API. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingConfiguration.java | Removes legacy mode flags from runtime configuration model and builders. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzOrchestratorContext.java | Updates context factory signature for distributed-only mode. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzJobContext.java | Removes distributed flag from Quartz job context. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OrphanedIndexCleaner.java | Updates comments to reflect staged reindexing behavior. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OrchestratorContext.java | Updates orchestrator context interface for new job context factory signature. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/SlackProgressListener.java | Reports staged-promotion mode instead of recreate/distributed toggles. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/LoggingProgressListener.java | Reports staged-promotion mode and removes distributed-mode logging. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/IndexingStrategy.java | Deletes obsolete strategy abstraction. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReader.java | Deletes removed single-server reader. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/EntityBatchSizeEstimator.java | Deletes removed batch sizing helper. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedReindexStatsMapper.java | Extracts distributed job→Stats mapping logic into a dedicated helper. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedReindexFinalizer.java | Extracts staged index finalization/promotion logic into a dedicated helper. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedIndexingStrategy.java | Enforces staged index preparation and simplified distributed executor invocation. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorker.java | Requires staged index context and always writes to staged target indexes. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java | Switches time-series detection to shared helper and removes duplicated constants. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutor.java | Makes staged index context mandatory and standardizes promotion handler naming. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexCoordinator.java | Updates precompute logic to use shared time-series classification helper. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobParticipant.java | Requires staged index mapping for participation and reconstructs staged context from it. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobContext.java | Removes isDistributed() from distributed job context. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DISTRIBUTED_INDEXING.md | Updates documentation to reflect distributed-only staged promotion and removes legacy flags. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/AdaptiveBackoff.java | Deletes removed backoff utility. |
| bin/distributed-test/scripts/trigger-reindex.sh | Removes legacy flags and updates request payload to distributed-only mode. |
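The staged-write and alias-promotion flow that several rows describe can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class `StagedAliasPromotionSketch`, the `_rebuild_` naming scheme, and the choice to drop the previous index on promotion and the staged index on failure are all hypothetical, and no search-engine client is involved. The PR's `DistributedReindexFinalizer` may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of staged writes plus alias promotion (hypothetical, for illustration). */
final class StagedAliasPromotionSketch {
  // alias name -> concrete index currently serving reads
  private final Map<String, String> aliasToIndex = new HashMap<>();
  // concrete index name -> number of documents written to it
  private final Map<String, Integer> docCounts = new HashMap<>();

  /** Creates an empty staged index; the naming scheme is an assumption of this sketch. */
  String prepareStagedIndex(String alias, String runId) {
    String staged = alias + "_rebuild_" + runId;
    docCounts.put(staged, 0);
    return staged;
  }

  /** Writes go to the staged index only; the index behind the alias is untouched. */
  void writeToStaged(String stagedIndex, int docs) {
    docCounts.merge(stagedIndex, docs, Integer::sum);
  }

  /** On success, repoint the alias and drop the old index; on failure, drop the staged index. */
  boolean finalizeRun(String alias, String stagedIndex, boolean success) {
    if (!success) {
      docCounts.remove(stagedIndex);
      return false;
    }
    String previous = aliasToIndex.put(alias, stagedIndex);
    if (previous != null && !previous.equals(stagedIndex)) {
      docCounts.remove(previous);
    }
    return true;
  }

  /** Returns the concrete index behind the alias, or null if nothing was promoted yet. */
  String resolve(String alias) {
    return aliasToIndex.get(alias);
  }

  /** Document count of the index currently behind the alias (0 if none). */
  int liveDocCount(String alias) {
    String index = aliasToIndex.get(alias);
    return index == null ? 0 : docCounts.getOrDefault(index, 0);
  }
}
```

The point of the model is that readers of the alias never observe a partially written index: a failed run leaves the alias pointing at the last successfully promoted index.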
Comments suppressed due to low confidence (1)
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java:252
`SearchIndexEntityTypes.isTimeSeriesEntity(entityType)` normalizes legacy `queryCostResult` to `queryCostRecord`, but `getTimeSeriesEntityCount()` still uses the unnormalized `entityType` for `Entity.getEntityTimeSeriesRepository(entityType)` and `reindexConfig.getTimeSeriesStartTs(entityType)`. If a legacy config/job includes `queryCostResult`, this path will throw `EntityNotFoundException` and silently return 0 due to the outer catch.
Normalize `entityType` once (e.g., `String normalized = SearchIndexEntityTypes.normalizeEntityType(entityType)`) and use the normalized value consistently for repo lookups, time-window lookups, and hashing/logging.
public long getEntityCount(String entityType, ReindexingConfiguration reindexConfig) {
  try {
    long count;
    if (SearchIndexEntityTypes.isTimeSeriesEntity(entityType)) {
      count = getTimeSeriesEntityCount(entityType, reindexConfig);
    } else {
      count = getRegularEntityCount(entityType);
    }
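The "normalize once" suggestion in the comment above could look roughly like the following. This is a self-contained sketch, not the project's code: `EntityTypeNormalizationSketch` and the `CountLookup` interface are hypothetical stand-ins for `SearchIndexEntityTypes` and the repository lookup, and the only legacy mapping included is the one the review names.

```java
import java.util.Map;

/** Hypothetical sketch of normalizing an entity type once and reusing the result. */
final class EntityTypeNormalizationSketch {
  // Only the mapping mentioned in the review: legacy queryCostResult -> queryCostRecord.
  private static final Map<String, String> LEGACY_NAMES =
      Map.of("queryCostResult", "queryCostRecord");

  private EntityTypeNormalizationSketch() {}

  /** Maps a legacy name to its canonical name; other names pass through unchanged. */
  static String normalizeEntityType(String entityType) {
    if (entityType == null) {
      return null;
    }
    return LEGACY_NAMES.getOrDefault(entityType, entityType);
  }

  /** Stand-in for a repository lookup that only recognizes canonical names. */
  interface CountLookup {
    long countFor(String canonicalEntityType);
  }

  /** Normalizes first, then uses the normalized value for every downstream lookup. */
  static long getEntityCount(String entityType, CountLookup lookup) {
    String normalized = normalizeEntityType(entityType);
    return lookup.countFor(normalized);
  }
}
```

With this shape, a legacy name and its canonical name reach the lookup as the same string, so classification and repository access cannot disagree.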
✅ TypeScript Types Auto-Updated
The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.
Pull request overview
Copilot reviewed 61 out of 71 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java:275
`SearchIndexEntityTypes.isTimeSeriesEntity()` normalizes legacy names (e.g. `queryCostResult` → `queryCostRecord`), but `PartitionCalculator.getEntityCount()` continues using the original `entityType` for repository lookups. This can lead to a type being classified as time-series yet still throwing `EntityNotFoundException` in `Entity.getEntityTimeSeriesRepository(entityType)` (caught and returning 0), silently skipping partitions/counts. Normalize `entityType` once at the start of `getEntityCount()` / `getTimeSeriesEntityCount()` (and use the normalized value consistently for filters/repository lookup).
public long getEntityCount(String entityType, ReindexingConfiguration reindexConfig) {
  try {
    long count;
    if (SearchIndexEntityTypes.isTimeSeriesEntity(entityType)) {
      count = getTimeSeriesEntityCount(entityType, reindexConfig);
    } else {
      count = getRegularEntityCount(entityType);
    }
    LOG.debug("Entity count for {}: {}", entityType, count);
    return count;
  } catch (Exception e) {
    LOG.error("Failed to get entity count for type: {} - returning 0", entityType, e);
    return 0;
  }
}

private long getRegularEntityCount(String entityType) {
  EntityRepository<?> repository = Entity.getEntityRepository(entityType);
  return repository.getDao().listCount(new ListFilter(Include.ALL));
}

private long getTimeSeriesEntityCount(String entityType, ReindexingConfiguration reindexConfig) {
  ListFilter listFilter = new ListFilter(Include.ALL);
  EntityTimeSeriesRepository<?> repository;
  if (SearchIndexEntityTypes.isDataInsightEntity(entityType)) {
    listFilter.addQueryParam("entityFQNHash", FullyQualifiedName.buildHash(entityType));
    repository = Entity.getEntityTimeSeriesRepository(Entity.ENTITY_REPORT_DATA);
  } else {
    repository = Entity.getEntityTimeSeriesRepository(entityType);
  }
if (entityStats == null || entityStats.isEmpty()) {
  return false;
}
SearchIndexJob.EntityTypeStats stats = entityStats.get(entityType);
if (stats == null) {
  return false;

private Map<String, SearchIndexJob.EntityTypeStats> getFinalEntityStats() {
  if (distributedExecutor == null) {
    return Collections.emptyMap();
  }
  SearchIndexJob finalJob = distributedExecutor.getJobWithFreshStats();
  return finalJob != null && finalJob.getEntityStats() != null
      ? finalJob.getEntityStats()
      : Collections.emptyMap();
Code Review ✅ Approved
2 resolved / 2 findings
Refactors SearchIndexing to a distributed-only model by removing legacy single-server pipelines and cleaning up entity mapping logic. All prior findings regarding success calculations and magic strings have been resolved.
✅ 2 resolved
- ✅ Edge Case: computeEntitySuccess returns true when stats are missing for entity
- ✅ Quality: Magic string "all" instead of SearchIndexEntityTypes.ALL
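The resolved `computeEntitySuccess` edge case (missing stats should not count as success) can be sketched as follows. The null and empty checks mirror the quoted snippet; everything else is an assumption for illustration: the `Stats` class is a hypothetical stand-in for `SearchIndexJob.EntityTypeStats`, and the final success criterion (no failures and all records accounted for) is a guess at a reasonable rule, not the PR's actual logic.

```java
import java.util.Map;

/** Hypothetical sketch: treat missing per-entity stats as "not successful". */
final class EntitySuccessSketch {
  /** Stand-in for SearchIndexJob.EntityTypeStats; field names are assumptions. */
  static final class Stats {
    final long totalRecords;
    final long successRecords;
    final long failedRecords;

    Stats(long totalRecords, long successRecords, long failedRecords) {
      this.totalRecords = totalRecords;
      this.successRecords = successRecords;
      this.failedRecords = failedRecords;
    }
  }

  private EntitySuccessSketch() {}

  static boolean computeEntitySuccess(String entityType, Map<String, Stats> entityStats) {
    // No stats at all: success cannot be confirmed, so report false.
    if (entityStats == null || entityStats.isEmpty()) {
      return false;
    }
    Stats stats = entityStats.get(entityType);
    // Stats missing for this entity: same reasoning, report false.
    if (stats == null) {
      return false;
    }
    // Assumed criterion: nothing failed and every record was indexed.
    return stats.failedRecords == 0 && stats.successRecords >= stats.totalRecords;
  }
}
```

Defaulting to `false` matters when the result gates promotion of a staged index: an entity whose stats never arrived should not be treated as safely reindexed.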
🔴 Playwright Results: 2 failure(s), 9 flaky
✅ 4001 passed · ❌ 2 failed · 🟡 9 flaky · ⏭️ 86 skipped
Genuine Failures (failed on all attempts)



Describe your changes:
Fixes N/A
I made SearchIndexing always use distributed staged-index reindexing because the app should avoid live-index writes and no longer expose distributed/recreate mode choices. This removes the old single-server pipeline/classes and legacy config options, adds helper classes for entity type handling, stats mapping, config sanitization, and staged finalization, and updates generated schemas/docs/scripts.
Testing:
- `mvn -pl openmetadata-service spotless:apply -DskipTests`
- focused backend suite with 150 tests
- UI schema Jest test
- `git diff --check`
- local Docker deployment with latest SearchIndexingApplication run success, 36/36 records indexed, 0 failures, and no legacy flags in app-run config
Type of change: