
Make SearchIndexing distributed-only#27971

Open
harshach wants to merge 6 commits into main from harshach/searchindex-distributed-default

Conversation

@harshach
Collaborator

@harshach harshach commented May 7, 2026

Describe your changes:

Fixes N/A

I made SearchIndexing always use distributed, staged-index reindexing: the app should never write to live indexes, and it no longer exposes the distributed/recreate mode choices. The PR removes the old single-server pipeline and classes and the legacy config options, adds helper classes for entity-type handling, stats mapping, config sanitization, and staged finalization, and updates the generated schemas, docs, and scripts.

Testing:

  • mvn -pl openmetadata-service spotless:apply -DskipTests
  • Focused backend suite (150 tests)
  • UI schema Jest test
  • git diff --check
  • Local Docker deployment: the latest SearchIndexingApplication run succeeded with 36/36 records indexed, 0 failures, and no legacy flags in the app-run config

Type of change:

  • Improvement

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: runtime sanitization removes legacy SearchIndexing config keys from persisted app-run records, so a migration script is not needed.
  • I have added tests around the new logic.
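
The runtime sanitization mentioned in the checklist can be pictured as a key filter over the persisted config. This is a minimal sketch, assuming the config is available as a Map; LegacyConfigSanitizer is a hypothetical name and is not the PR's SearchIndexAppConfigSanitizer. Only the two legacy keys come from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the PR's SearchIndexAppConfigSanitizer.
final class LegacyConfigSanitizer {
  // Keys the PR removes; older persisted app and app-run configs may still contain them.
  static final Set<String> LEGACY_KEYS = Set.of("recreateIndex", "useDistributedIndexing");

  private LegacyConfigSanitizer() {}

  /** Returns a copy of the config without legacy keys; the caller's map is left unchanged. */
  static Map<String, Object> sanitize(Map<String, Object> config) {
    if (config == null) {
      return new LinkedHashMap<>();
    }
    Map<String, Object> cleaned = new LinkedHashMap<>(config);
    cleaned.keySet().removeAll(LEGACY_KEYS); // drop removed options, keep everything else
    return cleaned;
  }
}
```

Returning a copy keeps the helper safe to call on shared config objects; whether the real sanitizer copies or mutates in place is not stated in the PR.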

Copilot AI review requested due to automatic review settings May 7, 2026 16:22
@harshach harshach requested a review from a team as a code owner May 7, 2026 16:22
@github-actions github-actions Bot added the backend and safe to test labels May 7, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR makes the SearchIndexing application distributed-only and enforces staged index writes with alias promotion, removing legacy single-server classes and the recreateIndex / useDistributedIndexing configuration toggles across backend, UI schemas, docs, and scripts.

Changes:

  • Remove legacy mode flags (recreateIndex, useDistributedIndexing) from schemas/config handling and sanitize persisted app/run configs.
  • Delete single-server indexing pipeline/strategy code and related tests; route all reindexing through distributed staged-index flow.
  • Add helper utilities for entity-type normalization, distributed stats mapping, and staged finalization/promotion.
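
The staged-write-then-promote flow described in this overview can be illustrated with an in-memory model. This is a sketch under stated assumptions: FakeCluster, StagedReindexSketch, and the "_rebuild_<runId>" naming are hypothetical and do not reflect OpenMetadata's real index names or its search-client API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a search cluster: index names plus alias bindings. */
final class FakeCluster {
  final Set<String> indexes = new HashSet<>();
  final Map<String, String> aliases = new HashMap<>();

  void createIndex(String name) { indexes.add(name); }

  void deleteIndex(String name) { indexes.remove(name); }

  /** Points the alias at newIndex; a real cluster would do this in one atomic alias update. */
  void swapAlias(String alias, String newIndex) { aliases.put(alias, newIndex); }
}

/** Sketch of staged reindexing: writes go to a fresh index, and the alias moves only on success. */
final class StagedReindexSketch {
  private StagedReindexSketch() {}

  static String stagedName(String alias, long runId) {
    return alias + "_rebuild_" + runId; // hypothetical naming scheme
  }

  /** Returns the index the alias points at after the run. */
  static String reindex(FakeCluster cluster, String alias, long runId, boolean writesSucceeded) {
    String staged = stagedName(alias, runId);
    cluster.createIndex(staged); // this run's documents go here, never to the live index
    if (!writesSucceeded) {
      cluster.deleteIndex(staged); // failed run: drop the staged copy; the live index is untouched
      return cluster.aliases.get(alias);
    }
    String old = cluster.aliases.get(alias);
    cluster.swapAlias(alias, staged); // promotion: readers switch to the new index
    if (old != null) {
      cluster.deleteIndex(old); // retire the previous index once the alias no longer points at it
    }
    return staged;
  }
}
```

Because readers only ever resolve the alias, a failed run leaves search results exactly as they were, which is the property the distributed-only design relies on.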

Reviewed changes

Copilot reviewed 60 out of 70 changed files in this pull request and generated 2 comments.

File Description
openmetadata-ui/src/main/resources/ui/src/utils/ApplicationSchemas/SearchIndexingApplication.json Removes legacy mode toggles from UI schema and updates language field description.
openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/workflow.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/applicationPipeline.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/application.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/entity/services/ingestionPipelines/ingestionPipeline.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/marketplace/createAppMarketPlaceDefinitionReq.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/marketplace/appMarketPlaceDefinition.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/configuration/internal/searchIndexingAppConfig.ts Regenerates SearchIndexing app config type to remove legacy options and update docs.
openmetadata-ui/src/main/resources/ui/src/generated/entity/applications/app.ts Regenerates types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/src/generated/api/services/ingestionPipelines/createIngestionPipeline.ts Regenerates API types to drop legacy indexing config field.
openmetadata-ui/src/main/resources/ui/public/locales/en-US/Applications/SearchIndexingApplication.md Removes legacy option docs and updates wording to staged promotion.
openmetadata-spec/src/main/resources/json/schema/entity/applications/configuration/internal/searchIndexingAppConfig.json Removes legacy mode options from the SearchIndexing app JSON schema.
openmetadata-service/src/test/java/org/openmetadata/service/cache/EntityCacheBypassTest.java Updates test docstring to reflect removal of single-server executor path.
openmetadata-service/src/test/java/org/openmetadata/service/apps/logging/AppRunLogAppenderTest.java Updates logger name expectation after executor/orchestrator refactor.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SingleServerIndexingStrategyTest.java Deletes tests for removed single-server strategy.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexStatsTest.java Deletes tests tied to removed single-server stats/executor implementation.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexFailureScenarioTest.java Deletes tests tied to removed single-server failure scenarios.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexEndToEndTest.java Deletes end-to-end test targeting removed executor flow.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestratorTest.java Updates orchestrator tests for distributed-only behavior + adds legacy-config sanitization assertion.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzOrchestratorContextTest.java Updates tests for new createReindexingContext() signature.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzJobContextTest.java Updates Quartz job context tests after removing distributed flag.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/SlackProgressListenerTest.java Updates Slack listener config details to staged promotion wording.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/QuartzProgressListenerTest.java Updates listener test config builder after removing legacy flags.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/LoggingProgressListenerTest.java Updates logging listener to report staged promotion rather than recreate/distributed toggles.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/IndexingPipelineTest.java Deletes tests for removed single-server pipeline.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReaderRetryTest.java Deletes tests for removed single-server reader.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReaderLifecycleTest.java Deletes tests for removed single-server reader lifecycle behavior.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/EntityBatchSizeEstimatorTest.java Deletes tests for removed batch size estimator.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedIndexingStrategyTest.java Updates distributed strategy tests for staged-index context and new executor signature.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorkerTest.java Updates tests for staged index context being mandatory and always writing staged.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutorTest.java Updates executor tests for required staged index context and promotion handler renames.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobParticipantTest.java Updates participant tests to include staged index mapping on jobs.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobContextTest.java Updates context tests after removing isDistributed().
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/CompositeProgressListenerTest.java Updates test context stub after removing isDistributed().
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/AdaptiveBackoffTest.java Deletes tests for removed adaptive backoff.
openmetadata-service/src/main/resources/json/data/appMarketPlaceDefinition/SearchIndexingApplication.json Updates marketplace definition wording and removes legacy default config flag.
openmetadata-service/src/main/resources/json/data/app/SearchIndexingApplication.json Removes legacy default config flag from installed app config.
openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/ReindexingUtil.java Updates import to share time-series entity set via new helper.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SingleServerIndexingStrategy.java Deletes removed single-server strategy implementation.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexEntityTypes.java Adds centralized entity-type constants and normalization helpers.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexAppConfigSanitizer.java Adds runtime config sanitization to strip removed legacy keys.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java Sanitizes app config on init/validation and unifies distributed job status handling.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingProgressListener.java Updates callback documentation to staged-index terminology.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestrator.java Refactors orchestration to always run distributed strategy and sanitize configs.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingJobContext.java Removes isDistributed() from job context API.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingConfiguration.java Removes legacy mode flags from runtime configuration model and builders.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzOrchestratorContext.java Updates context factory signature for distributed-only mode.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/QuartzJobContext.java Removes distributed flag from Quartz job context.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OrphanedIndexCleaner.java Updates comments to reflect staged reindexing behavior.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OrchestratorContext.java Updates orchestrator context interface for new job context factory signature.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/SlackProgressListener.java Reports staged-promotion mode instead of recreate/distributed toggles.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/listeners/LoggingProgressListener.java Reports staged-promotion mode and removes distributed-mode logging.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/IndexingStrategy.java Deletes obsolete strategy abstraction.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReader.java Deletes removed single-server reader.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/EntityBatchSizeEstimator.java Deletes removed batch sizing helper.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedReindexStatsMapper.java Extracts distributed job→Stats mapping logic into a dedicated helper.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedReindexFinalizer.java Extracts staged index finalization/promotion logic into a dedicated helper.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedIndexingStrategy.java Enforces staged index preparation and simplified distributed executor invocation.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorker.java Requires staged index context and always writes to staged target indexes.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java Switches time-series detection to shared helper and removes duplicated constants.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutor.java Makes staged index context mandatory and standardizes promotion handler naming.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexCoordinator.java Updates precompute logic to use shared time-series classification helper.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobParticipant.java Requires staged index mapping for participation and reconstructs staged context from it.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedJobContext.java Removes isDistributed() from distributed job context.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DISTRIBUTED_INDEXING.md Updates documentation to reflect distributed-only staged promotion and removes legacy flags.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/AdaptiveBackoff.java Deletes removed backoff utility.
bin/distributed-test/scripts/trigger-reindex.sh Removes legacy flags and updates request payload to distributed-only mode.
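
The kind of roll-up a helper like DistributedReindexStatsMapper performs can be sketched as summing per-entity counters into job-level totals. This is a hypothetical illustration: EntityCounts, JobTotals, and StatsRollup are invented names, and the PR's SearchIndexJob.EntityTypeStats and Stats types may carry different fields.

```java
import java.util.Map;

/** Hypothetical per-entity counters. */
record EntityCounts(long total, long success, long failed) {}

/** Hypothetical job-level totals a stats mapper might produce. */
record JobTotals(long total, long success, long failed) {}

final class StatsRollup {
  private StatsRollup() {}

  /** Sums per-entity counters into job totals; a null or empty map means zero work. */
  static JobTotals rollUp(Map<String, EntityCounts> perEntity) {
    long total = 0, success = 0, failed = 0;
    if (perEntity != null) {
      for (EntityCounts c : perEntity.values()) {
        if (c == null) {
          continue; // skip missing entries instead of failing the whole mapping
        }
        total += c.total();
        success += c.success();
        failed += c.failed();
      }
    }
    return new JobTotals(total, success, failed);
  }
}
```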
Comments suppressed due to low confidence (1)

openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java:252

  • SearchIndexEntityTypes.isTimeSeriesEntity(entityType) normalizes legacy queryCostResult to queryCostRecord, but getTimeSeriesEntityCount() still uses the unnormalized entityType for Entity.getEntityTimeSeriesRepository(entityType) and reindexConfig.getTimeSeriesStartTs(entityType). If a legacy config/job includes queryCostResult, this path will throw EntityNotFoundException and silently return 0 due to the outer catch.

Normalize entityType once (e.g., String normalized = SearchIndexEntityTypes.normalizeEntityType(entityType)) and use the normalized value consistently for repo lookups, time-window lookups, and hashing/logging.

  public long getEntityCount(String entityType, ReindexingConfiguration reindexConfig) {
    try {
      long count;
      if (SearchIndexEntityTypes.isTimeSeriesEntity(entityType)) {
        count = getTimeSeriesEntityCount(entityType, reindexConfig);
      } else {
        count = getRegularEntityCount(entityType);
      }
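
Copilot's normalize-once suggestion can be sketched as follows. The maps are hypothetical stand-ins for SearchIndexEntityTypes and the Entity repository registry, and the counts are placeholder values; only the queryCostResult → queryCostRecord mapping comes from the review. The "before" method reproduces the reported bug shape for contrast.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of normalizing the entity type once before any lookup. */
final class EntityCountSketch {
  // Stand-ins for SearchIndexEntityTypes and the repository registry (placeholder counts).
  static final Map<String, String> LEGACY_ALIASES = Map.of("queryCostResult", "queryCostRecord");
  static final Set<String> TIME_SERIES = Set.of("queryCostRecord");
  static final Map<String, Long> TIME_SERIES_REPOS = Map.of("queryCostRecord", 42L);
  static final Map<String, Long> ENTITY_REPOS = Map.of("table", 100L);

  private EntityCountSketch() {}

  static String normalizeEntityType(String entityType) {
    return LEGACY_ALIASES.getOrDefault(entityType, entityType);
  }

  /** Pre-fix shape: classification normalizes, but the lookup still uses the raw name. */
  static long getEntityCountBeforeFix(String entityType) {
    boolean timeSeries = TIME_SERIES.contains(normalizeEntityType(entityType));
    Map<String, Long> repos = timeSeries ? TIME_SERIES_REPOS : ENTITY_REPOS;
    Long count = repos.get(entityType); // raw legacy name is not registered
    return count == null ? 0L : count; // mirrors the catch-and-return-0 that hid the bug
  }

  /** Fixed shape: normalize once, then use the normalized name for every lookup. */
  static long getEntityCount(String entityType) {
    String normalized = normalizeEntityType(entityType);
    Map<String, Long> repos = TIME_SERIES.contains(normalized) ? TIME_SERIES_REPOS : ENTITY_REPOS;
    Long count = repos.get(normalized);
    return count == null ? 0L : count;
  }
}
```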

@github-actions
Contributor

github-actions Bot commented May 7, 2026

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

Copilot AI review requested due to automatic review settings May 7, 2026 16:33
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 61 out of 71 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java:275

  • SearchIndexEntityTypes.isTimeSeriesEntity() normalizes legacy names (e.g. queryCostResult → queryCostRecord), but PartitionCalculator.getEntityCount() continues using the original entityType for repository lookups. A type can therefore be classified as time-series yet still throw EntityNotFoundException in Entity.getEntityTimeSeriesRepository(entityType); the exception is caught and 0 is returned, silently skipping partitions/counts. Normalize entityType once at the start of getEntityCount()/getTimeSeriesEntityCount() and use the normalized value consistently for filters and repository lookups.

  public long getEntityCount(String entityType, ReindexingConfiguration reindexConfig) {
    try {
      long count;
      if (SearchIndexEntityTypes.isTimeSeriesEntity(entityType)) {
        count = getTimeSeriesEntityCount(entityType, reindexConfig);
      } else {
        count = getRegularEntityCount(entityType);
      }
      LOG.debug("Entity count for {}: {}", entityType, count);
      return count;
    } catch (Exception e) {
      LOG.error("Failed to get entity count for type: {} - returning 0", entityType, e);
      return 0;
    }
  }

  private long getRegularEntityCount(String entityType) {
    EntityRepository<?> repository = Entity.getEntityRepository(entityType);
    return repository.getDao().listCount(new ListFilter(Include.ALL));
  }

  private long getTimeSeriesEntityCount(String entityType, ReindexingConfiguration reindexConfig) {
    ListFilter listFilter = new ListFilter(Include.ALL);
    EntityTimeSeriesRepository<?> repository;

    if (SearchIndexEntityTypes.isDataInsightEntity(entityType)) {
      listFilter.addQueryParam("entityFQNHash", FullyQualifiedName.buildHash(entityType));
      repository = Entity.getEntityTimeSeriesRepository(Entity.ENTITY_REPORT_DATA);
    } else {
      repository = Entity.getEntityTimeSeriesRepository(entityType);
    }

Copilot AI review requested due to automatic review settings May 7, 2026 16:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 62 out of 72 changed files in this pull request and generated 2 comments.

Comment on lines +111 to +116
if (entityStats == null || entityStats.isEmpty()) {
return false;
}
SearchIndexJob.EntityTypeStats stats = entityStats.get(entityType);
if (stats == null) {
return false;
Comment on lines +320 to +327
private Map<String, SearchIndexJob.EntityTypeStats> getFinalEntityStats() {
if (distributedExecutor == null) {
return Collections.emptyMap();
}
SearchIndexJob finalJob = distributedExecutor.getJobWithFreshStats();
return finalJob != null && finalJob.getEntityStats() != null
? finalJob.getEntityStats()
: Collections.emptyMap();
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Jest test Coverage

UI tests summary

Lines: 62% (coverage badge)
Statements: 62.45% (63029/100920)
Branches: 42.82% (34014/79433)
Functions: 45.79% (10053/21953)

@gitar-bot

gitar-bot Bot commented May 7, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Refactors SearchIndexing to a distributed-only model by removing legacy single-server pipelines and cleaning up entity mapping logic. All prior findings regarding success calculations and magic strings have been resolved.

✅ 2 resolved
Edge Case: computeEntitySuccess returns true when stats are missing for entity

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedReindexFinalizer.java:114-117
In DistributedReindexFinalizer.computeEntitySuccess, when entityStats has entries but none for the requested entityType, the method returns true (line 116). This entity reached the finalizer because it was NOT already promoted by the per-entity completion tracker — meaning its partitions did not all complete successfully. Returning true here causes finalizeEntityReindex to promote the staged index with success=true, potentially swapping in an incomplete index that is missing data.

The intent appears to be "if there are no stats, we can't determine failure, so assume success" but the safer default for index promotion is to assume failure and leave the old index in place.

Quality: Magic string "all" instead of SearchIndexEntityTypes.ALL

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingConfiguration.java:184
The isSmartReindexing() method in ReindexingConfiguration uses the hardcoded string literal "all" (line 184) instead of the newly introduced SearchIndexEntityTypes.ALL constant. The whole purpose of this PR was to centralize entity type handling into SearchIndexEntityTypes, yet this reference was missed.
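
The fail-closed default recommended in the first finding (computeEntitySuccess) could look like this. A minimal sketch: TypeStats and its field names are assumptions rather than SearchIndexJob.EntityTypeStats, and the "no failures and every record succeeded" criterion is illustrative, not the PR's exact rule.

```java
import java.util.Map;

/** Hypothetical per-entity stats; field names are assumptions. */
record TypeStats(long totalRecords, long successRecords, long failedRecords) {}

final class PromotionDecision {
  private PromotionDecision() {}

  /** Fail-closed: promote only when stats exist for the type and show a clean, complete run. */
  static boolean computeEntitySuccess(Map<String, TypeStats> entityStats, String entityType) {
    if (entityStats == null || entityStats.isEmpty()) {
      return false; // no evidence of success: keep the old index in place
    }
    TypeStats stats = entityStats.get(entityType);
    if (stats == null) {
      return false; // stats missing for this type: the case the review flagged
    }
    return stats.failedRecords() == 0 && stats.successRecords() >= stats.totalRecords();
  }
}
```

Defaulting to false means an unexplained gap in stats costs one more reindex run, whereas defaulting to true could swap an incomplete index into service.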

@sonarqubecloud

sonarqubecloud Bot commented May 7, 2026

@sonarqubecloud

sonarqubecloud Bot commented May 7, 2026

@github-actions
Contributor

github-actions Bot commented May 7, 2026

🔴 Playwright Results — 2 failure(s), 9 flaky

✅ 4001 passed · ❌ 2 failed · 🟡 9 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 299 0 0 4
🔴 Shard 2 748 2 2 8
🟡 Shard 3 759 0 2 7
🟡 Shard 4 773 0 2 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 735 0 3 8

Genuine Failures (failed on all attempts)

Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2)
Error: expect(locator).toHaveText(expected) failed

Locator:  locator('[data-row-key*="StatusBadgeTerm1778182111653"]').locator('.status-badge')
Expected: "Draft"
Received: "In Review"
Timeout:  15000ms

Call log:
  - Expect "toHaveText" with timeout 15000ms
  - waiting for locator('[data-row-key*="StatusBadgeTerm1778182111653"]').locator('.status-badge')
    19 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'01c6d31a.Silly37d87cd9".StatusBadgeTerm1778182111653-status">…</div>
       - unexpected value "In Review"

Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2)
Error: expect(locator).toHaveText(expected) failed

Locator:  locator('[data-row-key*="DraftTerm1778182186779"]').locator('.status-badge')
Expected: "Draft"
Received: "In Review"
Timeout:  15000ms

Call log:
  - Expect "toHaveText" with timeout 15000ms
  - waiting for locator('[data-row-key*="DraftTerm1778182186779"]').locator('.status-badge')
    19 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'4b1c0002.Bravecfc17683".DraftTerm1778182186779-status">…</div>
       - unexpected value "In Review"

🟡 9 flaky test(s) (passed on retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Contract Status badge should be visible on condition if Contract Tab is present/hidden by Persona (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Not_Set (shard 4, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for dashboardDataModel -> searchIndex (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace
