
Reapply change - Fix Entity Promotion #25665

Open
mohityadav766 wants to merge 10 commits into main from fix-rein

Conversation

@mohityadav766 (Member) commented Feb 2, 2026

Describe your changes:

Reapply changes for entity promotion

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • New tracking mechanism:
    • EntityCompletionTracker class monitors partition completion per entity type using concurrent data structures (a rough sketch follows this list)
  • Per-entity index promotion:
    • Modified DistributedSearchIndexExecutor and DistributedSearchIndexCoordinator to promote each entity's index independently when all partitions complete
  • Enhanced vector operation handling:
    • Updated PartitionWorker to wait for vector embedding tasks with 120s timeout before reporting partition completion
  • Comprehensive test coverage:
    • Added EntityCompletionTrackerTest with 10 test cases covering callbacks, failures, concurrency, and edge cases
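
For orientation, below is a minimal sketch of the shape such a tracker might take. It is assembled from the class, method, and field names referenced elsewhere in this PR (initializeEntity, recordPartitionComplete, setOnEntityComplete, failedPartitions); the actual fields, signatures, and callback wiring in the implementation may differ.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Rough sketch only -- not the PR's code. Tracks how many partitions of each
// entity type have finished and fires a callback once the last one completes.
public class EntityCompletionTracker {
  private final Map<String, Integer> totalPartitions = new ConcurrentHashMap<>();
  private final Map<String, AtomicInteger> completedPartitions = new ConcurrentHashMap<>();
  private final Map<String, AtomicInteger> failedPartitions = new ConcurrentHashMap<>();
  private volatile BiConsumer<String, Boolean> onEntityComplete = (entityType, success) -> {};

  public void setOnEntityComplete(BiConsumer<String, Boolean> callback) {
    this.onEntityComplete = callback;
  }

  // Register the expected partition count before workers start reporting.
  public void initializeEntity(String entityType, int partitions) {
    totalPartitions.put(entityType, partitions);
    completedPartitions.put(entityType, new AtomicInteger());
    failedPartitions.put(entityType, new AtomicInteger());
  }

  // Called once per finished partition; the last one triggers the callback
  // so the coordinator can promote that entity's index independently.
  public void recordPartitionComplete(String entityType, boolean partitionFailed) {
    Integer total = totalPartitions.get(entityType);
    AtomicInteger completed = completedPartitions.get(entityType);
    if (total == null || completed == null) {
      return; // entity was never initialized
    }
    if (partitionFailed) {
      failedPartitions.computeIfAbsent(entityType, k -> new AtomicInteger()).incrementAndGet();
    }
    if (completed.incrementAndGet() == total) {
      AtomicInteger failed = failedPartitions.get(entityType);
      boolean success = failed == null || failed.get() == 0;
      onEntityComplete.accept(entityType, success);
    }
  }
}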



@mohityadav766 (Member, Author)

@gitar-bot which tests are failing in Playwright, and are they related?

@sonarqubecloud (bot) commented Feb 3, 2026

@gitar-bot (bot) commented Feb 5, 2026

Code Review ⚠️ Changes requested (1 resolved / 4 findings)

The per-entity promotion feature looks solid, but PartitionWorker has a duplicated await call and a conditional-flush bug, both apparent merge/refactor artifacts that need cleanup.

⚠️ Bug: Duplicate awaitSinkCompletion calls - redundant second wait

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorker.java

The method waitForSinkOperations contains two consecutive calls to statsTracker.awaitSinkCompletion():

  1. Line 315: boolean statsComplete = statsTracker.awaitSinkCompletion(statsTimeout);
  2. Line 319: boolean completed = statsTracker.awaitSinkCompletion(30000);

The second await call (lines 316-325) is redundant:

  • statsComplete is checked at line 316 (if (!statsComplete))
  • Inside that block there is a second awaitSinkCompletion(30000) call at line 319
  • That block is only reached after the first call with statsTimeout (30s or 60s) has already timed out, so the extra 30s wait merely repeats the first one and delays partition completion

This looks like a merge/refactor artifact where old code wasn't properly removed. The intent seems to be: wait once with the dynamic timeout, then log a warning if it didn't complete.
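
For context, the structure being described appears to be roughly the following (reconstructed from the line references above and the snippet quoted in the next finding, not copied from the source):

boolean statsComplete = statsTracker.awaitSinkCompletion(statsTimeout);  // line 315
if (!statsComplete) {
    // line 319: second wait, reached only after the first has already timed out
    boolean completed = statsTracker.awaitSinkCompletion(30000);
    if (!completed) {
        LOG.warn(
            "Timed out waiting for sink stats completion, {} operations still pending for entity {}",
            statsTracker.getPendingSinkOps(),
            statsTracker.getEntityType());
    }
    statsTracker.flush();  // flush only happens on this timeout path (see the next finding)
}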

Suggested fix:

boolean statsComplete = statsTracker.awaitSinkCompletion(statsTimeout);
if (!statsComplete) {
    LOG.warn(
        "Timed out waiting for sink stats completion, {} operations still pending for entity {}",
        statsTracker.getPendingSinkOps(),
        statsTracker.getEntityType());
}
statsTracker.flush();
⚠️ Bug: statsTracker.flush() only called when stats wait times out

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorker.java

The statsTracker.flush() call at line 327 is currently inside the if (!statsComplete) block, meaning it will only be called when the stats completion wait times out.

If statsComplete is true (the wait succeeds), the flush is never called, which means stats from this partition may not be properly persisted.

Current structure:

if (!statsComplete) {
    boolean completed = statsTracker.awaitSinkCompletion(30000);
    if (!completed) {
        LOG.warn(...);
    }
    statsTracker.flush();  // Only called when !statsComplete
}

Suggested fix:
Move statsTracker.flush() outside the conditional block so it's always called:

if (!statsComplete) {
    LOG.warn(
        "Timed out waiting for sink stats completion, {} operations still pending for entity {}",
        statsTracker.getPendingSinkOps(),
        statsTracker.getEntityType());
}
statsTracker.flush();  // Always flush
💡 Quality: Promotion callback executes inline - may block partition completion

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutor.java:247

In initializeEntityTracker, the entity completion callback directly invokes promoteEntityIndex:

entityTracker.setOnEntityComplete(
    (entityType, success) -> promoteEntityIndex(entityType, success));

The promoteEntityIndex method performs I/O operations (interacting with Elasticsearch/OpenSearch indices) and could be slow. Since this callback is invoked synchronously from recordPartitionComplete in the coordinator, it may block the thread that is processing partition completions.

Impact: If index promotion is slow, it could delay completion tracking for other partitions/entities being processed on the same thread.

Suggested fix:
Consider executing the promotion asynchronously:

entityTracker.setOnEntityComplete(
    (entityType, success) -> 
        CompletableFuture.runAsync(() -> promoteEntityIndex(entityType, success)));

However, this introduces complexity around error handling and tracking. The current synchronous approach is acceptable if promotion is typically fast, but worth monitoring in production.
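
If asynchronous promotion were pursued, the error-handling concern could be addressed with a dedicated executor and an explicit failure log. The executor name and wiring below are illustrative, not part of this PR:

// Illustrative sketch: run promotion off the partition-completion thread and
// surface failures in the log instead of losing them inside the future.
ExecutorService promotionExecutor =
    Executors.newSingleThreadExecutor(r -> new Thread(r, "entity-index-promotion"));

entityTracker.setOnEntityComplete(
    (entityType, success) ->
        CompletableFuture
            .runAsync(() -> promoteEntityIndex(entityType, success), promotionExecutor)
            .exceptionally(ex -> {
              LOG.error("Index promotion failed for entity {}", entityType, ex);
              return null;
            }));

The executor would need to be shut down when the reindexing job finishes, and a promotion failure still has to be reflected in the job's final status, which is the tracking complexity noted above.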

✅ 1 resolved
Edge Case: Potential NPE when accessing failedPartitions for uninitialized entity

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/EntityCompletionTracker.java:89
In recordPartitionComplete, while the code correctly checks that completed and total are not null for the entity, it accesses failedPartitions.get(entityType) without a null check before calling incrementAndGet():

if (partitionFailed) {
  failedPartitions.get(entityType).incrementAndGet();  // NPE if failedPartitions entry is null
}

Although initializeEntity always initializes all three maps together, if the maps get out of sync (e.g., due to a bug or future refactoring), this could cause a NullPointerException.

Suggested fix:
Add a defensive null check:

if (partitionFailed) {
  AtomicInteger failed = failedPartitions.get(entityType);
  if (failed != null) {
    failed.incrementAndGet();
  }
}
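
An alternative to the null check, assuming failedPartitions is a ConcurrentHashMap, is to create the counter lazily so this path can never observe a missing entry:

if (partitionFailed) {
  // Lazily create the counter; safe even if the maps ever get out of sync.
  failedPartitions.computeIfAbsent(entityType, k -> new AtomicInteger()).incrementAndGet();
}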


Labels

backend · safe to test (add this label to run secure GitHub workflows on PRs)
