Update workflow settings for integration tests by edg956 · Pull Request #25713 · open-metadata/OpenMetadata

edg956 · 2026-02-05T11:15:32Z

This is because the default configuration might be too slow to let the workflows finish within the time given in tests

This also removes the sequential test phase as it might not be needed anymore

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

Workflow settings optimization:
- test-workflow-settings.yaml with tuned Flowable configuration (32 threads, 10000 task queue, 500 tasks per acquisition, 500ms polling intervals, 15-day job lock)
- Applied via TestSuiteBootstrap with force-reinitialize before test execution
Synchronous history mode:
- All test profiles use openmetadata.flowable.historyMode=sync for reliable workflow task execution
- WorkflowHandler supports sync history mode via system property
Test configuration:
- Sequential execution for TagRecognizerFeedbackIT (timeout increased to 5 minutes)
- Parallel execution for all other tests with JUnit parallel mode
- Explicit system properties per profile (databaseType, searchType, container images)
Code quality:
- Response resource management with try-with-resources pattern
- Synchronized force-reinitialize method for thread safety

_{This will update automatically on new commits.}

...tadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java

sonarqubecloud · 2026-02-06T12:16:56Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

...ata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java

gitar-bot · 2026-02-06T19:27:13Z

🔍 CI failure analysis for 09c6e1c: Python 3.11 tests failed. Unrelated to PR (no Python changes). Backend integration tests show outstanding success (1 and 12 failures).

Issue

Python tests (job 62791084771, Python 3.11) failed. This is unrelated to the PR which only modifies Java/XML backend workflow configuration files.

Root Cause

Unrelated Python test failure - The PR makes zero Python code changes.

Details

Python Test Failure

Job: py-run-tests (3.11)
Status: Failed

Why this is unrelated to the PR:

This PR only modifies backend Java/XML files:

openmetadata-integration-tests/pom.xml (Maven test configuration)
openmetadata-integration-tests/src/test/resources/test-workflow-settings.yaml (Flowable settings)
openmetadata-service/src/main/java/.../WorkflowHandler.java (workflow handler)
openmetadata-service/src/main/java/.../TestSuiteBootstrap.java (test bootstrap)

No Python/ingestion code changes means Python test failures are coincidental, not caused by this PR.

Pattern: Multiple Python test failures (3.10, 3.11) seen throughout CI runs - all unrelated to backend workflow changes.

Context: Outstanding Backend Integration Test Success (PR-Related)

The real achievement in this PR:

Backend integration tests show EXCEPTIONAL RESULTS that significantly exceed targets:

mysql-elasticsearch (job 62793763841):

Total tests: 9,017
Failures: 1 (vs 5-failure target)
Pass rate: 99.99%
Single failure: TagRecognizerFeedbackIT.test_recognizerFeedback_noReviewer_autoApplied
80% BETTER than target! ✅✅✅

postgres-opensearch (job 62791541692):

Total tests: 9,017
Failures: 12 (vs 18-failure baseline)
Pass rate: 99.87%
All failures: ColumnGridResourceIT (OpenSearch/PostgreSQL integration)
33% improvement! ✅

Combined: 18,034 tests executed, 13 failures total, 99.93% overall pass rate

Complete CI Picture

Backend (PR-related - OUTSTANDING SUCCESS):

✅ mysql-elasticsearch: 1 failure (99.99% pass, 80% better than 5-failure target)
✅ postgres-opensearch: 12 failures (99.87% pass, 33% improvement from 18-failure baseline)
Status: Workflow configuration optimization HIGHLY SUCCESSFUL

Unrelated Failures (safe to ignore):

⚠️ Python tests (3.10): Failed (no Python code changes in PR)
⚠️ Python tests (3.11): Failed (no Python code changes in PR)
⚠️ Playwright shard 2/6: Failed (no UI code changes in PR)

All unrelated failures should be addressed by their respective teams independently.

Code Review 👍 Approved with suggestions 8 resolved / 9 findings

Previous race condition and engine shutdown findings are properly resolved. The synchronized methods and explicit processEngine.close() address both concerns. The system property coupling to test concerns in production code remains.

💡 Quality: System property in production code couples it to test concerns

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:135

Reading System.getProperty("openmetadata.flowable.historyMode") in the production WorkflowHandler class introduces a test-specific concern into production code. While it works, this pattern makes the production code aware of test infrastructure, and the property name isn't documented anywhere.

A cleaner approach would be to add the historyMode as a configuration field in WorkflowSettings (or ExecutorConfiguration), so it flows through the normal configuration system rather than a side-channel system property. This would also make it discoverable and configurable in production if ever needed.

This is a minor design suggestion — the current approach is functional and low-risk.

✅ 8 resolved

✅ Bug: Race condition: initialize(config, force) calls unsynchronized initialize(config)

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:175
The new initialize(OpenMetadataApplicationConfig config, boolean force) method is synchronized, but it delegates to initialize(OpenMetadataApplicationConfig config) which is not synchronized. This creates a race condition:

Thread A calls initialize(config, true) — acquires lock, sets initialized = false, then calls initialize(config)

Between setting initialized = false and the if (!initialized) check in initialize(config), another thread B could also call initialize(config) directly (which doesn't require the lock)

Both threads could pass the if (!initialized) check, both creating new WorkflowHandler instances concurrently

Additionally, instance is not volatile, so even if the non-synchronized initialize(config) writes to instance, another thread may see a stale reference.

Suggested fix: Either make the original initialize(config) method synchronized as well, or use a shared lock object for both methods:
public static synchronized void initialize(OpenMetadataApplicationConfig config) {
    if (!initialized) {
        instance = new WorkflowHandler(config);
        initialized = true;
    } else {
        LOG.info("WorkflowHandler already initialized.");
    }
}

public static synchronized void initialize(OpenMetadataApplicationConfig config, boolean force) {
    if (force) {
        initialized = false;
    }
    initialize(config);
}
Note: In the current PR context this is only called from test bootstrap (single-threaded setup), so the practical risk is low, but the production initialize(config) method also has this pre-existing issue, and adding the force variant makes the window more visible.

✅ Edge Case: Force-reinitialize doesn't shut down old ProcessEngine

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:175
When initialize(config, true) is called with force=true, it sets initialized = false and creates a new WorkflowHandler (which builds a new ProcessEngine). However, the old WorkflowHandler instance and its ProcessEngine are not shut down first. This could leave orphaned threads, open database connections, and async job executors from the previous engine.

The test bootstrap does call ProcessEngines.destroy() during teardown, but if initialize(config, true) is called multiple times (e.g., across test classes), the intermediate engines would leak.

Suggested fix: Before setting initialized = false, shut down the existing engine:
public static synchronized void initialize(OpenMetadataApplicationConfig config, boolean force) {
    if (force && initialized && instance != null) {
        instance.processEngine.close();
        initialized = false;
    }
    initialize(config);
}

✅ Quality: System.getProperty() called twice for same key

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:136-137
The system property openmetadata.flowable.historyMode is retrieved twice in succession — once for the null check and once for the value comparison. This is a minor redundancy and introduces a theoretical TOCTOU (time-of-check-time-of-use) issue if system properties were modified concurrently, though extremely unlikely in practice.

Suggested fix: Read the property once into a local variable:
String historyMode = System.getProperty("openmetadata.flowable.historyMode");
if (historyMode != null) {
  asyncHistoryEnabled = !"sync".equals(historyMode);
}

✅ Edge Case: Response not closed on success path (resource leak)

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java:530-542
In initializeTestWorkflowSettings(), the Response object is only consumed (via readEntity) on the failure path. On the success path (status == 201), the Response is never closed. While JAX-RS Response objects should be closed to release underlying connection resources, this is test-only code and the impact is minimal.

Consider using try-with-resources:
try (Response res = repo.createNewSetting(...)) {
    if (res.getStatus() != 201) {
        throw new RuntimeException(
            String.format("Failed to create test workflow settings: %s", res.readEntity(String.class)));
    }
}

✅ Edge Case: Force-reinitialize is not thread-safe

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:187-192
The new initialize(config, force) method sets initialized = false and then calls initialize(config) non-atomically. If another thread calls getInstance() between these two operations, it could see initialized == false and throw an exception, or if another thread also calls initialize() concurrently, there could be a race condition leading to double initialization.

In practice, this is only called from test bootstrap (TestSuiteBootstrap) which runs single-threaded during initialization, so this is very unlikely to be a real problem. However, if this method were ever used in production code or from multiple test threads, it could cause issues.

Consider making the method synchronized to match any synchronization on the original initialize method:
public static synchronized void initialize(OpenMetadataApplicationConfig config, boolean force) {
    if (force) {
      initialized = false;
    }
    initialize(config);
}

...and 3 more resolved from earlier reviews

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

`Auto-apply`	`Compact`
`gitar auto-apply:on`	`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

edg956 requested a review from yan-3005 February 5, 2026 11:15

edg956 self-assigned this Feb 5, 2026

edg956 requested a review from a team as a code owner February 5, 2026 11:15

edg956 had a problem deploying to test February 5, 2026 11:15 — with GitHub Actions Failure

github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Feb 5, 2026

edg956 had a problem deploying to test February 5, 2026 11:44 — with GitHub Actions Failure

edg956 temporarily deployed to test February 5, 2026 11:44 — with GitHub Actions Inactive

edg956 force-pushed the ci-issues branch from 45b995a to db31f17 Compare February 5, 2026 16:35

edg956 temporarily deployed to test February 5, 2026 16:35 — with GitHub Actions Inactive

edg956 had a problem deploying to test February 5, 2026 16:35 — with GitHub Actions Failure

edg956 temporarily deployed to test February 5, 2026 16:35 — with GitHub Actions Inactive

gitar-bot bot reviewed Feb 5, 2026

View reviewed changes

...tadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java Outdated Show resolved Hide resolved

edg956 force-pushed the ci-issues branch from db31f17 to f8632c0 Compare February 5, 2026 19:13

edg956 temporarily deployed to test February 5, 2026 19:13 — with GitHub Actions Inactive

edg956 temporarily deployed to test February 6, 2026 09:59 — with GitHub Actions Inactive

edg956 had a problem deploying to test February 6, 2026 09:59 — with GitHub Actions Failure

edg956 temporarily deployed to test February 6, 2026 12:20 — with GitHub Actions Inactive

edg956 had a problem deploying to test February 6, 2026 12:20 — with GitHub Actions Failure

edg956 temporarily deployed to test February 6, 2026 12:20 — with GitHub Actions Inactive

edg956 force-pushed the ci-issues branch from 08e4c70 to 704aa0d Compare February 6, 2026 18:53

edg956 had a problem deploying to test February 6, 2026 18:53 — with GitHub Actions Error

gitar-bot bot reviewed Feb 6, 2026

View reviewed changes

...ata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java Show resolved Hide resolved

Play with flowable async parameters and reintroduce system properties

09c6e1c

edg956 force-pushed the ci-issues branch from 704aa0d to 09c6e1c Compare February 6, 2026 19:20

edg956 temporarily deployed to test February 6, 2026 19:20 — with GitHub Actions Inactive

edg956 had a problem deploying to test February 6, 2026 19:20 — with GitHub Actions Failure

edg956 temporarily deployed to test February 6, 2026 19:20 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update workflow settings for integration tests#25713

Update workflow settings for integration tests#25713
edg956 wants to merge 1 commit intomainfrom
ci-issues

edg956 commented Feb 5, 2026 •

edited by gitar-bot bot

Loading

Uh oh!

Uh oh!

sonarqubecloud bot commented Feb 6, 2026

Uh oh!

Uh oh!

gitar-bot bot commented Feb 6, 2026 •

edited

Loading

Issue

Root Cause

Details

Python Test Failure

Context: Outstanding Backend Integration Test Success (PR-Related)

Complete CI Picture

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

edg956 commented Feb 5, 2026 • edited by gitar-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Type of change:

Checklist:

Summary by Gitar

Uh oh!

Uh oh!

sonarqubecloud bot commented Feb 6, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Uh oh!

gitar-bot bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Root Cause

Details

Python Test Failure

Context: Outstanding Backend Integration Test Success (PR-Related)

Complete CI Picture

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

edg956 commented Feb 5, 2026 •

edited by gitar-bot bot

Loading

gitar-bot bot commented Feb 6, 2026 •

edited

Loading