Skip to content

Update workflow settings for integration tests#25713

Open
edg956 wants to merge 1 commit intomainfrom
ci-issues
Open

Update workflow settings for integration tests#25713
edg956 wants to merge 1 commit intomainfrom
ci-issues

Conversation

@edg956
Copy link
Contributor

@edg956 edg956 commented Feb 5, 2026

This is because the default configuration might be too slow to let the workflows finish within the time given in tests

This also removes the sequential test phase as it might not be needed anymore

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Workflow settings optimization:
    • test-workflow-settings.yaml with tuned Flowable configuration (32 threads, 10000 task queue, 500 tasks per acquisition, 500ms polling intervals, 15-day job lock)
    • Applied via TestSuiteBootstrap with force-reinitialize before test execution
  • Synchronous history mode:
    • All test profiles use openmetadata.flowable.historyMode=sync for reliable workflow task execution
    • WorkflowHandler supports sync history mode via system property
  • Test configuration:
    • Sequential execution for TagRecognizerFeedbackIT (timeout increased to 5 minutes)
    • Parallel execution for all other tests with JUnit parallel mode
    • Explicit system properties per profile (databaseType, searchType, container images)
  • Code quality:
    • Response resource management with try-with-resources pattern
    • Synchronized force-reinitialize method for thread safety

This will update automatically on new commits.


@edg956 edg956 requested a review from yan-3005 February 5, 2026 11:15
@edg956 edg956 self-assigned this Feb 5, 2026
@edg956 edg956 requested a review from a team as a code owner February 5, 2026 11:15
@github-actions github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Feb 5, 2026
@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 6, 2026

@gitar-bot
Copy link

gitar-bot bot commented Feb 6, 2026

🔍 CI failure analysis for 09c6e1c: Python 3.11 tests failed. Unrelated to PR (no Python changes). Backend integration tests show outstanding success (1 and 12 failures).

Issue

Python tests (job 62791084771, Python 3.11) failed. This is unrelated to the PR which only modifies Java/XML backend workflow configuration files.

Root Cause

Unrelated Python test failure - The PR makes zero Python code changes.

Details

Python Test Failure

Job: py-run-tests (3.11)
Status: Failed

Why this is unrelated to the PR:

This PR only modifies backend Java/XML files:

  • openmetadata-integration-tests/pom.xml (Maven test configuration)
  • openmetadata-integration-tests/src/test/resources/test-workflow-settings.yaml (Flowable settings)
  • openmetadata-service/src/main/java/.../WorkflowHandler.java (workflow handler)
  • openmetadata-service/src/main/java/.../TestSuiteBootstrap.java (test bootstrap)

No Python/ingestion code changes means Python test failures are coincidental, not caused by this PR.

Pattern: Multiple Python test failures (3.10, 3.11) seen throughout CI runs - all unrelated to backend workflow changes.


Context: Outstanding Backend Integration Test Success (PR-Related)

The real achievement in this PR:

Backend integration tests show EXCEPTIONAL RESULTS that significantly exceed targets:

mysql-elasticsearch (job 62793763841):

  • Total tests: 9,017
  • Failures: 1 (vs 5-failure target)
  • Pass rate: 99.99%
  • Single failure: TagRecognizerFeedbackIT.test_recognizerFeedback_noReviewer_autoApplied
  • 80% BETTER than target! ✅✅✅

postgres-opensearch (job 62791541692):

  • Total tests: 9,017
  • Failures: 12 (vs 18-failure baseline)
  • Pass rate: 99.87%
  • All failures: ColumnGridResourceIT (OpenSearch/PostgreSQL integration)
  • 33% improvement!

Combined: 18,034 tests executed, 13 failures total, 99.93% overall pass rate


Complete CI Picture

Backend (PR-related - OUTSTANDING SUCCESS):

  • ✅ mysql-elasticsearch: 1 failure (99.99% pass, 80% better than 5-failure target)
  • ✅ postgres-opensearch: 12 failures (99.87% pass, 33% improvement from 18-failure baseline)
  • Status: Workflow configuration optimization HIGHLY SUCCESSFUL

Unrelated Failures (safe to ignore):

  • ⚠️ Python tests (3.10): Failed (no Python code changes in PR)
  • ⚠️ Python tests (3.11): Failed (no Python code changes in PR)
  • ⚠️ Playwright shard 2/6: Failed (no UI code changes in PR)

All unrelated failures should be addressed by their respective teams independently.

Code Review 👍 Approved with suggestions 8 resolved / 9 findings

Previous race condition and engine shutdown findings are properly resolved. The synchronized methods and explicit processEngine.close() address both concerns. The system property coupling to test concerns in production code remains.

💡 Quality: System property in production code couples it to test concerns

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:135

Reading System.getProperty("openmetadata.flowable.historyMode") in the production WorkflowHandler class introduces a test-specific concern into production code. While it works, this pattern makes the production code aware of test infrastructure, and the property name isn't documented anywhere.

A cleaner approach would be to add the historyMode as a configuration field in WorkflowSettings (or ExecutorConfiguration), so it flows through the normal configuration system rather than a side-channel system property. This would also make it discoverable and configurable in production if ever needed.

This is a minor design suggestion — the current approach is functional and low-risk.

✅ 8 resolved
Bug: Race condition: initialize(config, force) calls unsynchronized initialize(config)

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:175
The new initialize(OpenMetadataApplicationConfig config, boolean force) method is synchronized, but it delegates to initialize(OpenMetadataApplicationConfig config) which is not synchronized. This creates a race condition:

  1. Thread A calls initialize(config, true) — acquires lock, sets initialized = false, then calls initialize(config)
  2. Between setting initialized = false and the if (!initialized) check in initialize(config), another thread B could also call initialize(config) directly (which doesn't require the lock)
  3. Both threads could pass the if (!initialized) check, both creating new WorkflowHandler instances concurrently

Additionally, instance is not volatile, so even if the non-synchronized initialize(config) writes to instance, another thread may see a stale reference.

Suggested fix: Either make the original initialize(config) method synchronized as well, or use a shared lock object for both methods:

public static synchronized void initialize(OpenMetadataApplicationConfig config) {
    if (!initialized) {
        instance = new WorkflowHandler(config);
        initialized = true;
    } else {
        LOG.info("WorkflowHandler already initialized.");
    }
}

public static synchronized void initialize(OpenMetadataApplicationConfig config, boolean force) {
    if (force) {
        initialized = false;
    }
    initialize(config);
}

Note: In the current PR context this is only called from test bootstrap (single-threaded setup), so the practical risk is low, but the production initialize(config) method also has this pre-existing issue, and adding the force variant makes the window more visible.

Edge Case: Force-reinitialize doesn't shut down old ProcessEngine

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:175
When initialize(config, true) is called with force=true, it sets initialized = false and creates a new WorkflowHandler (which builds a new ProcessEngine). However, the old WorkflowHandler instance and its ProcessEngine are not shut down first. This could leave orphaned threads, open database connections, and async job executors from the previous engine.

The test bootstrap does call ProcessEngines.destroy() during teardown, but if initialize(config, true) is called multiple times (e.g., across test classes), the intermediate engines would leak.

Suggested fix: Before setting initialized = false, shut down the existing engine:

public static synchronized void initialize(OpenMetadataApplicationConfig config, boolean force) {
    if (force && initialized && instance != null) {
        instance.processEngine.close();
        initialized = false;
    }
    initialize(config);
}
Quality: System.getProperty() called twice for same key

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:136-137
The system property openmetadata.flowable.historyMode is retrieved twice in succession — once for the null check and once for the value comparison. This is a minor redundancy and introduces a theoretical TOCTOU (time-of-check-time-of-use) issue if system properties were modified concurrently, though extremely unlikely in practice.

Suggested fix: Read the property once into a local variable:

String historyMode = System.getProperty("openmetadata.flowable.historyMode");
if (historyMode != null) {
  asyncHistoryEnabled = !"sync".equals(historyMode);
}
Edge Case: Response not closed on success path (resource leak)

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java:530-542
In initializeTestWorkflowSettings(), the Response object is only consumed (via readEntity) on the failure path. On the success path (status == 201), the Response is never closed. While JAX-RS Response objects should be closed to release underlying connection resources, this is test-only code and the impact is minimal.

Consider using try-with-resources:

try (Response res = repo.createNewSetting(...)) {
    if (res.getStatus() != 201) {
        throw new RuntimeException(
            String.format("Failed to create test workflow settings: %s", res.readEntity(String.class)));
    }
}
Edge Case: Force-reinitialize is not thread-safe

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:187-192
The new initialize(config, force) method sets initialized = false and then calls initialize(config) non-atomically. If another thread calls getInstance() between these two operations, it could see initialized == false and throw an exception, or if another thread also calls initialize() concurrently, there could be a race condition leading to double initialization.

In practice, this is only called from test bootstrap (TestSuiteBootstrap) which runs single-threaded during initialization, so this is very unlikely to be a real problem. However, if this method were ever used in production code or from multiple test threads, it could cause issues.

Consider making the method synchronized to match any synchronization on the original initialize method:

public static synchronized void initialize(OpenMetadataApplicationConfig config, boolean force) {
    if (force) {
      initialized = false;
    }
    initialize(config);
}

...and 3 more resolved from earlier reviews

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant