[WIP] feat(workflows): batch entity processing — entityList-only for automated task nodes#26715
[WIP] feat(workflows): batch entity processing — entityList-only for automated task nodes#26715
Conversation
…ted task nodes
Phase 1 of batch entity processing in governance workflows. All
automated task nodes (checkEntityAttributesTask, setEntityAttributeTask,
checkChangeDescriptionTask, rollbackEntityTask, sinkTask,
dataCompletenessTask) now process a List<String> of entity links via
entityList exclusively. The relatedEntity fallback in getEntityList()
is removed — batch nodes no longer have that path.
Key changes:
- PeriodicBatchEntityTrigger (singleExecutionMode=false): each child
process now receives global_entityList via ${entityToListMap[relatedEntity]}.
FetchEntitiesImpl pre-builds entityToListMap (entity -> List.of(entity))
so the JUEL expression resolves without static class references.
- Batch node impls (6 files): removed relatedEntity fallback from
getEntityList() and the now-unused RELATED_ENTITY_VARIABLE import.
entityList is the only input path.
- BPMN builders: putIfAbsent(ENTITY_LIST_VARIABLE, GLOBAL_NAMESPACE) for
all batch-capable nodes; relatedEntity is never added to inputNamespaceMap
by builders.
- v1130 migration (addEntityListToNamespaceMap): updated to also strip
relatedEntity from batch node inputNamespaceMaps, covering both fresh
upgrades (add entityList + remove relatedEntity) and instances that
already ran the previous migration (entityList present, remove relatedEntity).
Migration remains idempotent.
- GlossaryApprovalWorkflow.json: removed relatedEntity from all 13 batch
node inputNamespaceMaps. userApprovalTask nodes keep relatedEntity.
- JSON schemas: entityList added to trigger output and batch node
inputNamespaceMap/input definitions.
- Integration tests: updated to reflect entityList-first structure across
all batch-capable workflow node configurations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| for (String entityLinkStr : entityList) { | ||
| MessageParser.EntityLink entityLink = MessageParser.EntityLink.parse(entityLinkStr); | ||
| rollbackEntity(execution, entityLink, updatedBy, workflowInstanceExecutionId); | ||
| } |
There was a problem hiding this comment.
⚠️ Bug: RollbackEntity overwrites execution variables in batch loop
When entityList contains multiple entities, rollbackEntity() is called in a loop and each iteration sets the same execution variables (rollbackAction, rollbackFromVersion, rollbackToVersion, rollbackEntityId, rollbackEntityType) via execution.setVariable(). Only the last entity's rollback metadata survives. Other batch implementations (DataCompletenessImpl, SinkTaskDelegate) correctly accumulate per-entity results into maps/lists.
Suggested fix:
Accumulate rollback results per entity into a Map<String, Object> (keyed by entity link), similar to how DataCompletenessImpl builds entityResults. Set a single aggregated variable after the loop completes.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| "type": "array", | ||
| "items": { "type": "string" }, | ||
| "default": ["relatedEntity"], | ||
| "default": ["relatedEntity", "entityList"], |
There was a problem hiding this comment.
⚠️ Bug: dataCompletenessTask.json schema still requires relatedEntity
The JSON schema for dataCompletenessTask still has "required": ["relatedEntity"] in inputNamespaceMap and includes relatedEntity in the default input array. However, DataCompletenessImpl.java no longer reads relatedEntity — it exclusively uses ENTITY_LIST_VARIABLE. This mismatch means:
- Schema validation requires a field the code ignores
additionalProperties: falsecombined with the required relatedEntity prevents creating workflows without it- Inconsistent with peer schemas (rollbackEntityTask, setEntityAttributeTask, checkChangeDescriptionTask) which were updated to require
entityListinstead
Suggested fix:
Update the schema to match peers:
- Change `"required": ["relatedEntity"]` to `"required": ["entityList"]`
- Change default input to `["entityList"]`
- Remove the relatedEntity property from inputNamespaceMap
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| private List<String> getEntityList( | ||
| Map<String, String> inputNamespaceMap, WorkflowVariableHandler varHandler) { | ||
| String entityListNamespace = inputNamespaceMap.get(ENTITY_LIST_VARIABLE); | ||
| if (entityListNamespace != null) { | ||
| Object entityListObj = | ||
| varHandler.getNamespacedVariable(entityListNamespace, ENTITY_LIST_VARIABLE); | ||
| if (entityListObj instanceof List) { | ||
| return (List<String>) entityListObj; | ||
| } | ||
| } | ||
| return List.of(); | ||
| } |
There was a problem hiding this comment.
💡 Quality: Duplicated getEntityList() method across 6 classes
The exact same getEntityList(inputNamespaceMap, varHandler) method is copy-pasted into CheckChangeDescriptionTaskImpl, CheckEntityAttributesImpl, DataCompletenessImpl, RollbackEntityImpl, SetEntityAttributeImpl, and SinkTaskDelegate. This is a maintenance hazard — any bug fix or behavior change needs to be applied in 6 places.
Suggested fix:
Extract getEntityList() into a shared utility method in WorkflowVariableHandler or a new helper class.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
✅ TypeScript Types Auto-UpdatedThe generated TypeScript types have been automatically updated based on JSON schema changes in this PR. |
Code Review
|
| Auto-apply | Compact |
|
|
Was this helpful? React with 👍 / 👎 | Gitar
There was a problem hiding this comment.
Pull request overview
Phase 1 of batch-entity processing for governance workflows: moves automated task nodes to consume entities exclusively via entityList (a list of entity-link strings), updates trigger/BPMN wiring to pass lists, and adds a v1130 migration to update stored workflow definitions accordingly.
Changes:
- Extend trigger outputs and workflow event variable initialization to include
entityList. - Update automated task node implementations and BPMN builders to read from
entityListand to populateentityListininputNamespaceMapby default. - Add v1130 data migration to inject
entityListinto workflow definitions and removerelatedEntityfrom batch-node namespace maps; update built-in workflow JSON and integration tests.
Reviewed changes
Copilot reviewed 30 out of 37 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/triggers/periodicBatchEntityTrigger.json | Updates trigger output schema to include entityList. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/triggers/eventBasedEntityTrigger.json | Updates trigger output schema to include entityList. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/sinkTask.json | Switches sink task schema inputs to entityList. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/setEntityAttributeTask.json | Switches set-attribute task schema inputs to entityList. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/rollbackEntityTask.json | Switches rollback task schema inputs to entityList. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/dataCompletenessTask.json | Adds entityList to schema for completeness task. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/checkEntityAttributesTask.json | Switches check-attributes task schema input to entityList and adds output list defaults. |
| openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/checkChangeDescriptionTask.json | Switches check-change-description task schema input to entityList and adds output list defaults. |
| openmetadata-service/src/main/resources/json/data/governance/workflows/GlossaryApprovalWorkflow.json | Updates built-in workflow to wire batch-capable nodes via entityList. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1130/MigrationUtil.java | Adds v1130 migration to add entityList to triggers and batch node namespace maps; strips relatedEntity for batch nodes; redeploys workflows. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/postgres/v1130/Migration.java | Executes the new workflow migration during v1130 Postgres upgrade. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/mysql/v1130/Migration.java | Executes the new workflow migration during v1130 MySQL upgrade. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/triggers/impl/FetchEntitiesImpl.java | Builds entityToListMap so periodic trigger can pass per-entity entityList without static calls in JUEL. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/triggers/PeriodicBatchEntityTrigger.java | Updates periodic trigger call-activity input mapping to always pass entityList (full batch or per-entity list) and adjusts loop cardinality handling. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/triggers/EventBasedEntityTrigger.java | Ensures event-based trigger passes entityList into the called workflow. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/sink/SinkTaskDelegate.java | Refactors sink delegate to always read entities from entityList. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/impl/SetEntityAttributeImpl.java | Changes set-attribute delegate to iterate over entityList. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/impl/RollbackEntityImpl.java | Changes rollback delegate to iterate over entityList and adds BPMN error handling + exception propagation. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/impl/DataCompletenessImpl.java | Changes completeness delegate to process entityList and emit per-band lists + results. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/impl/CheckEntityAttributesImpl.java | Changes attribute-check delegate to partition entities via entityList. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/impl/CheckChangeDescriptionTaskImpl.java | Changes change-description-check delegate to partition entities via entityList. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/SetEntityAttributeTask.java | Ensures BPMN builder injects entityList namespace mapping by default. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/RollbackEntityTask.java | Ensures BPMN builder injects entityList namespace mapping by default. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/DataCompletenessTask.java | Ensures BPMN builder injects entityList namespace mapping by default. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/CheckEntityAttributesTask.java | Ensures BPMN builder injects entityList namespace mapping by default. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/CheckChangeDescriptionTask.java | Ensures BPMN builder injects entityList namespace mapping by default. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/TriggerFactory.java | Forces periodic triggers to run with singleExecutionMode=false. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowEventConsumer.java | Populates both relatedEntity and entityList in event-triggered workflow variables. |
| openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/Workflow.java | Adds a constant for a “false entity list” variable name. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java | Updates integration tests to use entityList-based configs and expected outputs. |
Comments suppressed due to low confidence (1)
openmetadata-spec/src/main/resources/json/schema/governance/workflows/elements/nodes/automatedTask/dataCompletenessTask.json:110
- This schema still requires
relatedEntityininputNamespaceMap(and includes it in the defaultinput), but the Java implementation now reads only fromentityListand the v1130 migration stripsrelatedEntityfrom batch node inputNamespaceMaps. As written, migrated workflows will fail validation fordataCompletenessTask. Update the schema to makeentityListthe required input/namespace and remove or deprecaterelatedEntityhere to match runtime behavior.
"input": {
"type": "array",
"items": { "type": "string" },
"default": ["relatedEntity", "entityList"],
"additionalItems": false,
"minItems": 1
},
"inputNamespaceMap": {
"type": "object",
"properties": {
"relatedEntity": {
"type": "string",
"default": "global"
},
"entityList": {
"type": "string",
"default": "global"
}
},
"additionalProperties": false,
"required": ["relatedEntity"]
},
| case PERIODIC_BATCH_ENTITY -> new PeriodicBatchEntityTrigger( | ||
| workflow.getName(), | ||
| triggerWorkflowId, | ||
| (PeriodicBatchEntityTriggerDefinition) workflow.getTrigger(), | ||
| hasBatchModeNodes(workflow)); | ||
| false); | ||
| }; |
There was a problem hiding this comment.
PeriodicBatchEntityTrigger is now always constructed with singleExecutionMode=false. This removes the previous behavior that detected batch-mode sink nodes and forced loopCardinality("1") to avoid parallel executions (notably to prevent race conditions in sinks that commit externally). If batch-mode sinks still need single-execution semantics, reintroduce the detection or make single-execution an explicit trigger/workflow config rather than hardcoding false.
| public static final String INGESTION_PIPELINE_ID_VARIABLE = "ingestionPipelineId"; | ||
| public static final String RELATED_ENTITY_VARIABLE = "relatedEntity"; | ||
| public static final String ENTITY_LIST_VARIABLE = "entityList"; | ||
| public static final String FALSE_ENTITY_LIST_VARIABLE = "false_entityList"; |
There was a problem hiding this comment.
FALSE_ENTITY_LIST_VARIABLE is introduced but not used anywhere in the codebase, and its snake_case form (false_entityList) doesn’t match the falseEntityList name used by the updated schemas/tests. Either remove this constant or rename/use it consistently across implementations and schema definitions.
| public static final String FALSE_ENTITY_LIST_VARIABLE = "false_entityList"; |
| boolean result = !trueEntityList.isEmpty(); | ||
| varHandler.setNodeVariable("true_" + ENTITY_LIST_VARIABLE, trueEntityList); | ||
| varHandler.setNodeVariable("false_" + ENTITY_LIST_VARIABLE, falseEntityList); |
There was a problem hiding this comment.
The node writes the false-branch list under the variable name false_entityList (via "false_" + ENTITY_LIST_VARIABLE), but the updated schema/tests expect falseEntityList. This mismatch will break downstream consumers that rely on the declared output variables. Align the runtime variable name with the schema (and consider keeping the old key as an alias if backward-compat is needed).
| boolean result = !trueEntityList.isEmpty(); | |
| varHandler.setNodeVariable("true_" + ENTITY_LIST_VARIABLE, trueEntityList); | |
| varHandler.setNodeVariable("false_" + ENTITY_LIST_VARIABLE, falseEntityList); | |
| boolean result = !trueEntityList.isEmpty(); | |
| // Backward-compatible underscored variables (existing behavior) | |
| varHandler.setNodeVariable("true_" + ENTITY_LIST_VARIABLE, trueEntityList); | |
| varHandler.setNodeVariable("false_" + ENTITY_LIST_VARIABLE, falseEntityList); | |
| // New camelCase variables aligned with workflow schema/tests | |
| varHandler.setNodeVariable("trueEntityList", trueEntityList); | |
| varHandler.setNodeVariable("falseEntityList", falseEntityList); |
|
|
||
| boolean result = !trueEntityList.isEmpty(); | ||
| varHandler.setNodeVariable("true_" + ENTITY_LIST_VARIABLE, trueEntityList); | ||
| varHandler.setNodeVariable("false_" + ENTITY_LIST_VARIABLE, falseEntityList); |
There was a problem hiding this comment.
This task stores the false-branch entity list under false_entityList (via "false_" + ENTITY_LIST_VARIABLE), but the schema/integration tests reference falseEntityList. That means the advertised output variable will never be populated. Rename the output variable to match the schema (and optionally keep the old name as an additional output for compatibility).
| varHandler.setNodeVariable("false_" + ENTITY_LIST_VARIABLE, falseEntityList); | |
| varHandler.setNodeVariable("false_" + ENTITY_LIST_VARIABLE, falseEntityList); | |
| varHandler.setNodeVariable("falseEntityList", falseEntityList); |
| "additionalItems": false, | ||
| "minItems": 1, | ||
| "maxItems": 1, | ||
| "maxItems": 2, |
There was a problem hiding this comment.
output.maxItems is set to 2, but workflows commonly include additional trigger outputs like updatedBy (e.g., GlossaryApprovalWorkflow now has 3 outputs: entityList, relatedEntity, updatedBy). With maxItems: 2 those workflows will fail schema validation. Consider removing the maxItems constraint (or increasing it / using a less restrictive validation approach).
| "maxItems": 2, |
| "additionalItems": false, | ||
| "minItems": 1, | ||
| "maxItems": 1, | ||
| "maxItems": 2, |
There was a problem hiding this comment.
output.maxItems is 2, but periodic batch triggers often need to output additional variables (e.g., updatedBy) alongside entityList/relatedEntity. The integration tests build triggers with 3 outputs (relatedEntity, entityList, updatedBy), which would violate this schema. Relax or remove the maxItems restriction so custom trigger outputs remain valid.
| "maxItems": 2, |
| "inputNamespaceMap": { | ||
| "type": "object", | ||
| "properties": { | ||
| "entityList": { | ||
| "type": "string" | ||
| }, | ||
| "relatedEntity": { | ||
| "type": "string", | ||
| "default": "global" | ||
| "type": "string" | ||
| } | ||
| }, | ||
| "additionalProperties": false, | ||
| "required": ["relatedEntity"] | ||
| "additionalProperties": false | ||
| }, |
There was a problem hiding this comment.
inputNamespaceMap for this batch-capable node no longer requires entityList (and still exposes relatedEntity). Since the implementation reads from entityList, allowing configs without entityList will validate but result in a runtime no-op. Make entityList required (with a default namespace like global) and drop relatedEntity from the schema if it's no longer supported.
|
🟡 Playwright Results — all passed (20 flaky)✅ 3110 passed · ❌ 0 failed · 🟡 20 flaky · ⏭️ 207 skipped
🟡 20 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |



Phase 1 of batch entity processing in governance workflows. All automated task nodes (checkEntityAttributesTask, setEntityAttributeTask, checkChangeDescriptionTask, rollbackEntityTask, sinkTask, dataCompletenessTask) now process a List of entity links via entityList exclusively. The relatedEntity fallback in getEntityList() is removed — batch nodes no longer have that path.
Key changes:
PeriodicBatchEntityTrigger (singleExecutionMode=false): each child process now receives global_entityList via ${entityToListMap[relatedEntity]}. FetchEntitiesImpl pre-builds entityToListMap (entity -> List.of(entity)) so the JUEL expression resolves without static class references.
Batch node impls (6 files): removed relatedEntity fallback from getEntityList() and the now-unused RELATED_ENTITY_VARIABLE import. entityList is the only input path.
BPMN builders: putIfAbsent(ENTITY_LIST_VARIABLE, GLOBAL_NAMESPACE) for all batch-capable nodes; relatedEntity is never added to inputNamespaceMap by builders.
v1130 migration (addEntityListToNamespaceMap): updated to also strip relatedEntity from batch node inputNamespaceMaps, covering both fresh upgrades (add entityList + remove relatedEntity) and instances that already ran the previous migration (entityList present, remove relatedEntity). Migration remains idempotent.
GlossaryApprovalWorkflow.json: removed relatedEntity from all 13 batch node inputNamespaceMaps. userApprovalTask nodes keep relatedEntity.
JSON schemas: entityList added to trigger output and batch node inputNamespaceMap/input definitions.
Integration tests: updated to reflect entityList-first structure across all batch-capable workflow node configurations.
Describe your changes:
Fixes
I worked on ... because ...
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>