HYPERFLEET-907 - docs: add condition mapping design#141
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThis pull request introduces a comprehensive design document for a CEL-based condition mapping engine that integrates with the existing Sequence Diagram(s)sequenceDiagram
participant Client
participant APIService
participant AggregateResourceStatus
participant CEL
participant ResourceStatusStore
Client->>APIService: PUT /statuses (resource status payload)
APIService->>AggregateResourceStatus: invoke aggregation pipeline
AggregateResourceStatus->>CEL: evaluate mapping rules for adapter conditions
CEL-->>AggregateResourceStatus: mapped conditions (1:1 or merged N:1)
AggregateResourceStatus->>ResourceStatusStore: persist aggregated resource conditions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hyperfleet/components/api-service/condition-mapping-design.md`:
- Around line 36-40: Add an "Implementation Coordination" (or "Cross-Repository
Impact") subsection to this document that explicitly lists affected repositories
(Sentinel, Adapter framework), calls out coordination points such as Sentinel's
condition() helper and client filters and Adapter CEL task configs that read
status.conditions (including the effect of the data field exclusion and new
mapped conditions), and prescribes validation steps (review CEL expressions for
assumptions about condition cardinality/fields, run existing Sentinel and
Adapter unit/integration tests, and update configs/tests where mapped conditions
change shape). Ensure the subsection references the specific concerns around the
data field exclusion and mapped-condition aggregation so implementers know what
to validate and whom to coordinate with.
- Line 7: The doc lacks the explicit Context/Decision/Consequences structure
required by the hard-delete-design.md precedent: update
condition-mapping-design.md by either (a) adding top-level headers "## Context",
"## Decision", "## Consequences" and move the existing "What & Why" content into
Context, "How" into Decision, and "Trade-offs" into Consequences, or (b) add a
clear mapping note at the top explaining these equivalences; also add a
"Transaction Guarantees" subsection under the Decision/How section that
centrally documents the transaction/atomicity and ordering rationale referenced
around lines 135 and 218, and explicitly state ownership and
transaction/ordering responsibility within the Decision section so reviewers can
find it easily.
- Line 267: Update the "Allowlist Approach" sentence to say "six standard
condition fields" instead of "five" and add the missing field name
`observed_generation` to the enumerated list so it reads: type, status, reason,
message, observed_generation, last_transition_time (reflecting ADR-0007 and the
Terminology/CEL context references).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 5459bc77-8f4d-41a9-a859-d9c6ec581b0d
📒 Files selected for processing (1)
hyperfleet/components/api-service/condition-mapping-design.md
| Last Updated: 2026-05-20 | ||
| --- | ||
|
|
||
| # Condition Mapping Design |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consider restructuring to match the hard-delete-design.md precedent.
The document provides comprehensive information but doesn't follow the Context/Decision/Consequences structure established by the referenced hard-delete-design.md precedent. While "What & Why" maps to Context, "How" to Decision, and "Trade-offs" to Consequences, the headers don't make this explicit.
As per coding guidelines: "Use this document as the 'design doc shape' precedent referenced by the PR: clearly state Context/Decision/Consequences and explicitly document the ownership and transaction/ordering rationale."
Consider either:
- Adding explicit "## Context", "## Decision", "## Consequences" top-level headers with current content mapped appropriately
- Or adding a note at the top explaining how current sections map to the standard structure
The transaction/atomicity rationale (lines 135, 218) is mentioned but should be centrally documented in a "Transaction Guarantees" subsection under Decision/How.
Also applies to: 23-23, 360-360
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hyperfleet/components/api-service/condition-mapping-design.md` at line 7, The
doc lacks the explicit Context/Decision/Consequences structure required by the
hard-delete-design.md precedent: update condition-mapping-design.md by either
(a) adding top-level headers "## Context", "## Decision", "## Consequences" and
move the existing "What & Why" content into Context, "How" into Decision, and
"Trade-offs" into Consequences, or (b) add a clear mapping note at the top
explaining these equivalences; also add a "Transaction Guarantees" subsection
under the Decision/How section that centrally documents the
transaction/atomicity and ordering rationale referenced around lines 135 and
218, and explicitly state ownership and transaction/ordering responsibility
within the Decision section so reviewers can find it easily.
| **Related Documentation:** | ||
| - **Current Status Aggregation**: [ADR-0008 — Dynamic Status Aggregation](../../adrs/0008-dynamic-status-aggregation.md) — aggregation computed on write path; [ADR-0007 — Conditions-Based Status Model](../../adrs/0007-conditions-based-status-model.md) — ResourceCondition and AdapterCondition contracts; [Status Guide](../../docs/status-guide.md) — condition reporting and validation rules | ||
| - [API Service Design](./api-service.md) — API architecture and service layer patterns | ||
| - [Sentinel Message Decision Config](../sentinel/sentinel.md) — Existing CEL usage in Sentinel | ||
| - [Adapter Framework Design](../adapter/framework/adapter-frame-design.md) — Existing CEL usage in adapters (Config Loader and Criteria Evaluator sections) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Consider adding cross-repo coordination guidance.
The linked repositories research shows that Sentinel and Adapter repos consume resource.status.conditions via CEL and have existing configs/tests that depend on condition shape and field availability. Specifically:
- Sentinel's
condition()helper and client filters rely on current condition structure - Adapter CEL configs in task definitions read
status.conditions - Both repos may be affected by the
datafield exclusion (line 204) and new mapped conditions appearing in aggregation results
While the References section lists related documentation, consider adding an explicit "Implementation Coordination" or "Cross-Repository Impact" subsection that:
- Lists affected repositories (Sentinel, Adapter framework)
- Identifies specific coordination points (e.g., validating Sentinel decision configs still work with new mapped conditions)
- Recommends validation steps (e.g., reviewing existing CEL expressions for assumptions about condition cardinality)
This helps implementers anticipate integration work and aligns with the hard-delete-design.md precedent of explicitly documenting implementation dependencies.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hyperfleet/components/api-service/condition-mapping-design.md` around lines
36 - 40, Add an "Implementation Coordination" (or "Cross-Repository Impact")
subsection to this document that explicitly lists affected repositories
(Sentinel, Adapter framework), calls out coordination points such as Sentinel's
condition() helper and client filters and Adapter CEL task configs that read
status.conditions (including the effect of the data field exclusion and new
mapped conditions), and prescribes validation steps (review CEL expressions for
assumptions about condition cardinality/fields, run existing Sentinel and
Adapter unit/integration tests, and update configs/tests where mapped conditions
change shape). Ensure the subsection references the specific concerns around the
data field exclusion and mapped-condition aggregation so implementers know what
to validate and whom to coordinate with.
| source_condition_filter: | ||
| expression: | | ||
| adapterCondition.type.startsWith("GCP") | ||
| aggregation_logic: |
There was a problem hiding this comment.
The N-to-1 merge aggregation_logic specifies status_expression, reason_expression, and message_expression, but doesn't define how observed_generation and last_transition_time are determined for the merged output.
Example 2's output shows "last_transition_time": "2026-05-19T10:32:00Z" which doesn't match any source condition (10:25, 10:30, 10:20) — it's unclear whether this is the current time, the latest source time, or a bug in the example.
Consider either:
- Adding
observed_generation_expressionandlast_transition_time_expressionto theaggregation_logicschema for full control - Or documenting the default behavior explicitly (e.g., "
observed_generationdefaults toresourceGeneration;last_transition_timedefaults to the latestlast_transition_timeamong source conditions")
Either way, the Example 2 output should be consistent with the documented behavior.
| source_condition_filter: | ||
| expression: | | ||
| adapterCondition.type in ["ControlPlaneReady", "WorkerNodesReady"] | ||
| transformations: |
There was a problem hiding this comment.
Two implicit behaviors in the 1-to-1 transformation schema that should be documented explicitly:
-
Field copy default: The
transformationsarray only specifiestypeandmessage, but Example 1 showsstatus,reason,observed_generation, andlast_transition_timecopied as-is from the source. Please state explicitly that untransformed fields are copied from the source adapter condition. -
Per-match cardinality: This rule's filter matches two condition types (
ControlPlaneReadyandWorkerNodesReady). Does the rule produce one output condition per match (i.e., 2 resource conditions)? TheadapterConditionsingular variable name suggests per-match execution, but this should be stated explicitly — especially since N-to-1 merge exists as a separate pattern.
Suggested addition under Configuration Schema:
"For 1-to-1 rules, the transformation runs once per matching adapter condition, producing one resource condition per match. Fields not listed in
transformationsare copied as-is from the source adapter condition."
|
|
||
| | Term | Definition | | ||
| |------|-----------| | ||
| | **Resource Condition** | Kubernetes-style condition in `status.conditions` array on Cluster/NodePool resources. Status is `True` or `False` only. | |
There was a problem hiding this comment.
The Resource Condition definition states "Status is True or False only", but ADR-0007 (referenced by this document) specifies that the initial state after creation is Available=Unknown, Ready=False. ADR-0008 also mentions Available=Unknown for first adapter reports.
The Unknown filtering rationale for mapped conditions is sound, but this terminology entry overgeneralizes — existing aggregated conditions (Available) can be Unknown transiently.
Consider qualifying to:
"Status is
TrueorFalsein steady state.Unknownmay appear transiently (e.g.,Available=Unknownat resource creation per ADR-0007). Mapped conditions enforce theTrue/False-only contract — see Unknown Filtering."
This prevents implementers from mistaking existing Unknown states as bugs.
| 6. Marshal all conditions to JSON | ||
| 7. Update resource.status_conditions | ||
|
|
||
| Mapped conditions are included in the same database transaction as the adapter status update via the existing transaction-per-request middleware (see `hyperfleet-api/pkg/db/CLAUDE.md`). |
There was a problem hiding this comment.
nit: This references hyperfleet-api/pkg/db/CLAUDE.md as documentation for the transaction middleware. CLAUDE.md files are developer tooling instructions, not architecture documentation — they change frequently and aren't reviewed with the same rigor.
Consider referencing the actual middleware source file instead, or describing the behavior inline (e.g., "via the existing transaction-per-request middleware in pkg/db").
| |------|-----------| | ||
| | **Resource Condition** | Kubernetes-style condition in `status.conditions` array on Cluster/NodePool resources. Status is `True` or `False` only. | | ||
| | **Adapter Condition** | Condition reported by adapters via `PUT /statuses`. Status can be `True`, `False`, or `Unknown`. Stored in `adapter_statuses` table. **Note**: Adapter conditions with `status="Unknown"` are automatically filtered out during mapping and never converted to resource conditions, preventing violations of the True/False-only contract. | | ||
| | **Standard Condition Fields** | All conditions (both Resource and Adapter) contain six fields: `type` (string, condition category), `status` (`True`/`False` for resource conditions; `True`/`False`/`Unknown` for adapter conditions), `reason` (CamelCase string, machine-readable cause), `message` (human-readable description), `observed_generation` (resource generation when condition was set), `last_transition_time` (RFC 3339 timestamp of last status change). | |
There was a problem hiding this comment.
nit: can we make this a list for better readability?
|
|
||
| 1. **Sentinel CEL expressions** must fetch `/statuses` to evaluate adapter conditions, adding latency and complexity | ||
| 2. **Adapters checking preconditions** need internal API access instead of reading public resource state | ||
| 3. **External consumers** (CLI, UI, integrations) cannot access provider-specific status without hitting internal endpoints |
There was a problem hiding this comment.
IMO this is the only problem we are trying to achieve with the change.
Let me share my logic.
- The data that goes into the public endpoint will be seen by customers.
- That data can be VERY different from the data required by Sentinel/Adapters to do their work
- This means that custom condition mappings should not be used to make things easier/efficient for Sentinel/Adapter
- They should still hit
/statusesif they need internal information
The way I like to think about it is:
Does the customer care about that information in the Resource?
If no, then it shouldn't be there
And remember that for customers, anything related with adapters is a black box, they should know nothing about them.
| - CEL-based condition mapping engine in API status aggregation flow | ||
| - Configuration schema for mapping rules (per-adapter filters, transformations, merge logic) | ||
| - Integration into existing `AggregateResourceStatus()` function | ||
| - Backward compatibility with existing conditions |
There was a problem hiding this comment.
I would say that we don't need backward compatibility here
|
|
||
| ### Out of Scope | ||
|
|
||
| - **Replacing existing conditions** — `Ready`, `Reconciled`, `Available`, `<Adapter>Successful` remain unchanged |
There was a problem hiding this comment.
Please take into account that there should be no Ready nor Available in resource.status.conditions anymore.
These have been replaced by Reconciled and LastKnownReconciled
There was a problem hiding this comment.
About keeping <Adapter>Successful conditions.... IMO we should completely remove them.
Rationale:
- We created them because there was no option of custom mappings
- As they are right know, they expose information about adapters.... which is meaningless for the customers
- Moreover, if a partner needs them, they can just configure now a custom mapping
- Code will be simpler without exceptions.
- Think how to explaining the aggregated conditions... they are the two general ones, the custom ones and then one per.... bla bla....
There was a problem hiding this comment.
In fact... if we were able to express Reconciled and LastKnownReconciled via custom condition mappings...
- There would be no exceptions
- All aggregated conditions would be custom
- Resources that are not "reconciliable" will not add the two synthetic aggregated conditions
- We have now an example with
ChannelsandVersions- AFAIK they don't need reconciliation
| **Key Points**: | ||
| - Mapping happens **on every PUT /statuses** (during existing aggregation flow) | ||
| - Unknown conditions automatically filtered out (only True/False mapped) | ||
| - Execution time: ~2-3ms for 15 rules across 3 adapters |
There was a problem hiding this comment.
Where does this come from?
|
|
||
| **RFC 9457 Compliance**: Mapping errors returned to adapters (via `PUT /statuses` response) follow RFC 9457 Problem Details format (see [Error Model Standard](../../standards/error-model.md)). Error messages MUST NOT leak internal details (stack traces, database connection strings, internal service names). | ||
|
|
||
| **Audit Logging**: Mapping execution errors (CEL timeouts, field validation failures, cardinality violations) are logged with adapter name, rule name, and error reason. Logs are retained for **90 days** per the [Logging Standard](../../standards/logging-specification.md). |
There was a problem hiding this comment.
I was surprised by our standards mandating retention for 90 days... I went to read it.... and could't find it 🧩
But, this should be something out of scope here and for partner teams to decide
|
|
||
| ### Configuration Schema | ||
|
|
||
| Mapping rules are configured per-adapter in the API's adapter requirements config. Two patterns are supported: **1-to-1 transformation** and **N-to-1 merge**. |
There was a problem hiding this comment.
I wonder why do we differentiate between 1-to-1 and N-to-1
What if we consider all rules to be N-to-1?
We could expose all statuses as a variable for CEL expressions, always accesible
There was a problem hiding this comment.
On second read of the document, AFAIU the N-to-1 are still referring to conditions of the same adapter.
If this is correct, my question is:
Why limiting a mapped condition to a single adapter?
- What if a partner wants to expose a condition that comes from aggregating different adapters?
- For customers, the adapters should be invisible
- The mapping engine can have all statuses available, not just the one being currently updated
|
|
||
| **Evaluation Order**: Rules are evaluated **in declaration order** (top-to-bottom). Rules targeting the same adapter are processed sequentially. | ||
|
|
||
| **Conflict Resolution**: When multiple rules produce resource conditions with the **same `type`**, the **last-wins** strategy applies — the condition from the last evaluated rule overwrites earlier conditions. In a deployment with 3 adapters and 5 rules per adapter (15 total), processing order adds ~2-3ms latency per resource due to sequential evaluation. |
There was a problem hiding this comment.
What if rules are a map instead of a list?
- That way, the key of the map becomes the
target - There is no possibility of overlap
- We loose the evaluation order
- Is order required?
|
|
||
| **Unknown Filtering**: Adapter conditions with `status="Unknown"` are **automatically filtered out** before CEL evaluation. Only conditions with `status="True"` or `status="False"` are passed to the mapping engine, ensuring resource conditions never violate the True/False-only contract. | ||
|
|
||
| ### CEL Evaluation Context |
There was a problem hiding this comment.
I wonder what other variables do we have accessible in the CEL expression, for example.
- The full resource
- e.g. accessing
spec.generation
- e.g. accessing
- The adapter configuration
- Environment variables
| ```yaml | ||
| - name: "rosa-control-plane" | ||
| adapter: "rosa-adapter" | ||
| source_condition_filter: |
There was a problem hiding this comment.
When source_condition_filter is false I guess the target property does not appear in the result conditions?
There was a problem hiding this comment.
I see this behavior to be similar to the when keyword in the adapter status task.
IMO if it does the same, it would be nice for all our DSLs to use the same keywords
Summary
Technical specification for CEL-based condition mapping mechanism to expose adapter conditions in API resource
status.conditions. Deliverable forHYPERFLEET-907 spike story, optimized for spec-driven development in next sprint.
What Changed
New Document
condition-mapping-design.md— Full technical specificationhard-delete-design.mdpattern: references architecture docs instead of duplicating implementation detailsKey Design Decisions
Configuration Schema
status_expression,reason_expression,message_expressionSecurity & Performance
Field Naming
observed_generation,last_transition_timedatafield)Validation Behavior
reasonfield: Condition skipped if oversized (machine-readable, used in CEL logic)messagefield: Truncated to 2048 chars (human-readable, informational only)Acceptance Criteria (HYPERFLEET-907)
before implementation sprint (Slack message sent to request input)
PUT /statuses, 2-3ms overhead)Testing
Related Links