docs(serviceprovider): add quality standards first draft #105
maximiliantech wants to merge 1 commit into
maximiliantech left a comment
Just finalised a first draft of the service provider quality standards. I expect this list to change over time (possibly it will just get longer), so please treat this first draft as a proposal. Some requirements might not be right from the start, and I may have forgotten some criteria. I look forward to your feedback @christophrj 🫶
- End-to-end tests run on every release against a real cluster, using [openmcp-testing](https://github.com/openmcp-project/openmcp-testing).
- Documentation includes a troubleshooting section.
## The ten quality criteria
For the reviewer: please treat this list as a proposal!
### 7. Security hardening
The controller's container runs with `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, and drops all Linux capabilities. RBAC is split between cluster-scope (only what is truly needed) and namespace-scope. No wildcards on critical verbs (`*` on `secrets`, etc.).
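To make criterion 7 checkable, a reviewer or CI job could evaluate the rendered container and RBAC settings with a small program. The sketch below is a hypothetical, stdlib-only Go example: `SecurityContext` and `PolicyRule` are simplified stand-ins that borrow field names from `corev1.SecurityContext` and `rbacv1.PolicyRule`, and treating `secrets` and `*` as the sensitive resources is an assumption. It does not cover the cluster-scope versus namespace-scope split.

```go
package main

import (
	"fmt"
	"strings"
)

// SecurityContext is a hypothetical, stdlib-only stand-in for the subset of
// corev1.SecurityContext fields named in criterion 7. Pointers model "unset",
// as in the Kubernetes API.
type SecurityContext struct {
	RunAsNonRoot             *bool
	ReadOnlyRootFilesystem   *bool
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string // corresponds to capabilities.drop
}

// PolicyRule is a hypothetical subset of rbacv1.PolicyRule.
type PolicyRule struct {
	Verbs     []string
	Resources []string
}

// sensitiveResources is an assumption: the criterion only names secrets ("etc.").
var sensitiveResources = map[string]bool{"secrets": true, "*": true}

func isTrue(b *bool) bool  { return b != nil && *b }
func isFalse(b *bool) bool { return b != nil && !*b }

// hardeningViolations lists every unmet container-level requirement.
func hardeningViolations(sc SecurityContext) []string {
	var v []string
	if !isTrue(sc.RunAsNonRoot) {
		v = append(v, "runAsNonRoot must be true")
	}
	if !isTrue(sc.ReadOnlyRootFilesystem) {
		v = append(v, "readOnlyRootFilesystem must be true")
	}
	if !isFalse(sc.AllowPrivilegeEscalation) {
		v = append(v, "allowPrivilegeEscalation must be false")
	}
	dropsAll := false
	for _, c := range sc.DropCapabilities {
		if strings.EqualFold(c, "ALL") {
			dropsAll = true
		}
	}
	if !dropsAll {
		v = append(v, "capabilities.drop must include ALL")
	}
	return v
}

// wildcardRules flags RBAC rules that grant the "*" verb on a sensitive resource.
func wildcardRules(rules []PolicyRule) []string {
	var out []string
	for i, r := range rules {
		hasWildcardVerb := false
		for _, verb := range r.Verbs {
			if verb == "*" {
				hasWildcardVerb = true
			}
		}
		if !hasWildcardVerb {
			continue
		}
		for _, res := range r.Resources {
			if sensitiveResources[res] {
				out = append(out, fmt.Sprintf("rule %d grants verb * on %q", i, res))
			}
		}
	}
	return out
}

func main() {
	yes, no := true, false
	sc := SecurityContext{RunAsNonRoot: &yes, ReadOnlyRootFilesystem: &yes, AllowPrivilegeEscalation: &no, DropCapabilities: []string{"ALL"}}
	fmt.Println("container violations:", hardeningViolations(sc))
	rules := []PolicyRule{{Verbs: []string{"*"}, Resources: []string{"secrets"}}}
	fmt.Println("rbac violations:", wildcardRules(rules))
}
```

Reporting every violation instead of stopping at the first one keeps the output usable as a checklist.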
Not sure if we want to add this to our quality standards right from the beginning. The topic goes much deeper, and I am a bit unsure whether this only scratches the surface, in which case it would not really help a service provider developer or platform owner.
We could reference the build repo here regarding a recommended base image https://github.com/openmcp-project/build/blob/main/Dockerfile in case a developer chooses to not use the sp-template setup.
Regarding the deployment settings, those are managed by the openmcp-operator so I assume there is no way to ignore those settings, right?
The standard exists for two audiences:
1. **Service provider developers** read it as a checklist. It tells you what you need to implement and which tier you can claim.
2. **Platform owners** read it to evaluate whether a service provider is mature enough to install in their landscape.
I believe the service provider developer is the main driver behind these quality standards, while the platform owner is more of a stakeholder for these requirements. Both are interested in them. I am not sure whether developers/serviceproviders/ is the right path to put this in. What do you think @christophrj?
Imo it is fine to put this in developers/... and then reference it somewhere in operators/...
A `MAINTAINERS.md` or `CODEOWNERS` file names responsible humans or teams. The repo declares its support level (best-effort, business-hours, etc.).
## Tier matrix
This tier matrix is definitely subject to change, but I thought it would make it more transparent what to expect from a service provider.
christophrj left a comment
@maximiliantech I think it is a nice start. I added some comments and would really appreciate additional feedback from other parties like core/ops.
Before uninstalling, the service provider checks whether any custom resources belonging to the domain service still exist on the managed `ControlPlane`. If they do, the provider blocks deletion and surfaces a clear message explaining what needs to be cleaned up first.
### 2. Status phases and conditions
I think it makes sense to add a link to the status reporting section of our controller guidelines here https://open-control-plane.io/developers/general#status-reporting
- Documentation includes a troubleshooting section.
- Deletion is well-defined: before uninstalling a domain service, the provider checks for existing custom resources belonging to it and blocks removal if any are found.
## The quality criteria
Please add a section regarding required annotations where we point to our controller guidelines https://open-control-plane.io/developers/general#operation-annotations
Every `ServiceProviderAPI` reconciled by the service provider carries a `Status.Phase` (`Ready` / `Progressing` / `Terminating`) and typed `Conditions` (at minimum `Ready`, plus type-specific ones). Conditions follow the `metav1.Condition` shape with `reason` and `message`.
### 3. Clear error messages
I would merge this with section 2 to have one error/status reporting section
When reconciliation fails, the cause appears on a Condition's `message` and as a Kubernetes Event on the `ServiceProviderAPI`. Messages are actionable — they name the missing or invalid input, or the upstream system that failed.
### 4. API stability policy
In general it probably makes sense to reference https://github.com/kubernetes/community/blob/main/contributors/devel/sig-architecture/api_changes.md and https://github.com/kubernetes/community/blob/main/contributors/devel/sig-architecture/api-conventions.md at some point in this document.
**Community** requires unit tests covering the reconciler. **Stable** additionally requires end-to-end tests run on every release against a real cluster. Tests must cover the full lifecycle: install, reconcile, and delete. The [openmcp-testing](https://github.com/openmcp-project/openmcp-testing) framework provides tooling for this within the OpenControlPlane ecosystem; equivalent frameworks are acceptable.
### 9. Maintenance signals
Imo "signals" is a little misleading in this context. Maybe "Ownership and Maintenance Documentation"?
### 8. Testing
Imo if we define a testing section then it is our responsibility to provide an integration test that a service provider has to pass, rather than dictate any unit testing goals that might not meet any quality standard that we aim for. For now openmcp-testing is a good starting point and we can think about extending this with an integration test suite to check that a service provider is "well-behaved" in the future.
## Graduation between tiers
Tier graduations are reviewed by **SIG Extensibility maintainers**. The transparency of the compliance table is the accountability mechanism — anyone can audit the claim against the criteria — but the SIG is the gatekeeper that signs off on a change of tier.
I think this only works if we maintain this inside a community/registry/marketplace repo.
The standard is **tooling-agnostic**. It describes what a service provider must *produce* (its artifacts) and how it must *behave* (at runtime), not how it is built. Service providers in any GitHub organisation can adhere to it, regardless of the build system or CI they use.
## Tiers at a glance
Community (governance/ownership) feels like a different dimension vs Stable and Experimental (technical maturity).
The ownership dimension could be Official (provided by the core team) and Community, maybe with something in between like Verified, which aligns with your suggestion of core maintainers reviewing external community repos.
When I look at the overview table that is provided at the end of this document, maybe rename the tiers to Experimental, Stabilizing and Stable/Production-ready?
The provider declares its API maturity per CRD (`v1alpha1`, `v1beta1`, `v1`) and follows the matching change policy. Breaking changes at `v1beta1`+ require a deprecation cycle. Conversion webhooks exist when multiple versions are served.
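The per-CRD version policy can be expressed as a small classifier. This hypothetical sketch parses Kubernetes-style version strings (`v1alpha1`, `v1beta1`, `v1`) and applies the two rules stated above: breaking changes from `v1beta1` onwards need a deprecation cycle, and serving more than one version calls for a conversion webhook. The function names are illustrative, not part of any existing tool.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionPattern matches Kubernetes-style API versions such as v1, v1beta1, v2alpha3.
var versionPattern = regexp.MustCompile(`^v([1-9][0-9]*)(?:(alpha|beta)([1-9][0-9]*))?$`)

// Maturity is the stability level implied by a CRD version string.
type Maturity int

const (
	Alpha Maturity = iota
	Beta
	GA
)

func (m Maturity) String() string { return [...]string{"alpha", "beta", "stable"}[m] }

// parseMaturity classifies a version string or rejects malformed input.
func parseMaturity(version string) (Maturity, error) {
	m := versionPattern.FindStringSubmatch(version)
	if m == nil {
		return 0, fmt.Errorf("%q is not a Kubernetes-style API version", version)
	}
	switch m[2] {
	case "alpha":
		return Alpha, nil
	case "beta":
		return Beta, nil
	default:
		return GA, nil
	}
}

// needsDeprecationCycle applies the rule "breaking changes at v1beta1+ require a deprecation cycle".
func needsDeprecationCycle(version string) (bool, error) {
	m, err := parseMaturity(version)
	if err != nil {
		return false, err
	}
	return m >= Beta, nil
}

// needsConversionWebhook applies the rule "conversion webhooks exist when multiple versions are served".
func needsConversionWebhook(servedVersions []string) bool {
	return len(servedVersions) > 1
}

func main() {
	for _, v := range []string{"v1alpha1", "v1beta1", "v1"} {
		m, _ := parseMaturity(v)
		cycle, _ := needsDeprecationCycle(v)
		fmt.Printf("%s: %s, deprecation cycle required: %t\n", v, m, cycle)
	}
	fmt.Println("conversion webhook:", needsConversionWebhook([]string{"v1beta1", "v1"}))
}
```

Kubernetes can also serve several versions with the `None` conversion strategy when their schemas are identical; the sketch follows the criterion as currently written.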
### 5. Air-gapped / custom CA bundle support
Imo these are features and not quality standards
Signed-off-by: Maximilian Techritz <maximilian.techritz@sap.com>
89b9acc to 4873407
What this PR does / why we need it:
First draft of the OpenControlPlane service provider quality standards at docs/developers/serviceprovider/07-quality-standards.mdx. Defines three tiers (Experimental / Community / Stable), ten criteria, a per-repo compliance table, and SIG Extensibility-led tier graduation.
Which issue(s) this PR fixes:
Related #49
Special notes for your reviewer:
This document should be the single place for the quality standard. The document itself contains an example markdown snippet that each service provider should also include in its README. Ultimately, I would like to turn this into an automated conformance standard: every service provider runs the tests and then gets a badge for each criterion, or something like that, similar to the Kubernetes conformance matrix from Gardener.
Release note: