Add support for key manager access quote policy by martintomazic · Pull Request #6474 · oasisprotocol/oasis-core

martintomazic · 2026-03-16T20:24:52Z

Replaces #6471 closes #6387.

netlify · 2026-03-16T20:24:58Z

✅ Deploy Preview for oasisprotocol-oasis-core canceled.

Name	Link
🔨 Latest commit	`f38577a`
🔍 Latest deploy log	https://app.netlify.com/projects/oasisprotocol-oasis-core/deploys/6a05c3f232fafc0008b00074

martintomazic · 2026-03-17T14:49:52Z

+      - .buildkite/scripts/test_e2e.sh --timeout 20m
+        --scenario e2e/runtime/runtime-encryption
+        --scenario e2e/runtime/compute-policy
+


e2e test with mocked TEE was instrumental to realizing that without compute_policy on the rust side of the protocol, RHP would fail due to the missing field.

To test that compute policy is applied e2e you can disable pcs in the quote policy (fixture) and run:

export OASIS_UNSAFE_ALLOW_DEBUG_ENCLAVES=1 export OASIS_UNSAFE_MOCK_TEE=1 export OASIS_TEE_HARDWARE=intel-sgx make .buildkite/scripts/test_e2e.sh --scenario e2e/runtime/compute-policy

The test would gets stuck (as expected).

Full SGX test would be desired here (easy). It would be also nice to add another one, where we e.g. set MinTCBEvaluationDataNumber and verify via the control status that the registration failed... Still not optimal as technically we would catch node local attestation to halt early and not consensus rejecting byzantine compute.

Testing that rofl nodes pass on the lower TCB but compute node is rejected would be even harder as we would need two different SGX environments to make it work.

What do you think?

Yes, it would be great if we can test it in real SGX.

For the two different environments, let's open a separate issue to test that E2E.

Yes, it would be great if we can test it in real SGX.

Done 9675e14 :).

I had to comment UNSAFE_SKIP_AVR_VERIFY which caused most of the policy verification to be ignored (see) and set a more relaxed verification because our runner has outdated TCB.

My suggestion is to add another env variable: OASIS_UNSAFE_SKIP_LOCAL_VERIFY, which skips local verification prior to submitting attestation via transaction. Then I can verify from the node control status consensus rejected attestation for the right reason. The test is currently stopped at the local verification which is not ideal given this is critical code path that we want to have tested.

{"caller":"service.go:122","err":"quote verification failed (fresh bundle): pcs/quote: failed to verify TCB bundle: pcs/tcb: failed to verify TCB info: pcs/tcb: invalid TCB info: pcs/tcb: FMSPC is not whitelisted","level":"warn","module":"common/sgx/pcs/cqs","msg":"error verifying downloaded TCB refresh","tcb_evaluation_data_number":21,"ts":"2026-03-30T14:04:50.6645918Z"}

Optional:

I have happy and unhappy path. If we whitelist InvalidFMSCP under the default policy and the one the current runner is using under compute policy, this will also verify FMSCP whitelist e2e.

We can also play with MinTCBEvaluationDataNumber, possibly downgrade it to make current runner pass without LAX_VERIFY.

OASIS_UNSAFE_SKIP_LOCAL_VERIFY: technically there is nothing unsafe about it as this is local sanity check prior to trying to register.

My suggestion is to add another env variable: OASIS_UNSAFE_SKIP_LOCAL_VERIFY, which skips local verification prior to submitting attestation via transaction.

We also have RHP, which currently passes with empty default policy (used there) and relaxed TCB level. So this flag will not be that useful for other tests in the future, unless we rely on the default/compute policy.

In practice this variable should gate:

oasis-core/go/common/sgx/pcs/service.go

Line 118 in 59b8f32

if err = qs.verifyBundle(quote, quotePolicy, fresh, "fresh"); err == nil {

oasis-core/runtime/src/identity.rs

Line 273 in 59b8f32

let verified_quote = quote.verify(policy)?;

It cannot live inside those functions as it should only apply for local verification, but should NOT pass remote verification, unless specified by other environment variables. The easiest way would be to add another parameter to both verify functions specifying if verification should be ignored, in addition to internal global state flags (e.g. see) that currently define whether verification should be ignored/relaxed regardless whether this is local or remote/consensus verification.

I don't like this additional variable as it seems it would introduce additional complexity around checks. I would skip it for now.

Removed this for now as I agree this adds too much complexity.

One option would be to only use default policy during local validation, like is currently done for RHP, as consensus would catch wrong attestation anyways. This would simplify our code further (removing the need for ff833fc) and enable us to keep previous unhappy test. On the other hand this may cause honest nodes to post invalid registration transactions. Also not most robust/maintainable in the long run but it achieves the goal.

If anyone sees a clean way to test unhappy path is rejected, I am willing to implement it here.

martintomazic · 2026-03-18T09:15:04Z

+        /// The compute runtime quote policy.
+        #[cbor(optional)]
+        compute_policy: sgx::QuotePolicy,


This field is ignored for now and only here to pass e2e test.

In practice we probably need new RHP method (similar to node identity), to signal whether compute policy should be respected or not. Technically not breaking?

The runtime should sign with its RAK whether the intention of this enclave is to be used as the one registering on the consensus (having access to keymanager) or is the intention to be used in the less constrained environments (default policy only). Similar solution to what is done for the node_id, i.e. make it part of the hash structure and sign with RAK and validate signature on the consensus side.
Do we really need this / what are possible attack vectors without it?

I would avoid this extra verification for now as I don't see what attack this would prevent. It is other runtimes and the consensus layer that are enforcing this policy.

Agree as long as we do 2 side verification on both enclave_rpc/consensus side (confirmed myself again) exploiting this is not possible. Moreover, currently the keymanager policy must be respected on both sides when interacting with it.

If however, at an point we change keymanager-RONL interaction, so that RONL must only satisfy its policy when querying keymanager for its secrets, we should be careful, not to accidentally use the default one, especially if we introduce remote RONLs. The problem there would be if the ROFL would want stricter policy then the default RONL, but again this would be mitigated by the enclave rpc using compute policy.

Having ROFL requiring stricter policy than the keymanager/compute nodes wouldn't make sense given it depends on them directly, so ignoring this corner case.

I guess this willl used/done as part of the #6317 then.

codecov · 2026-03-18T09:38:05Z

Codecov Report

❌ Patch coverage is 89.81481% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.41%. Comparing base (262b576) to head (aecba0e).
⚠️ Report is 9 commits behind head on master.

Files with missing lines	Patch %	Lines
go/common/node/node.go	76.92%	0 Missing and 3 partials ⚠️
go/common/node/sgx.go	91.66%	1 Missing and 1 partial ⚠️
go/common/sgx/pcs/pcs.go	0.00%	2 Missing ⚠️
go/worker/common/committee/node.go	93.33%	0 Missing and 2 partials ⚠️
go/consensus/cometbft/apps/scheduler/scheduler.go	87.50%	0 Missing and 1 partial ⚠️
go/worker/keymanager/worker.go	50.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6474      +/-   ##
==========================================
+ Coverage   63.25%   63.41%   +0.16%     
==========================================
  Files         698      698              
  Lines       68282    68345      +63     
==========================================
+ Hits        43192    43344     +152     
+ Misses      20124    20052      -72     
+ Partials     4966     4949      -17

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kostko · 2026-03-24T08:55:58Z

+        /// The compute runtime quote policy.
+        #[cbor(optional)]
+        compute_policy: sgx::QuotePolicy,


I would avoid this extra verification for now as I don't see what attack this would prevent. It is other runtimes and the consensus layer that are enforcing this policy.

kostko · 2026-04-09T16:47:16Z

+      - .buildkite/scripts/test_e2e.sh --timeout 20m
+        --scenario e2e/runtime/runtime-encryption
+        --scenario e2e/runtime/compute-policy
+


I don't like this additional variable as it seems it would introduce additional complexity around checks. I would skip it for now.

martintomazic

Should be ready for a final review.

Added one more check: runtime without key manager cannot have kma_policy.
Removed UseKMAPolicy verification flag (tentative).
Note CI is currently failing because of runners running out of disk space (so ignore this here).

martintomazic · 2026-04-22T13:24:33Z

+				attestCfg := n.attestationCfg(compNotify.Added)
+				if err := n.ProvisionHostedRuntimeComponent(compNotify.Added, attestCfg); err != nil {


NIT: Not every component is TEE enabled, so maybe attestCfg should be optional here if comp.TEEKind != component.TEEKindNone...

kostko

Not part of this issue, but we need to change the 24.2 (242) upgrade identifier to something else like 26.1 as currently multiple previous versions have the same handler.

peternose · 2026-05-06T13:52:18Z

+	Features *TEEFeatures
+
+	// Now is the current consensus time.
+	Now time.Time


I would rename this to Time or Timestamp, as you never can be sure if Now is really now, and Params to State or Env, as otherwise is too general, and reorganize fields (node fields, deployment fields, consensus fields), e.g.:

// CapabilityTEEVerifyState contains .... type CapabilityTEEVerifyState struct { // Node is ... Node NodeState // Deployment is ... Deployment DeploymentState // Consensus is ... Consensus ConsensusState } type NodeState struct { // ID is the node identifier. ID signature.PublicKey } // DeploymentState contains the deployment state relevant for TEE attestation // verification. type DeploymentState struct { // Constraints are the serialized TEE constraints. Constraints []byte } // ConsensusState contains the consensus state relevant for TEE attestation. type ConsensusState struct { // Height is the current consensus height. Height uint64 // Time is the current consensus time. Time time.Time // IsFeatureVersion261 is true for consensus at version 26.1 or higher. IsFeatureVersion261 bool // Features are the TEE features and defaults advertised by the consensus layer. Features *TEEFeatures }

Renamed Now to Time.

Params, State or Env to me are equally general, and I think we use Params in the codebase for similar pattern where the goal is to have named params not to confuse order of same type (intent of this fix).

I like your ConsensusState (this one is indeed a state) separation, though exported node.ConsensusState type feels a bit off?

DeploymentState I would avoid and instead have RuntimeState: Kind, Constraints, (if kind is ever needed as you suggested). Same problem with exported public type.

In general one field nested structs I would avoid, until there is more fields and do refactor at that time.

peternose · 2026-05-13T08:09:02Z

+		if !isFeatureVersion261 {
+			return fmt.Errorf("policy should be nil")
+		}
+		if policy.IAS != nil {


Do we need this? If we keep this, after we upgrade, we cannot remove feature version check and use the same validation function for both policies.

IAS is deprecated, so when I am adding new features I prefer to not have this as an option. Similar to other thread this way protocol has less invariants, corner (dead) cases at the cost of slightly more complex code. Having lean protocol means less surprises in the future.

peternose · 2026-05-13T08:29:07Z

 				return fmt.Errorf("%w: invalid SGX TEE constraints", ErrInvalidArgument)
 			}
+			if cs.KeyManagerAccessPolicy != nil {
+				if r.Kind == KindKeyManager {


Not sure if this limitations are needed. So what if a key manager defines this policy? And a runtime might define this policy before choosing a key manager.

So what if a key manager defines this policy?

I am trying to reduce the amount of valid states, which may simplify the complexity and number of corner cases for our protocol. The cost is slightly more complex validation code.

And a runtime might define this policy before choosing a key manager.

I find this misleading though, because you are adding a policy for something you don't use. In such case this is no longer keymanager access policy only, or this is an accident and you forgot something so this should error.

Therefore the runtime should define the policy at the time of choosing keymanager iff it choses one.

peternose · 2026-05-13T08:32:55Z

 	}
 	n.svcMgr.Register(n.RuntimeRegistry)

+	// Determine whether hosted RONL components will be registered on the


You are stating here that this holds only for RONL, but the name of the variable doesn't say this. Which might lead to future problems.

Therefore, it would be better to solve this some other way, e.g. by passing node kind (which will be too specific and to much information for the code that uses this), or by sending implementation of some kind of interface. I guess the worker could know which kind of node is using running it, but not the code that the worker is using.

I could also call it RegisterHostedRONL (what I meant with compute runtime).

I see your point and the code is currently ugly but I feel alternatives make it even more complex?

peternose · 2026-05-13T08:38:14Z

 	// Provision all known components.
 	for _, comp := range bundleRegistry.Components(n.Runtime.ID()) {
-		if err := n.ProvisionHostedRuntimeComponent(comp); err != nil {
+		attestCfg := n.attestationCfg(comp)


Could this be removed by using a struct that implements GetQuotePolicy. This ways, consensus.Service could be removed from some structs.

So what you propose is

func (n *RuntimeHostNode) ProvisionHostedRuntimeComponent(comp *bundle.ExplodedComponent, provider host.ConsensusQuotePolicyProvider) error {

?

Concern

On the type system we cannot guarantee component and provider match.

This might be solved with:

type ConsensusQuotePolicyFactory func(comp, attestationCfg) host.ConsensusQuotePolicyProvider

passed to NewRuntimeHostNodeand and keeping existing ProvisionHostedRuntimeComponent signature.

Pros: we currently create attestation cfg/provisioner for the components that are not sgx enabled - see #6474 (comment). This solves it.
Cons: I hate factories as they make code hard to read and complex.

Next step

If we decide for this approach I would suggest to create one PR infront fixing the following:

Change existing GetQuotePolicy to GetConsensusQuotePolicy, as epid and ecdsa both pass nil fallback. Then in case of the rofl component, this method would return nil policy as there exist no consensus policy for it. In the case of tdx rofl (qemu) we can then do client side override with tdx defaults, "so that we can provision something" as the comment says.

Provider interface, so that consensus.Service could be removed from some structs.

peternose · 2026-05-13T08:40:29Z

 }

+// Config contains committee node specific configuration.
+type Config struct {


Is this really a config? I would say that this is more of a state or an environment, as mentioned somewhere else. And if we are defining this, why are we not including also other fields from parameters of committee.NewNode.

And if we are defining this, why are we not including also other fields from parameters of committee.NewNode.

config is static, other fields are services / runtime dependencies.

But yes maybe identity and chaincontext don't necessarily fit. I would avoid state and env as this becomes vague, and favor that every worker have config and typed services.

martintomazic force-pushed the martin/feature/compute-policy branch from d0c991e to 859e7a8 Compare March 17, 2026 14:35

martintomazic commented Mar 17, 2026

View reviewed changes

martintomazic force-pushed the martin/feature/compute-policy branch from 859e7a8 to b795a2e Compare March 18, 2026 08:55

martintomazic commented Mar 18, 2026

View reviewed changes

Comment thread go/oasis-node/cmd/node/node.go Outdated

martintomazic commented Mar 18, 2026

View reviewed changes

martintomazic marked this pull request as ready for review March 18, 2026 09:40

martintomazic requested review from kostko, peternose and ptrus as code owners March 18, 2026 09:40

martintomazic mentioned this pull request Mar 18, 2026

Per role quote policies #6471

Closed

kostko reviewed Mar 24, 2026

View reviewed changes

This was referenced Mar 24, 2026

CI: Add additional runner with lower SGX TCB for e2e tests #6478

Open

go/worker/registration: Clean dead registration signer code #6481

Merged

martintomazic force-pushed the martin/feature/compute-policy branch 3 times, most recently from d217ddf to 9675e14 Compare March 30, 2026 13:56

martintomazic requested a review from kostko March 31, 2026 13:42

kostko reviewed Apr 9, 2026

View reviewed changes

martintomazic force-pushed the martin/feature/compute-policy branch 3 times, most recently from b37e6f3 to 7466948 Compare April 15, 2026 08:19

martintomazic changed the base branch from master to martin/breaking/explicit-observer-mode April 15, 2026 08:24

martintomazic force-pushed the martin/breaking/explicit-observer-mode branch 4 times, most recently from a1fa33f to b2b7a01 Compare April 20, 2026 19:41

Base automatically changed from martin/breaking/explicit-observer-mode to master April 21, 2026 07:45

martintomazic force-pushed the martin/feature/compute-policy branch from 7466948 to 6f3cdaf Compare April 21, 2026 09:27

martintomazic requested a review from kostko April 21, 2026 10:07

kostko reviewed Apr 21, 2026

View reviewed changes

Comment thread go/oasis-node/cmd/node/node.go Outdated

martintomazic force-pushed the martin/feature/compute-policy branch from 6f3cdaf to cd10429 Compare April 22, 2026 14:15

martintomazic mentioned this pull request Apr 22, 2026

go/registry/api: Improve VerifyRuntime function signature and test suite #6512

Merged

martintomazic force-pushed the martin/feature/compute-policy branch 2 times, most recently from 3aefcc2 to aecba0e Compare April 23, 2026 07:49

martintomazic changed the title ~~Add support for compute runtime quote policy~~ Add support for key manager access quote policy Apr 23, 2026

martintomazic commented Apr 23, 2026

View reviewed changes

martintomazic requested a review from kostko April 23, 2026 08:03

kostko reviewed Apr 23, 2026

View reviewed changes

Comment thread go/worker/common/committee/node.go Outdated

martintomazic mentioned this pull request Apr 23, 2026

go: Rename feature version 242 to 261 release #6515

Merged

martintomazic force-pushed the martin/feature/compute-policy branch 2 times, most recently from 454229c to 130d2c1 Compare April 29, 2026 07:11

martintomazic requested a review from kostko April 29, 2026 22:43

kostko approved these changes Apr 29, 2026

View reviewed changes

martintomazic force-pushed the martin/feature/compute-policy branch from 130d2c1 to b49c56a Compare May 8, 2026 09:08

peternose reviewed May 13, 2026

View reviewed changes

martintomazic added 8 commits May 14, 2026 13:12

go/common/node: Move HashRAK function to a better place

1c7c5eb

go: Use CapabilityTEEVerifyParams instead of long param list

4148ba1

go/common/node: Add KeymanagerAccessPolicy to SgxConstraints

ab41235

go: Validate against keymanager access policy before registration

3cec21a

go/worker/common/committee: Add config parameter

388ddcb

go/common: Add keymanager access policy sanity test

a0175d7

go/oasis-test-runner: Add e2e test for compute runtime kma_policy

5f08f69

Fixup: Remove redundant UseKMAPolicy flag (tentative)

f38577a

martintomazic force-pushed the martin/feature/compute-policy branch from b49c56a to f38577a Compare May 14, 2026 12:45

martintomazic requested a review from peternose May 20, 2026 12:21

martintomazic mentioned this pull request May 21, 2026

go/runtime/host: Improve common.GetQuotePolicy function #6531

Open

		attestCfg := n.attestationCfg(compNotify.Added)
		if err := n.ProvisionHostedRuntimeComponent(compNotify.Added, attestCfg); err != nil {

Conversation

martintomazic commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for oasisprotocol-oasis-core canceled.

Uh oh!

martintomazic Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kostko Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martintomazic Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

martintomazic left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kostko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martintomazic May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martintomazic May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

martintomazic commented Mar 16, 2026 •

edited

Loading

netlify Bot commented Mar 16, 2026 •

edited

Loading

martintomazic Mar 17, 2026 •

edited

Loading

kostko Mar 24, 2026 •

edited

Loading

martintomazic Mar 30, 2026 •

edited

Loading

codecov Bot commented Mar 18, 2026 •

edited

Loading

martintomazic left a comment •

edited

Loading

martintomazic May 14, 2026 •

edited

Loading

martintomazic May 14, 2026 •

edited

Loading