bce2e1d to
5557aeb
|
I really like this proposal. Just to be sure: It would be still possible to have an One use-case would be in the |
Enable authenticated and authorized app-to-app communication via GoRouter using mutual TLS (mTLS). Applications connect to a shared internal domain (apps.mtls.internal), where GoRouter validates client certificates and enforces per-route access control using a default-deny model.
Key features:
- Phase 1a: Domain-specific mTLS in GoRouter (validates instance identity)
- Phase 1b: Authorization enforcement via allowed_sources route option
- Phase 2 (optional): Egress HTTP proxy for simplified client adoption
Depends on RFC-0027 (Generic Per-Route Features) for route options support.
5557aeb to
8f3900a
@silvestre I have updated the RFC with: |
|
This idea is really interesting, but will it be possible to communicate with app containers on different ports or over a protocol other than HTTP? |
Give credit to Beyhan and Max for the initial work on this RFC
This RFC currently focuses on HTTP traffic via GoRouter, but non-HTTP protocol support is an interesting future direction. Current constraints:
What would be needed for non-HTTP support:
For now, this is out of scope to keep the RFC focused and achievable. But feel free to create a follow up RFC for Non-HTTP use cases. |
|
First, I really like the idea behind this RFC. I have a unique constraint: I need fine-grained access control at the org level over whether app-to-app mTLS communication is allowed. For instance, at the platform layer, I need to enforce that app-to-app mTLS between organizations is not allowed, but within a space it would be, meaning you would need to be a Space Developer in both spaces. |
Implementation Update
Draft PRs implementing Phase 1 (1a + 1b):
Just a note about the PRs, I have not yet reviewed them myself, just wanted to get something functional. Tested end-to-end on BOSH-Lite.
Finding: Route Options Format
RFC-0027 doesn't allow nested objects/arrays in route options. We adapted to a flat format:
// Instead of nested mtls_allowed_sources: {apps: [...]}
{"mtls_allowed_apps": "guid1,guid2", "mtls_allowed_spaces": "space-guid", "mtls_allow_any": true}
Should the RFC be updated to reflect this, or should RFC-0027 be extended?
Open Issue: Application Security Groups
Apps need to reach GoRouter on port 443, but default ASGs block internal IPs. Currently requires manual security group creation with router IPs.
Proposal: Auto-manage ASG via BOSH link when feature flag is enabled. This is not blocking (manual workaround exists) but improves operator experience. |
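To make the flat route options format from this comment concrete, here is a minimal Python sketch of parsing the comma-separated values and applying a default-deny check. The `Caller` type, the function names, and the rule that any single matching list is sufficient are assumptions for illustration only; they are not taken from the RFC text or the draft PRs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity taken from a client certificate (hypothetical shape)."""
    app_guid: str
    space_guid: str
    org_guid: str


def split_guids(value):
    """Split a comma-separated option value into a set of GUIDs."""
    if not value:
        return set()
    return {part.strip() for part in value.split(",") if part.strip()}


def is_allowed(options, caller):
    """Default-deny check against flat route options.

    Assumed semantics: a caller passes if mtls_allow_any is true, or if
    any one of the allow-lists contains the caller's GUID.
    """
    if options.get("mtls_allow_any") is True:
        return True
    if caller.app_guid in split_guids(options.get("mtls_allowed_apps")):
        return True
    if caller.space_guid in split_guids(options.get("mtls_allowed_spaces")):
        return True
    if caller.org_guid in split_guids(options.get("mtls_allowed_orgs")):
        return True
    return False  # nothing matched: deny


options = {"mtls_allowed_apps": "guid1,guid2", "mtls_allowed_spaces": "space-guid"}
print(is_allowed(options, Caller("guid2", "other-space", "org-1")))  # True
print(is_allowed(options, Caller("guid9", "other-space", "org-1")))  # False
print(is_allowed({}, Caller("guid1", "space-guid", "org-1")))        # False
```

A route with no options at all denies every caller in this sketch, which is one way to read the default-deny model.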
|
Also recorded a demo here: https://asciinema.org/a/zLXrO9ERP3lXqGuM, but this was before refactoring to flat options (still uses the nested structure, which is why cf curl is being used). |
I still do not support putting guids in (bosh or app) manifests. This does not mirror c2c and is not consistent with network policies. The c2c APIs take guids. There are no guids stored in manifests. If you want to make it mirror c2c you should make these API endpoints that can be wrapped by the CLI. Putting guids in bosh and app manifests is an anti-pattern that we should avoid. First, guids can change. Apps are pushed via blue-green deploys. Second, guids are not human-readable and it will be difficult for users to visually check to make sure the correct apps are able to access the correct routes. Third, putting these guids in manifests means that stale policies can never be automatically cleaned up. |
Based on my reading, setting scope: space does not do the same thing as c2c. I believe setting scope: space will allow all apps in the same space to call other apps in the same space. The way c2c works is different. It has to do with the user who is creating the policy, and they must have the correct permissions in multiple spaces to make the network policy. |
ameowlia
left a comment
The same route can be used across apps and spaces. What happens when two apps have the same route defined in the manifest, but they have different mtls_allowed_apps defined?
How would this work? Is CAPI going to track all of the attempted, but not enforced policies? I don't see this explained in the RFC. |
Does this mean that the changes made to envoy is optional for this RFC? Or that users can opt into this behavior? |
… details
Based on feedback from Beyhan: reintroduce scope (org|space|any) for domain-level authorization using route-emitter endpoint tags instead of explicit GUID lists in BOSH manifests.
Key changes:
- scope: org/space compares caller identity against endpoint tags
- Document shared route behavior: EndpointPool contains endpoints from multiple spaces, GoRouter iterates and short-circuits on first match
- Replace org-scoped domain pattern with scope: org
- Update cross-CF federation example to explain when explicit orgs/spaces lists are needed (remote GUIDs have no local route-emitter tags)
- Add same-org, same-space, and any-authenticated-caller config examples
- Add Router Log Messages appendix documenting [RTR] log lines for all mTLS authorization scenarios (allowed, denied, not_enforced)
- Clarify CAPI stores route options as inert metadata when domain authorization is not enabled (addresses ameowlia feedback)
- Add Authorization model differences subsection explaining destination-controlled model vs C2C bilateral model (addresses ameowlia feedback)
- Change Phase 2 from Optional to Opt-in (user enables per-app)
- Replace old allowed_sources terminology with mtls_allowed_*
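The scope comparison described in the first of these commit messages (caller identity checked against endpoint tags, stopping at the first match across a shared route's endpoints) could look roughly like the Python sketch below. The tag keys `org_guid` and `space_guid`, the `Identity` type, and the function name are placeholders chosen for illustration; the actual route-emitter tag names and GoRouter data structures are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Caller identity from the client certificate (hypothetical shape)."""
    space_guid: str
    org_guid: str


def scope_allows(scope, caller, endpoint_tags):
    """Return True if the caller satisfies the scope for any endpoint.

    endpoint_tags is a list of dicts, one per endpoint in the pool.
    """
    if scope == "any":
        return True  # any authenticated caller
    key = {"org": "org_guid", "space": "space_guid"}.get(scope)
    if key is None:
        return False  # unknown scope: deny
    wanted = getattr(caller, key)
    # any() stops at the first matching endpoint (short-circuit).
    return any(tags.get(key) == wanted for tags in endpoint_tags)


# A shared route whose pool holds endpoints from two spaces in one org.
pool = [
    {"space_guid": "space-a", "org_guid": "org-1"},
    {"space_guid": "space-b", "org_guid": "org-1"},
]
print(scope_allows("space", Identity("space-b", "org-1"), pool))  # True
print(scope_allows("space", Identity("space-c", "org-1"), pool))  # False
print(scope_allows("org", Identity("space-c", "org-1"), pool))    # True
```

Under this reading, a caller is admitted to a shared route as soon as it matches the space (or org) of any one endpoint in the pool, which is the behavior reviewers may want to confirm is intended.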
|
Updated the RFC in commits a48caad and deeff30. Summary of changes: Reintroduced
Router Log Messages — new appendix section (deeff30) Authorization model differences — C2C vs this RFC (deeff30) Phase 2: Egress HTTP Proxy — "Optional" → "Opt-in" (deeff30) |
|
I want to clarify the status of my feedback, because several of my comments appear to have been either missed, treated as suggestions, or acknowledged without changes to the RFC. I'm restating them here as blocking concerns. I support the goal of this RFC. Making app-to-app mTLS easier and leveraging GoRouter is a sound direction. However, I cannot approve the RFC in its current form. I believe the final comment period should be cancelled until these concerns are addressed, which I expect will require a significant rewrite of the RFC. I plan to attend the TOC meeting today to discuss these concerns.
1. No GUIDs in manifests
The RFC currently requires app GUIDs, space GUIDs, and org GUIDs in both BOSH manifests and app manifests. This is an anti-pattern for multiple reasons:
2. Access control for a route must be auditable from a single place
The current design splits authorization across two layers: operator-level config in BOSH manifests and developer-level config in app manifests. A user trying to understand "who can access this route?" must inspect both. This makes auditing difficult and error-prone. All policies affecting a route should be queryable from a single source.
3. Route access control should not live in app manifests
Apps and routes have different lifecycles. A route can be shared across apps and spaces. Tying access policy to app push operations creates conflicts, ordering dependencies, and confusion about which policy wins. Access policies belong on the route, managed through an API, not in application manifests.
4. Phase 2 egress proxy (HTTP_PROXY) risks breaking existing traffic
Setting HTTP_PROXY on the Envoy sidecar to intercept outbound HTTP traffic is a significant change to app networking behavior. If an app also sends HTTP traffic externally or via C2C, the proxy will intercept that traffic too. We attempted a similar approach years ago and it caused issues with non-mTLS traffic. The RFC needs to explain how the egress proxy avoids interfering with:
I do not support changes that break existing networking features, even if the feature is opt-in. I want to be clear: the goal of this RFC is valuable and I want to see it succeed. But these are design-level concerns that affect usability, auditability, and correctness; they are not minor items that can be patched over. I think the right path forward is to pause the final comment period, address these concerns, and bring back a revised RFC. |
The current C2C networking policies have a similar limitation: app GUIDs are what's eventually stored in the DB, so renaming an app or doing a blue-green deploy requires the developer to re-apply network policies. Hiding the fact that GUIDs are used under the covers is a UX flaw that exists on the C2C side too. The better long-term solution would be policies that dynamically re-evaluate when an app name changes. However, this should be a separate RFC that applies to both C2C network policies and mtls_allowed_* rules — solving it in one place benefits both features. For BOSH manifests, the GUIDs in authorization.config.orgs/spaces are specifically for cross-CF federation where we can't perform name lookups — remote org/space names have no meaning locally. |
I'm not sure why this is a use case that isn't already solved with TLS C2C. The platform enforces via C2C policies that only specific apps can talk to other apps, performing the authorization requirement for their communication. TLS C2C exists to enforce encryption. All of this is at a platform level. Moving C2C traffic into gorouter seems identical to hairpin routing at gorouter (except without involving the load balancer to distribute requests between working and failed gorouter instances, and instead relying on bosh-dns). It means C2C traffic is no longer C2C, and as a result C2C policies cannot be enforced the way they currently are.
Perhaps a service integrating/coordinating routing tables between gorouter instances on multiple foundations would be a better solution here. |
Gorouter already has support for passing mTLS certs to client apps for validating. If this needs to support multiple apps on a single route, route services could probably be used for this use case? Why do we need to tie per-route mTLS authentication into gorouter for this? |
@cweibel can you clarify this a little? is the need to be able to prevent all c2c traffic that crosses org boundaries, but inside a space, and cross-space in the same org is allowed? Or is it specifically regarding preventing mtls on c2c connections that cross orgs (where non-tls or regular tls is allowed)? |
Hi @geofffranks, At least on the SAP side, we were unable to use C2C because there are many limitations:
My understanding from the discussions during the TOC meeting yesterday was that we can take this out of scope for this RFC.
Route Service could be a solution, but when we try to explain this to CF users and ask them to implement it, we always encounter pushback. Additionally, a Route Service introduces extra hops in the request chain and can negatively impact performance. The question for me here is whether this is a useful add-on for the platform to add native support for it. |
|
Here's my summary from the discussions during yesterday's TOC meeting:
Phase 1a: mTLS Domain Infrastructure: We agreed that the BOSH-based configuration of an mTLS domain and the authorization described here (using scopes instead of org or space GUIDs) is acceptable to everyone. Based on this, my understanding is that we reached a soft agreement to exclude cross-foundation configuration from the scope of the current RFC. @rkoster, please comment if you had a different impression.
Phase 1b: CF Identity & Authorization: We had extensive discussions about the interface we want to support here. There were concerns about adding this to the manifest. Two options emerged from the discussions: We also discussed whether to use names or GUIDs. I believe we reached another soft agreement that using names would require access to the org, space, and app to resolve their GUIDs. Since the API level should definitely use GUIDs, my understanding is that GUIDs are acceptable when using APIs as described in Options 1 and 2. Please correct me if this isn't the case.
Phase 2: Egress HTTP Proxy (Opt-in): We didn't reach this topic during the TOC meeting, but @ameowlia has raised some questions in this comment. @rkoster, I think they need a response.
My ask to the people who raised concerns and participated in the discussion is to review Option 1 and Option 2 and share their preference or thoughts. @ameowlia, @stephanme, @cweibel, @Gerg, @geofffranks |
Have we considered using routing-api's support for dynamic route changes as an option here, rather than the BOSH manifest + redeploying CF just to make a domain change/update?
Can this instead be logic living in the policy-server instead of CAPI? Right now policy-server is the source of truth for all allowed app to app communications, and I don't like the idea of introducing a second source of truth. |
Thanks for the context @beyhan - can you make sure I'm understanding this correctly? SAP cannot use C2C app communications for various reasons. Its CF users would like to be able to authenticate app to app communications via mTLS instead. Existing Gorouter mTLS support could be used, but only in a case where a single app is behind a route (or apps sharing a CA used for mTLS), storing mTLS certs in credhub. Route services could also be used for this in the case of multiple apps or routes requiring the same authentication, at the expense of more request latency. Its CF users do not want to have to do all of this work or incur the latency penalty, and want the platform to provide the authentication and authorization for them. They are fine with app to app communication having the extra hop of the gorouter, since C2C is not currently enabled for them to know any differently. |
@geofffranks Your thoughts are definitely moving in the right direction. It actually goes even further than that. The app should be secure by default. That means when I push my app, it's automatically secured by the platform and no extra steps are needed. I can imagine that with this feature, we could achieve this goal if we also extend domains in a follow-up RFC to be configured as defaults per CF org, or if we allow available domains to be restricted per CF org. In this scenario, only the secure domain could be set as the default (or the only available domain) in a CF org. When you push an app to that org, it automatically inherits the security baseline of that organization and domain. This approach simply isn't possible with route services because:
This would be great, but it would extend the scope of this RFC a lot. It means that GoRouter would need to store the certificates somewhere.
This was also mentioned during the TOC meeting yesterday, but I don't recall exactly why policy-server isn't a good fit here yet. @rkoster, @Gerg could you please add that here? |
You're absolutely right that relying on HTTP_PROXY / HTTPS_PROXY introduces the risk of intercepting traffic we don't want to modify. These environment‑variable–based conventions give us limited control, since they’re implemented by application HTTP libraries rather than the platform.
This ensures the feature cannot unexpectedly change networking behavior for existing applications. Developers only adopt it for simple scenarios where routing all external HTTP(S) via Envoy is desirable.
Yes, we discussed this, and it's still an interesting future direction. For now, we considered it out of scope because supporting dynamic mTLS domain configuration through routing‑api would require addressing how trust bundles (CA certs) are loaded and rotated outside of the BOSH lifecycle. That likely means introducing CredHub or another secure store into the chain, which significantly increases complexity for the first iteration.
These would be managed by the platform team and documented internally. Given that expected scale, BOSH‑based configuration is acceptable for the first phase.
Policy‑server is indeed authoritative for container‑to‑container (C2C) policies today. However, in this case we are defining policy at the route level, not between app instances directly.
Because route options already originate from CAPI, it was the natural integration point for the first iteration.
@Gerg — could you add more detail from your perspective and recap the reasoning we discussed in the TOC meeting? I remember you shared helpful context about why keeping route configuration within CAPI’s domain made sense, but I don’t want to misrepresent it. |
It still needs to do this now; it's just a change in what mechanism gets the cert there (BOSH to disk vs. a dynamic API call to a database, maybe via routing-api or CAPI?). We have customers who want to avoid wildcard certs where possible, which is pushing us to think about being able to dynamically add certs to gorouter. |
I'm confused by this. Are you suggesting defining policies for apps under route-1.apps.com to be able to talk to apps under route-2.apps.com? Or policies for specific apps to talk to any apps bound to route-1.apps.com? How should this interact with shared routes? Also, this doc looks possibly concerning, but maybe it's just ambiguous:
It doesn't refer to this directly as the shared-routes workflow to share routes between org/spaces, which makes me worry that there's some mechanism for using the same internal route across orgs/spaces without explicit consent of the originating route creator. But that also seems like a very insecure behavior so I would imagine that's not how we would have implemented it originally. |
@geofffranks I think this would also be really beneficial for this RFC, and it could build nicely on it. Would you mind proposing an alternative approach to handle domain configuration that fits your requirements? To avoid blocking this RFC, it would be great if we could have a rough proposal ready by the next TOC meeting so it can be considered in the discussion. |
It does not make sense to introduce a new feature that breaks current features, even if the new feature is opt-in. If we want the platform to do the mTLS automatically, then we need to find a way to send only certain traffic to envoy and not all HTTP traffic. |
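One way to picture "only certain traffic" is a per-request decision keyed on the destination host, sketched below in Python. This is purely illustrative and not something the RFC proposes: the function name and the idea of matching on the mTLS domain suffix are assumptions, and `apps.mtls.internal` is simply the example domain used earlier in this thread.

```python
from urllib.parse import urlsplit


def use_egress_proxy(url, mtls_domains):
    """Return True only for hosts under one of the mTLS domains.

    Everything else (external traffic, C2C traffic) would bypass the
    proxy and keep its current path.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    for domain in mtls_domains:
        domain = domain.lower().strip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


domains = ["apps.mtls.internal"]
print(use_egress_proxy("http://billing.apps.mtls.internal/v1", domains))  # True
print(use_egress_proxy("https://api.example.com/", domains))              # False
print(use_egress_proxy("http://backend.apps.internal:8080/", domains))    # False
```

The open question from the thread remains where such a decision could live, since plain HTTP_PROXY handling is done by each application's HTTP library rather than by the platform.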
I am still not convinced on "just using GUIDs". I would be curious to see a more detailed proposal of how it would handle random app/space/org guids, how someone would use it, and how users at different levels (admin, space-dev) would be able to audit the security settings in place. One of the complaints about c2c was that there is a 65k app (not application instance, app) limit. If that is really a concern, how is a user going to manage 65k random guids? |
I am starting to second-guess the idea of having scopes in the bosh manifest. Having this info in the manifest and having other policy information in some other API somewhere still means that there is no single source of truth for policy requirements. For example, what would happen if the scope was "space" only, but a user provided a guid for an app that was in a different org? (1) The user has no way to view the space-only restriction. (2) No error is provided to say that the guid will never work. At the VERY least the scope information needs to be synced with an API that space devs have access to. I think a better option would be for scopes to be set in the API by an admin user. |
|
The RFC doesn't explain how GoRouter will serve both mTLS and TLS traffic on the same port. Looking at the draft implementation in router.go, the mechanism relies on SNI. The problem is that if ServerName is empty, GetMtlsDomainConfig("") returns nil and the code silently falls through to non-mTLS with no error. This means the mTLS requirement is bypassed entirely. Not all clients send SNI. What happens in these cases? How will we ensure this is secure? |
Additionally there's no requirement that the SNI match the Host header in the HTTP packet that gorouter is looking at. Can't malicious clients then set arbitrary SNIs to bypass the mTLS requirement and then get routed to the right application? |
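The two concerns above (empty SNI, and SNI that differs from the Host header) can be stated as a small decision table. The Python sketch below shows one possible fail-closed check; it is a hypothetical mitigation for discussion, not the behavior of the draft implementation, and the function name, return values, and exact-match rule are all assumptions.

```python
def classify_request(sni, host_header, mtls_domains):
    """Classify a request as 'mtls', 'plain', or 'reject'.

    Assumed rule: if either the TLS SNI or the HTTP Host header falls
    under an mTLS domain, both must name the same host; otherwise the
    request is rejected instead of falling through to non-mTLS.
    """
    def normalize(name):
        return (name or "").lower().rstrip(".")

    def under_mtls(name):
        return any(
            name == d or name.endswith("." + d)
            for d in (normalize(x) for x in mtls_domains)
        )

    # Drop an optional port from Host (IPv6 literals are not handled here).
    host = normalize(host_header.split(":", 1)[0])
    sni = normalize(sni)

    if not under_mtls(host) and not under_mtls(sni):
        return "plain"  # ordinary TLS traffic, unchanged
    if sni == host:
        return "mtls"   # handshake and request agree on an mTLS host
    return "reject"     # empty or mismatched SNI: fail closed


domains = ["apps.mtls.internal"]
host = "billing.apps.mtls.internal"
print(classify_request(host, host, domains))                # mtls
print(classify_request("", host, domains))                  # reject
print(classify_request("www.example.com", host, domains))   # reject
print(classify_request("www.example.com", "www.example.com", domains))  # plain
```

The key property is that a request addressed to an mTLS host can never be served from a connection whose handshake was negotiated for some other name, or for no name at all.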
|
May I suggest setting up isolated routers for domains that want different certs :) |
Summary
This RFC proposes enabling per-domain mutual TLS (mTLS) on GoRouter with optional identity extraction and authorization enforcement.
View the full RFC
This infrastructure supports multiple use cases:
apps.mtls.internal
Key Points
mtls_allowed_apps, mtls_allowed_spaces, mtls_allowed_orgs, mtls_allow_any)
Implementation Phases
Draft Implementation PRs
cc @cloudfoundry/toc @cloudfoundry/wg-app-runtime-interfaces