
RFC: Domain-Scoped mTLS for GoRouter#1438

Open
rkoster wants to merge 14 commits into main from rfc-app-to-app-mtls-routing


@rkoster
Contributor

@rkoster rkoster commented Feb 17, 2026

Summary

This RFC proposes enabling per-domain mutual TLS (mTLS) on GoRouter with optional identity extraction and authorization enforcement.

View the full RFC

This infrastructure supports multiple use cases:

  • CF app-to-app routing: Authenticated internal communication via apps.mtls.internal
  • External client certificates: Partner integrations, IoT devices on specific domains
  • Cross-CF federation: Secure communication between CF installations

Key Points

  • No new infrastructure: Uses existing GoRouter with domain-specific mTLS configuration
  • Default-deny security: For CF app-to-app routing, routes blocked unless explicitly allowed
  • RFC-0027 compliant: Uses flat route options (mtls_allowed_apps, mtls_allowed_spaces, mtls_allowed_orgs, mtls_allow_any)
  • Layered authorization: Domain-level (operator) + route-level (developer) access control

Implementation Phases

  • Phase 1a (mTLS Infrastructure): GoRouter validates client certificates for configured domains
  • Phase 1b (Authorization): CF identity extraction and per-route access control
  • Phase 2 (Optional): Egress HTTP proxy for simplified client adoption

Draft Implementation PRs

cc @cloudfoundry/toc @cloudfoundry/wg-app-runtime-interfaces

@rkoster rkoster force-pushed the rfc-app-to-app-mtls-routing branch from bce2e1d to 5557aeb Compare February 17, 2026 15:47
@rkoster rkoster added toc rfc CFF community RFC labels Feb 17, 2026
@rkoster rkoster requested review from a team, Gerg, beyhan, cweibel and stephanme and removed request for a team February 17, 2026 15:55
@silvestre
Member

I really like this proposal.

Just to be sure: It would still be possible to have an apps.mtls.internal route allowing access for any source app, so that the authorization check could be done in the app, right?

One use-case would be in the app-autoscaler service, where we expose an mTLS endpoint but check the authorization by determining if the app is bound to an autoscaler service instance, which is dynamic information we could not determine during route creation.

Enable authenticated and authorized app-to-app communication via GoRouter
using mutual TLS (mTLS). Applications connect to a shared internal domain
(apps.mtls.internal), where GoRouter validates client certificates and
enforces per-route access control using a default-deny model.

Key features:
- Phase 1a: Domain-specific mTLS in GoRouter (validates instance identity)
- Phase 1b: Authorization enforcement via allowed_sources route option
- Phase 2 (optional): Egress HTTP proxy for simplified client adoption

Depends on RFC-0027 (Generic Per-Route Features) for route options support.
@rkoster rkoster force-pushed the rfc-app-to-app-mtls-routing branch from 5557aeb to 8f3900a Compare February 17, 2026 19:04
@rkoster
Contributor Author

rkoster commented Feb 17, 2026

I really like this proposal.

Just to be sure: It would still be possible to have an apps.mtls.internal route allowing access for any source app, so that the authorization check could be done in the app, right?

One use-case would be in the app-autoscaler service, where we expose an mTLS endpoint but check the authorization by determining if the app is bound to an autoscaler service instance, which is dynamic information we could not determine during route creation.

@silvestre I have updated the RFC with:

applications:
- name: autoscaler-api
  routes:
  - route: autoscaler.apps.mtls.internal
    options:
      allowed_sources:
        any: true

@theghost5800
Contributor

This idea is really interesting, but will it be possible to communicate with app containers on different ports or over protocols other than HTTP?

Give credit to Beyhan and Max for the initial work on this RFC
@rkoster
Contributor Author

rkoster commented Feb 19, 2026

This idea is really interesting, but will it be possible to communicate with app containers on different ports or over protocols other than HTTP?

This RFC currently focuses on HTTP traffic via GoRouter, but non-HTTP protocol support is an interesting future direction.

Current constraints:

  • GoRouter uses Go's httputil.ReverseProxy which handles HTTP semantics (headers, paths, etc.)
  • Caller identity is forwarded via the XFCC HTTP header, which doesn't exist for raw TCP
  • GoRouter does not currently support HTTP CONNECT method for tunneling

What would be needed for non-HTTP support:

  1. HTTP CONNECT tunneling in GoRouter: GoRouter would need to detect CONNECT requests, validate mTLS + allowed_sources, then hijack the connection and relay raw TCP bytes to the backend. The pattern exists (similar to WebSocket upgrades), but would require new implementation.
  2. Identity forwarding challenge: Inside a TCP tunnel there's no XFCC header. Options include:
     • PROXY protocol v2 (GoRouter sends client cert info as TLV before the TCP stream)
     • Backend also requires mTLS and validates the client cert directly
     • Application-layer identity (less secure)
  3. Envoy egress proxy: The Phase 2 egress proxy (Envoy) already supports HTTP CONNECT tunneling, so apps could potentially use CONNECT backend.apps.mtls.internal:5432 to tunnel arbitrary protocols. But GoRouter still needs to support CONNECT for this to work end-to-end.

For now, this is out of scope to keep the RFC focused and achievable. But feel free to create a follow-up RFC for non-HTTP use cases.

@beyhan beyhan moved this from Inbox to In Progress in CF Community Feb 24, 2026
@cweibel

cweibel commented Mar 3, 2026

First, I really like the idea behind this RFC.

I have a unique constraint where I need fine-grained access control at the org level over whether app-to-app mTLS communication is allowed. For instance, at the platform layer, I need to enforce that app-to-app mTLS between organizations is not allowed, but within a space it would be, meaning you would need to be a Space Developer in both spaces.

@rkoster
Contributor Author

rkoster commented Mar 5, 2026

Implementation Update

Draft PRs implementing Phase 1 (1a + 1b):

Just a note about the PRs, I have not yet reviewed them myself, just wanted to get something functional.

Tested end-to-end on BOSH-Lite.


Finding: Route Options Format

RFC-0027 doesn't allow nested objects/arrays in route options. We adapted to a flat format:

// Instead of nested mtls_allowed_sources: {apps: [...]}
{"mtls_allowed_apps": "guid1,guid2", "mtls_allowed_spaces": "space-guid", "mtls_allow_any": true}

Should the RFC be updated to reflect this, or should RFC-0027 be extended?
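
For illustration only, here is a minimal Python sketch of how a consumer could turn the flat options above back into sets of GUIDs. The helper name `parse_mtls_options` and the handling of `mtls_allow_any` as either a boolean or the string "true" are assumptions, not something the draft PRs are known to do.

```python
def parse_mtls_options(options):
    """Split flat, comma-separated route options into sets of GUIDs.

    Hypothetical helper: the key names come from the flat format above,
    everything else is an assumption made for this sketch.
    """
    def split(key):
        raw = options.get(key) or ""
        # Ignore empty items so "guid1,,guid2" or trailing commas are harmless.
        return {item.strip() for item in raw.split(",") if item.strip()}

    return {
        "apps": split("mtls_allowed_apps"),
        "spaces": split("mtls_allowed_spaces"),
        "orgs": split("mtls_allowed_orgs"),
        # Accept both a JSON boolean and the string "true" (assumption).
        "allow_any": str(options.get("mtls_allow_any", "")).lower() == "true",
    }


rules = parse_mtls_options({
    "mtls_allowed_apps": "guid1,guid2",
    "mtls_allowed_spaces": "space-guid",
    "mtls_allow_any": True,
})
```

A missing key simply yields an empty set, which keeps the default-deny behaviour when no options are configured.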


Open Issue: Application Security Groups

Apps need to reach GoRouter on port 443, but default ASGs block internal IPs. Currently requires manual security group creation with router IPs.

Proposal: Auto-manage ASG via BOSH link when feature flag is enabled. This is not blocking (manual workaround exists) but improves operator experience.
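
As a rough sketch of what the manual workaround amounts to, the snippet below builds security group rules in the standard CF ASG rule shape (protocol, destination, ports, description) from a list of router IPs. The IP addresses and the function name `router_asg_rules` are hypothetical, and the BOSH-link automation proposed above is not shown.

```python
import json


def router_asg_rules(router_ips, port=443):
    """Build one ASG rule per router IP (illustrative helper, not platform code)."""
    return [
        {
            "protocol": "tcp",
            "destination": ip,
            "ports": str(port),
            "description": "Allow apps to reach GoRouter for mTLS routes",
        }
        # De-duplicate and sort so the generated file is stable between runs.
        for ip in sorted(set(router_ips))
    ]


# Hypothetical router IPs; in practice these come from the deployment.
rules = router_asg_rules(["10.0.1.11", "10.0.1.10", "10.0.1.11"])
print(json.dumps(rules, indent=2))
```

The printed JSON could be saved to a file and passed to cf create-security-group, then bound to the relevant spaces.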

@rkoster
Contributor Author

rkoster commented Mar 5, 2026

Also recorded a demo here: https://asciinema.org/a/zLXrO9ERP3lXqGuM, but this was before refactoring to flat options (still uses the nested structure, which is why cf curl is being used).

@ameowlia
Member

ameowlia commented Mar 17, 2026

& 3. GUIDs vs names in BOSH config: The authorization.config.orgs/spaces uses GUIDs because this mirrors how C2C network policies work: the CLI does the translation, but the underlying data is GUID-based. For the cross-CF federation use case, we can't do dynamic lookups since GUIDs from remote installations have no local meaning. The operator-level config is also rarely changed after initial setup.

GUIDs in app manifests: Agreed this is a UX limitation. The CLI may provide translation (similar to cf add-network-policy), but the manifest would need GUIDs. This is consistent with how network policies work today.

I still do not support putting guids in (bosh or app) manifests. This does not mirror c2c and is not consistent with network policies. The c2c APIs take guids. There are no guids stored in manifests. If you want to make it mirror c2c you should make these API endpoints that can be wrapped by the CLI.

Putting guids in bosh and app manifests is an anti-pattern that we should avoid. First, guids can change. Apps are pushed via blue-green deploys. Second, guids are not human-readable and it will be difficult for users to visually check to make sure the correct apps are able to access the correct routes. Third, putting these guids in manifests means that stale policies can never be automatically cleaned up.

@ameowlia
Member

Permissions model: This is where operator control via authorization.config becomes important. Operators can lock down access with:

scope: space — apps can only call apps in the same space
scope: org — apps can only call apps in the same org
spaces: [...] / orgs: [...] — explicit allowlist of spaces/orgs
Developers can only restrict further within these boundaries. If an operator wants the same high bar as C2C, they can set scope: space.

Based on my reading, setting scope: space does not do the same thing as c2c. I believe setting scope: space will allow all apps in the same space to call other apps in the same space.

The way c2c works is different. It has to do with the user who is creating the policy, and they must have the correct permissions in multiple spaces to make the network policy.

@ameowlia ameowlia self-requested a review March 17, 2026 20:00
Member

@ameowlia ameowlia left a comment


The same route can be used across apps and spaces. What happens when two apps have the same route defined in the manifest, but they have different mtls_allowed_apps defined?

@ameowlia
Member

Manifest push without permissions: The route options are applied via the routes API. If the domain doesn't have authorization.mode: cf_identity, the options are stored but not enforced; warnings appear in router logs. The push doesn't fail.

How would this work? Is CAPI going to track all of the attempted, but not enforced policies? I don't see this explained in the RFC.

@ameowlia
Member

Phase 2: Egress HTTP Proxy (Optional)

Does this mean that the changes made to envoy are optional for this RFC? Or that users can opt into this behavior?

rkoster added 2 commits March 20, 2026 12:57
… details

Based on feedback from Beyhan: reintroduce scope (org|space|any) for
domain-level authorization using route-emitter endpoint tags instead of
explicit GUID lists in BOSH manifests.

Key changes:
- scope: org/space compares caller identity against endpoint tags
- Document shared route behavior: EndpointPool contains endpoints from
  multiple spaces, GoRouter iterates and short-circuits on first match
- Replace org-scoped domain pattern with scope: org
- Update cross-CF federation example to explain when explicit orgs/spaces
  lists are needed (remote GUIDs have no local route-emitter tags)
- Add same-org, same-space, and any-authenticated-caller config examples
- Add Router Log Messages appendix documenting [RTR] log lines for all
  mTLS authorization scenarios (allowed, denied, not_enforced)
- Clarify CAPI stores route options as inert metadata when domain
  authorization is not enabled (addresses ameowlia feedback)
- Add Authorization model differences subsection explaining destination-
  controlled model vs C2C bilateral model (addresses ameowlia feedback)
- Change Phase 2 from Optional to Opt-in (user enables per-app)
- Replace old allowed_sources terminology with mtls_allowed_*
@rkoster
Contributor Author

rkoster commented Mar 20, 2026

Updated the RFC in commits a48caad and deeff30. Summary of changes:

Reintroduced scope: org | space | any in authorization.config (a48caad)
@beyhan identified that GoRouter's route table already carries per-endpoint organization_id and space_id tags (set by the route-emitter for each app instance). This means scope checks don't require any new data — GoRouter can compare the caller's certificate identity against the destination endpoint's tags directly. We verified this on a live CF environment with shared routes: each endpoint in a route pool carries its own space/org tags, so scope: space on a shared route naturally allows callers from any participating space while denying callers from non-participating spaces. (experiment details)

scope is mutually exclusive with explicit orgs:/spaces: GUID lists, which remain available for cross-CF federation where remote GUIDs have no local route-emitter tags to match against.
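
To make the shared-route behaviour concrete, here is a minimal Python sketch of the scope check described above: the caller's identity is compared against each endpoint's tags, stopping at the first match. The `Caller` and `Endpoint` types, the function `scope_allows`, and all GUID values are hypothetical; only the `organization_id` / `space_id` tag names and the `org | space | any` scopes come from the RFC discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity extracted from the client certificate (illustrative shape)."""
    app_id: str
    space_id: str
    org_id: str


@dataclass(frozen=True)
class Endpoint:
    """One backend in a route pool, carrying route-emitter tags."""
    address: str
    space_id: str
    organization_id: str


def scope_allows(scope, caller, pool):
    """Return True if the caller is within the domain-level scope."""
    if scope == "any":
        return True
    for endpoint in pool:
        # Short-circuit on the first endpoint whose tag matches the caller.
        if scope == "space" and endpoint.space_id == caller.space_id:
            return True
        if scope == "org" and endpoint.organization_id == caller.org_id:
            return True
    # Unknown scopes and non-matching callers are denied.
    return False


# A shared route: endpoints from two spaces in the same org.
pool = [
    Endpoint("10.0.2.1:61001", space_id="space-a", organization_id="org-1"),
    Endpoint("10.0.2.2:61002", space_id="space-b", organization_id="org-1"),
]
participant = Caller("app-1", space_id="space-b", org_id="org-1")
outsider = Caller("app-2", space_id="space-c", org_id="org-1")
```

With this pool, `participant` passes `scope: space` because space-b contributes an endpoint, while `outsider` is denied under `scope: space` but allowed under `scope: org`.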

Router Log Messages — new appendix section (deeff30)
Documented the [RTR] log lines GoRouter emits for every mTLS authorization scenario: successful authorization, denial by domain scope, denial by route-level mtls_allowed_* rules, default-deny when no options are configured, and a not_enforced warning when route options exist but the domain doesn't have authorization enabled. This also clarifies @ameowlia's question about CAPI behavior — CAPI stores route options as inert metadata and does not track whether the domain enforces them. GoRouter warns per-request in the app log stream.

Authorization model differences — C2C vs this RFC (deeff30)
Added a subsection explicitly contrasting the two models: C2C uses a bilateral model where the user creating the policy needs permissions in both source and destination spaces. This RFC uses a destination-controlled model where the route owner defines who may call them and the operator sets the maximum boundary. This is intentional — it matches how HTTP APIs work (the server defines its access policy). Operators who need the bilateral guarantee should continue using C2C for those workloads.
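
The layering in the destination-controlled model can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not GoRouter's implementation: the function names, the dictionary shapes, and the GUID values are hypothetical. It shows the operator boundary being checked first and the route owner's rules only narrowing access within it, with default-deny when no rule matches.

```python
def route_allows(caller, rules):
    """Route-level check: default-deny unless a rule matches the caller."""
    if rules.get("allow_any"):
        return True
    return (
        caller["app"] in rules.get("apps", set())
        or caller["space"] in rules.get("spaces", set())
        or caller["org"] in rules.get("orgs", set())
    )


def authorize(caller, within_domain_scope, rules):
    """Combine the operator boundary with the route owner's rules."""
    # Layer 1 (operator): the domain-level boundary is the maximum; nothing
    # configured on the route can widen it.
    if not within_domain_scope:
        return False
    # Layer 2 (developer): the route owner decides who may call the route.
    return route_allows(caller, rules)


caller = {"app": "app-1", "space": "space-a", "org": "org-1"}
```

Note that a route with `allow_any` set is still denied when the caller falls outside the domain scope, and a route with no rules at all denies everyone.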

Phase 2: Egress HTTP Proxy — "Optional" → "Opt-in" (deeff30)
Clarified that Phase 2 is not an optional part of the RFC — it will be implemented. "Opt-in" means users enable the egress proxy per-app; it's not required for Phase 1 functionality.

@Gerg Gerg self-requested a review March 23, 2026 19:32
@ameowlia
Member

ameowlia commented Mar 24, 2026

I want to clarify the status of my feedback, because several of my comments appear to have been either missed, treated as suggestions, or acknowledged without changes to the RFC. I'm restating them here as blocking concerns.

I support the goal of this RFC. Making app-to-app mTLS easier and leveraging GoRouter is a sound direction. However, I cannot approve the RFC in its current form. I believe the final comment period should be cancelled until these concerns are addressed, which I expect will require a significant rewrite of the RFC.

I plan to attend the TOC meeting today to discuss these concerns.

1. No GUIDs in manifests

The RFC currently requires app GUIDs, space GUIDs, and org GUIDs in both BOSH manifests and app manifests. This is an anti-pattern for multiple reasons:

  • GUIDs are not human-readable. Operators and developers cannot visually verify that the correct apps, spaces, or orgs are referenced without looking them up.
  • App GUIDs change during blue-green deploys, making policies silently stale.
  • Stale GUID references in manifests cannot be automatically cleaned up, unlike API-managed policies.

2. Access control for a route must be auditable from a single place

The current design splits authorization across two layers: operator-level config in BOSH manifests and developer-level config in app manifests. A user trying to understand "who can access this route?" must inspect both. This makes auditing difficult and error-prone. All policies affecting a route should be queryable from a single source.

3. Route access control should not live in app manifests

Apps and routes have different lifecycles. A route can be shared across apps and spaces. Tying access policy to app push operations creates conflicts, ordering dependencies, and confusion about which policy wins. Access policies belong on the route, managed through an API, not in application manifests.

4. Phase 2 egress proxy (HTTP_PROXY) risks breaking existing traffic

Setting HTTP_PROXY on the Envoy sidecar to intercept outbound HTTP traffic is a significant change to app networking behavior. If an app also sends HTTP traffic externally or via C2C, the proxy will intercept that traffic too. We attempted a similar approach years ago and it caused issues with non-mTLS traffic. The RFC needs to explain how the egress proxy avoids interfering with:

  • Outbound HTTP requests to external services
  • Existing HTTP C2C traffic
  • Any other traffic not destined for mTLS domains

I do not support changes that break existing networking features, even if the feature is opt-in.

I want to be clear: the goal of this RFC is valuable and I want to see it succeed. But these are design-level concerns that affect usability, auditability, and correctness, not minor items that can be patched over. I think the right path forward is to pause the final comment period, address these concerns, and bring back a revised RFC.

@rkoster
Contributor Author

rkoster commented Mar 24, 2026

& 3. GUIDs vs names in BOSH config: The authorization.config.orgs/spaces uses GUIDs because this mirrors how C2C network policies work: the CLI does the translation, but the underlying data is GUID-based. For the cross-CF federation use case, we can't do dynamic lookups since GUIDs from remote installations have no local meaning. The operator-level config is also rarely changed after initial setup.

GUIDs in app manifests: Agreed this is a UX limitation. The CLI may provide translation (similar to cf add-network-policy), but the manifest would need GUIDs. This is consistent with how network policies work today.

I still do not support putting guids in (bosh or app) manifests. This does not mirror c2c and is not consistent with network policies. The c2c APIs take guids. There are no guids stored in manifests. If you want to make it mirror c2c you should make these API endpoints that can be wrapped by the CLI.

Putting guids in bosh and app manifests is an anti-pattern that we should avoid. First, guids can change. Apps are pushed via blue-green deploys. Second, guids are not human-readable and it will be difficult for users to visually check to make sure the correct apps are able to access the correct routes. Third, putting these guids in manifests means that stale policies can never be automatically cleaned up.

The current C2C networking policies have a similar limitation: app GUIDs are what's eventually stored in the DB, so renaming an app or doing a blue-green deploy requires the developer to re-apply network policies. Hiding the fact that GUIDs are used under the covers is a UX flaw that exists on the C2C side too.

The better long-term solution would be policies that dynamically re-evaluate when an app name changes. However, this should be a separate RFC that applies to both C2C network policies and mtls_allowed_* rules — solving it in one place benefits both features.

For BOSH manifests, the GUIDs in authorization.config.orgs/spaces are specifically for cross-CF federation where we can't perform name lookups — remote org/space names have no meaning locally.

@geofffranks
Contributor

CF app-to-app routing: Applications need authenticated internal communication where only CF apps can connect (via instance identity), traffic stays internal, the platform enforces which apps can call which routes, and standard GoRouter features work (load balancing, retries, observability).

I'm not sure why this is a use case that isn't already solved with TLS C2C. The platform enforces via C2C policies that only specific apps can talk to other apps, performing the authorization requirement for their communication. TLS C2C exists to enforce encryption. All of this is at a platform level. Moving C2C traffic into gorouter seems identical to hairpin routing at gorouter (except without involving the load balancer to distribute requests between working and failed gorouter instances, and instead relying on bosh-dns). It means C2C traffic is no longer C2C, and as a result C2C policies cannot be enforced the way they currently are.

For BOSH manifests, the GUIDs in authorization.config.orgs/spaces are specifically for cross-CF federation where we can't perform name lookups — remote org/space names have no meaning locally.

Perhaps a service integrating/coordinating routing tables between gorouter instances on multiple foundations would be a better solution here.

@geofffranks
Contributor

External client certificates: Some platforms need to validate client certificates from external systems (partner integrations, IoT devices) on specific domains without affecting other domains or requiring CF-specific identity handling.

Gorouter already has support for passing mTLS certs to client apps for validating. If this needs to support multiple apps on a single route, route services could probably be used for this use case? Why do we need to tie per-route mTLS authentication into gorouter for this?

@geofffranks
Contributor

I have a unique constraint where I need fine-grained access control at the org level over whether app-to-app mTLS communication is allowed. For instance, at the platform layer, I need to enforce that app-to-app mTLS between organizations is not allowed, but within a space it would be, meaning you would need to be a Space Developer in both spaces.

@cweibel can you clarify this a little? Is the need to prevent all C2C traffic that crosses org boundaries, while traffic inside a space and cross-space within the same org is allowed? Or is it specifically about preventing mTLS on C2C connections that cross orgs (where non-TLS or regular TLS is allowed)?

@beyhan
Member

beyhan commented Mar 25, 2026

CF app-to-app routing: Applications need authenticated internal communication where only CF apps can connect (via instance identity), traffic stays internal, the platform enforces which apps can call which routes, and standard GoRouter features work (load balancing, retries, observability).

I'm not sure why this is a use case that isn't already solved with TLS C2C. The platform enforces via C2C policies that only specific apps can talk to other apps, performing the authorization requirement for their communication. TLS C2C exists to enforce encryption. All of this is at a platform level. Moving C2C traffic into gorouter seems identical to hairpin routing at gorouter (except without involving the load balancer to distribute requests between working and failed gorouter instances, and instead relying on bosh-dns). It means C2C traffic is no longer C2C, and as a result C2C policies cannot be enforced the way they currently are.

Hi @geofffranks,

At least on the SAP side, we were unable to use C2C because there are many limitations:

  • No load balancing. DNS load balancing could be used, but this heavily depends on proper client-side implementation, which often caches IPs for extended periods.
  • No observability for C2C from either the platform or application side
  • Limitations for large environments. In the worst case, only 65,535 participants can join a C2C network
  • Technical limitations and scaling factors that make operating C2C at scale difficult

For BOSH manifests, the GUIDs in authorization.config.orgs/spaces are specifically for cross-CF federation where we can't perform name lookups — remote org/space names have no meaning locally.

Perhaps a service integrating/coordinating routing tables between gorouter instances on multiple foundations would be a better solution here.

My understanding from the discussions during the TOC meeting yesterday was that we can take this out of scope for this RFC.

External client certificates: Some platforms need to validate client certificates from external systems (partner integrations, IoT devices) on specific domains without affecting other domains or requiring CF-specific identity handling.

Gorouter already has support for passing mTLS certs to client apps for validating. If this needs to support multiple apps on a single route, route services could probably be used for this use case? Why do we need to tie per-route mTLS authentication into gorouter for this?

Route Service could be a solution, but when we try to explain this to CF users and ask them to implement it, we always encounter pushback. Additionally, a Route Service introduces extra hops in the request chain and can negatively impact performance. The question for me here is whether this is a useful add-on for the platform to add native support for it.

@beyhan
Member

beyhan commented Mar 25, 2026

Here's my summary from the discussions during yesterday's TOC meeting:

Phase 1a: mTLS Domain Infrastructure:

We agreed that the BOSH-based configuration of an mTLS domain and the authorization described here (using scopes instead of org or space GUIDs) is acceptable to everyone. Based on this, my understanding is that we reached a soft agreement to exclude cross-foundation configuration from the scope of the current RFC. @rkoster, please comment if you had a different impression.

Phase 1b: CF Identity & Authorization:

We had extensive discussions about the interface we want to support here. There were concerns about adding this to the manifest. Two options emerged from the discussions:
Option 1: Use the existing update route APIs
Option 2: Introduce a new API for this configuration, like the proposal here

We also discussed whether to use names or GUIDs. I believe we reached another soft agreement that using names would require access to the org, space, and app to resolve their GUIDs. Since the API level should definitely use GUIDs, my understanding is that GUIDs are acceptable when using APIs as described in Options 1 and 2. Please correct me if this isn't the case.

Phase 2: Egress HTTP Proxy (Opt-in):

We didn't reach this topic during the TOC meeting, but @ameowlia has raised some questions in this comment. @rkoster, I think they need a response.

My ask to the people who raised concerns and participated in the discussion would be to review Option 1 and Option 2 and share their preference or thoughts. @ameowlia, @stephanme, @cweibel, @Gerg, @geofffranks

@geofffranks
Contributor

We agreed that the BOSH-based configuration of an mTLS domain and the authorization described here (using scopes instead of org or space GUIDs) is acceptable to everyone

Have we considered using routing-api's support for dynamic route changes as an option here rather than the BOSH manifest + redeploying CF just to make a domain change/update?

Option 2: Introduce a new API for this configuration, like the proposal here

Can this instead be logic living in the policy-server instead of CAPI? Right now policy-server is the source of truth for all allowed app to app communications, and I don't like the idea of introducing a second source of truth.

@geofffranks
Contributor

At least on the SAP side, we were unable to use C2C because there are many limitations:

  • No load balancing. DNS load balancing could be used, but this heavily depends on proper client-side implementation, which often caches IPs for extended periods.
  • No observability for C2C from either the platform or application side
  • Limitations for large environments. In the worst case, only 65,535 participants can join a C2C network
  • Technical limitations and scaling factors that make operating C2C at scale difficult

Route Service could be a solution, but when we try to explain this to CF users and ask them to implement it, we always encounter pushback. Additionally, a Route Service introduces extra hops in the request chain and can negatively impact performance. The question for me here is whether this is a useful add-on for the platform to add native support for it.

Thanks for the context @beyhan - can you make sure I'm understanding this correctly?

SAP cannot use C2C app communications for various reasons. Its CF users would like to be able to authenticate app to app communications via mTLS instead. Existing Gorouter mTLS support could be used, but only in a case where a single app is behind a route (or apps sharing a CA used for mTLS), storing mTLS certs in credhub. Route services could also be used for this in the case of multiple apps or routes requiring the same authentication, at the expense of more request latency. Its CF users do not want to have to do all of this work or incur the latency penalty, and want the platform to provide the authentication and authorization for them. They are fine with app to app communication having the extra hop of the gorouter, since C2C is not currently enabled for them to know any differently.

@beyhan
Member

beyhan commented Mar 25, 2026

At least on the SAP side, we were unable to use C2C because there are many limitations:

  • No load balancing. DNS load balancing could be used, but this heavily depends on proper client-side implementation, which often caches IPs for extended periods.
  • No observability for C2C from either the platform or application side
  • Limitations for large environments. In the worst case, only 65,535 participants can join a C2C network
  • Technical limitations and scaling factors that make operating C2C at scale difficult

Route Service could be a solution, but when we try to explain this to CF users and ask them to implement it, we always encounter pushback. Additionally, a Route Service introduces extra hops in the request chain and can negatively impact performance. The question for me here is whether this is a useful add-on for the platform to add native support for it.

Thanks for the context @beyhan - can you make sure I'm understanding this correctly?

SAP cannot use C2C app communications for various reasons. Its CF users would like to be able to authenticate app to app communications via mTLS instead. Existing Gorouter mTLS support could be used, but only in a case where a single app is behind a route (or apps sharing a CA used for mTLS), storing mTLS certs in credhub. Route services could also be used for this in the case of multiple apps or routes requiring the same authentication, at the expense of more request latency. Its CF users do not want to have to do all of this work or incur the latency penalty, and want the platform to provide the authentication and authorization for them. They are fine with app to app communication having the extra hop of the gorouter, since C2C is not currently enabled for them to know any differently.

@geofffranks Your thoughts are definitely moving in the right direction. It actually goes even further than that. The app should be secure by default. That means when I push my app, it's automatically secured by the platform and no extra steps are needed.

I can imagine that with this feature, we could achieve this goal if we also extend domains in a follow-up RFC to be configured as defaults per CF org, or if we allow available domains to be restricted per CF org. In this scenario, only the secure domain could be set as the default (or the only available domain) in a CF org. When you push an app to that org, it automatically inherits the security baseline of that organization and domain.

This approach simply isn't possible with route services because:

  • Someone needs to implement and provide the route service
  • You have to create an instance of it
  • Finally, you have to manually bind it to your app

We agreed that the BOSH-based configuration of an mTLS domain and the authorization described here (using scopes instead of org or space GUIDs) is acceptable to everyone

Have we considered using routing-api's support for dynamic route changes as an option here rather than the BOSH manifest + redeploying CF just to make a domain change/update?

This will be great but it will extend the scope of this RFC a lot. This means that GoRouter will need to store the certificates somewhere.

Option 2: Introduce a new API for this configuration, like the proposal here

Can this instead be logic living in the policy-server instead of CAPI? Right now policy-server is the source of truth for all allowed app to app communications, and I don't like the idea of introducing a second source of truth.

This was also mentioned during the TOC meeting yesterday, but I don't recall exactly why policy-server wouldn't work here. @rkoster, @Gerg could you please add that here?

@rkoster
Contributor Author

rkoster commented Mar 25, 2026

The RFC needs to explain how the egress proxy avoids interfering with:

  • Outbound HTTP requests to external services
  • Existing HTTP C2C traffic
  • Any other traffic not destined for mTLS domains

You're absolutely right that relying on HTTP_PROXY / HTTPS_PROXY introduces the risk of intercepting traffic we don't want to modify. These environment‑variable–based conventions give us limited control, since they’re implemented by application HTTP libraries rather than the platform.
App developers can use NO_PROXY to explicitly exclude destinations, but I agree this isn’t ideal as a primary safeguard.
To reduce risk, I propose making this behavior double opt‑in:

  • Platform opt‑in via a BOSH property
  • App opt‑in by explicitly setting HTTP_PROXY in the app environment

This ensures the feature cannot unexpectedly change networking behavior for existing applications. Developers only adopt it for simple scenarios where routing all external HTTP(S) via Envoy is desirable.

We agreed that the BOSH-based configuration of an mTLS domain and the authorization described here (using scopes instead of org or space GUIDs) is acceptable to everyone

Have we considered using routing-api's support for dynamic route changes as an option here rather than the BOSH manifest + redeploying CF just to make a domain change/update?

Yes, we discussed this, and it's still an interesting future direction. For now, we considered it out of scope because supporting dynamic mTLS domain configuration through routing‑api would require addressing how trust bundles (CA certs) are loaded and rotated outside of the BOSH lifecycle. That likely means introducing CredHub or another secure store into the chain, which significantly increases complexity for the first iteration.
From an operator perspective, we expect the number of mTLS domains to remain very small. For example, at Rabobank we anticipate only:

  • a single internal domain, e.g. *.apps.mtls.internal, and
  • possibly one partner‑specific domain for a particular integration

These would be managed by the platform team and documented internally. Given that expected scale, BOSH‑based configuration is acceptable for the first phase.

Option 2: Introduce a new API for this configuration, like the proposal here

Can this instead be logic living in the policy-server instead of CAPI? Right now policy-server is the source of truth for all allowed app to app communications, and I don't like the idea of introducing a second source of truth.

Policy‑server is indeed authoritative for container‑to‑container (C2C) policies today. However, in this case we are defining policy at the route level, not between app instances directly.
These mTLS domain settings ultimately need to:

  • flow through Diego and the route‑emitter
  • arrive at Gorouter via route options

Because route options already originate from CAPI, it was the natural integration point for the first iteration.
During the last TOC meeting, both the CAPI and Routing teams aligned on exploring a CAPI extension rather than expanding policy‑server. The decision wasn’t a hard rejection of policy‑server—just recognition that:

  • policy‑server today has no concept of domains
  • and expanding it to cover route semantics would be a much larger architectural shift

@Gerg — could you add more detail from your perspective and recap the reasoning we discussed in the TOC meeting? I remember you shared helpful context about why keeping route configuration within CAPI’s domain made sense, but I don’t want to misrepresent it.
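To make the route-option flow above concrete, here is a minimal Go sketch of how a default-deny check against the flat route options could look once they arrive at Gorouter. The option keys are taken from the RFC summary; the `CallerIdentity` type, the `Authorize` function, and the comma-separated value encoding are assumptions made for illustration and are not taken from the draft implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// CallerIdentity is a hypothetical view of the identity extracted from a
// client certificate (app, space and org GUIDs).
type CallerIdentity struct {
	AppGUID, SpaceGUID, OrgGUID string
}

// Option keys come from the RFC summary; the comma-separated value
// encoding is an assumption made for this sketch.
const (
	optAllowedApps   = "mtls_allowed_apps"
	optAllowedSpaces = "mtls_allowed_spaces"
	optAllowedOrgs   = "mtls_allowed_orgs"
	optAllowAny      = "mtls_allow_any"
)

// contains reports whether guid appears in a comma-separated list.
// An empty guid never matches.
func contains(list, guid string) bool {
	if guid == "" {
		return false
	}
	for _, item := range strings.Split(list, ",") {
		if strings.TrimSpace(item) == guid {
			return true
		}
	}
	return false
}

// Authorize applies default-deny: a caller is allowed only when a route
// option explicitly matches it, or when mtls_allow_any is "true".
func Authorize(opts map[string]string, c CallerIdentity) bool {
	if opts[optAllowAny] == "true" {
		return true
	}
	return contains(opts[optAllowedApps], c.AppGUID) ||
		contains(opts[optAllowedSpaces], c.SpaceGUID) ||
		contains(opts[optAllowedOrgs], c.OrgGUID)
}

func main() {
	opts := map[string]string{optAllowedSpaces: "space-a, space-b"}
	fmt.Println(Authorize(opts, CallerIdentity{AppGUID: "app-1", SpaceGUID: "space-b"}))
	fmt.Println(Authorize(opts, CallerIdentity{AppGUID: "app-2", SpaceGUID: "space-z"}))
	fmt.Println(Authorize(map[string]string{}, CallerIdentity{AppGUID: "app-1"}))
}
```

A route with no mTLS options at all denies every caller in this sketch, which matches the default-deny behavior the RFC describes for app-to-app routing.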

@geofffranks
Contributor

This will be great but it will extend the scope of this RFC a lot. This means that GoRouter will need to store the certificates somewhere.

It still needs to do this now, it's just a change in what mechanism gets the cert there (BOSH to disk vs dynamic API call to database maybe via routing-api or CAPI?). We have customers who want to avoid wildcard certs where possible, which is pushing us to think about being able to dynamically add certs to gorouter.

@geofffranks
Contributor

Policy‑server is indeed authoritative for container‑to‑container (C2C) policies today. However, in this case we are defining policy at the route level, not between app instances directly.
These mTLS domain settings ultimately need to:

I'm confused by this. Are you suggesting defining policies for apps under route-1.apps.com to be able to talk to apps under route-2.apps.com? Or policies for specific apps to talk to any apps bound to route-1.apps.com? How should this interact with shared routes?

Also, this doc looks possibly concerning, but maybe it's just ambiguous:

Org managers can add private domains, or custom domains, and give members of the org permission to create routes for privately registered domain names.

Private domains can be shared with other orgs and spaces. These are called as shared private domains and are not the same as shared domains. For more information, see [Shared domains](https://docs.cloudfoundry.org/devguide/deploy-apps/routes-domains.html#shared-domains).

When using private domains, you can have routes with the same host name and domain name across different orgs and spaces. This cannot be done with shared domains.

It doesn't refer to this directly as the shared-routes workflow to share routes between org/spaces, which makes me worry that there's some mechanism for using the same internal route across orgs/spaces without explicit consent of the originating route creator. But that also seems like a very insecure behavior so I would imagine that's not how we would have implemented it originally.

@beyhan
Member

beyhan commented Mar 25, 2026

This will be great but it will extend the scope of this RFC a lot. This means that GoRouter will need to store the certificates somewhere.

It still needs to do this now, it's just a change in what mechanism gets the cert there (BOSH to disk vs dynamic API call to database maybe via routing-api or CAPI?). We have customers who want to avoid wildcard certs where possible, which is pushing us to think about being able to dynamically add certs to gorouter.

@geofffranks I think this would also be really beneficial for this RFC, and it could build nicely on it. Would you mind proposing an alternative approach to handle domain configuration that fits your requirements? To avoid blocking this RFC, it would be great if we could have a rough proposal ready by the next TOC meeting so it can be considered in the discussion.

@ameowlia
Member

You're absolutely right that relying on HTTP_PROXY / HTTPS_PROXY introduces the risk of intercepting traffic we don't want to modify. These environment‑variable–based conventions give us limited control, since they’re implemented by application HTTP libraries rather than the platform.
App developers can use NO_PROXY to explicitly exclude destinations, but I agree this isn’t ideal as a primary safeguard.
To reduce risk, I propose making this behavior double opt‑in:

  • Platform opt‑in via a BOSH property
  • App opt‑in by explicitly setting HTTP_PROXY in the app environment

This ensures the feature cannot unexpectedly change networking behavior for existing applications. Developers only adopt it for simple scenarios where routing all external HTTP(S) via Envoy is desirable.

It does not make sense to introduce a new feature that breaks current features, even if the new feature is opt-in. If we want the platform to do the mTLS automatically, then we need to find a way to send only certain traffic to Envoy and not all HTTP traffic.

@ameowlia
Member

ameowlia commented Mar 26, 2026

We had extensive discussions about the interface we want to support here. There were concerns about adding this to the manifest. Two options emerged from the discussions:
Option 1: Use the existing update route APIs
Option 2: Introduce a new API for this configuration, like the proposal here

We also discussed whether to use names or GUIDs. I believe we reached another soft agreement that using names would require access to the org, space, and app to resolve their GUIDs. Since the API level should definitely use GUIDs, my understanding is that GUIDs are acceptable when using APIs as described in Options 1 and 2. Please correct me if this isn't the case.

I am still not convinced about "just using GUIDs". I would be curious to see a more detailed proposal covering how it would handle random app/space/org GUIDs, how someone would use it, and how users at different levels (admin, space-dev) would be able to audit the security settings in place.

One of the complaints about C2C was that there is a 65k app (not application instance, app) limit. If that is really a concern, how is a user going to manage 65k random GUIDs?

@ameowlia
Member

We agreed that the BOSH-based configuration of an mTLS domain and the authorization described here (using scopes instead of org or space GUIDs) is acceptable to everyone.

I am starting to second guess the idea to have scopes in the bosh manifest. Having this info in the manifest and having other policy information in some other API somewhere still means that there is no single source of truth for policy requirements.

For example, what would happen if the scope was "space" only, but a user provided a GUID for an app in a different org? (1) The user has no way to view the space-only restriction. (2) No error is returned to say that the GUID provided will never work.

At the VERY least, the scope information needs to be synced with an API that space devs have access to. I think a better option would be for scopes to be set in the API by an admin user.

@ameowlia
Member

The RFC doesn't explain how GoRouter will serve both mTLS and TLS traffic on the same port. Looking at the draft implementation in router.go, the mechanism relies on SNI.

The problem is that if ServerName is empty, GetMtlsDomainConfig("") returns nil and the code silently falls through to non-mTLS with no error. This means the mTLS requirement is bypassed entirely.

Not all clients send SNI. What happens in these cases? How will we ensure this is secure?

@geofffranks
Contributor

Not all clients send SNI. What happens in these cases? How will we ensure this is secure?

Additionally, there's no requirement that the SNI match the Host header in the HTTP request that gorouter is looking at. Can't malicious clients then set arbitrary SNIs to bypass the mTLS requirement and then get routed to the right application?

@ameowlia
Member

ameowlia commented Mar 27, 2026

May I suggest setting up isolated routers for domains that want different certs :)

Labels

rfc CFF community RFC toc

Projects

Status: In Progress

10 participants