MEP-19: Zone Awareness #147
Draft: simcod wants to merge 8 commits into main from mep-19
568f784 MEP-19: Zone Awareness in metal-stack.io (simcod)
3a9543f proposal 3 (mwindower)
872a3dd fix spelling (mwindower)
19a82f9 Update docs/contributing/01-Proposals/MEP19/README.md (simcod)
008f2bb Update docs/contributing/01-Proposals/MEP19/README.md (simcod)
d7c2f83 Update docs/contributing/01-Proposals/MEP19/README.md (simcod)
4edb049 Update docs/contributing/01-Proposals/MEP19/README.md (simcod)
3750c0c Add review comments (simcod)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| --- | ||
| slug: /MEP-19-zone-awareness | ||
| title: MEP-19 | ||
| sidebar_position: 19 | ||
| --- | ||
|
|
||
| # Zone Awareness | ||
|
|
||
| In metal-stack, the concepts of regions and zones are currently represented implicitly through partition names rather than as dedicated API entities. This design uses naming conventions to encode both region and zone information within a partition identifier. For example, the partition name `fra_eqx_01` translates to Frankfurt (region), Equinix (zone), and 01 (partition). | ||
|
|
||
| From a networking perspective, traffic between private node networks is not routed between partitions. To prevent misconfiguration, private networks are derived from partition-scoped `supernetworks`, which prevents private node networks from being used across different partitions. Only external networks such as the Internet or Datacenter Interconnect (DCI) connections can be used to route traffic between partitions. | ||
|
|
||
| Additionally, all networks have disjoint IP prefixes. With the introduction of [MEP-4](../MEP4/README.md), this behavior will change: network prefixes may overlap across partitions but must remain disjoint within a single project. This has been possible since go-ipam release `v1.12.0`, which introduced the concept of network namespaces. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Current metal-stack installations can already spread a single partition across data centers. This can be achieved through the rack spreading feature (introduced by [MEP-12](../MEP12/README.md)). | ||
|
|
||
| This feature has limitations: it is not possible to decide explicitly in which racks nodes are placed, and spreading follows a best-effort strategy. If no machine is available in another rack, a node may be placed in a rack where a machine is already present. | ||
|
|
||
| Another issue with this approach is that the single partition is still one failure domain, e.g. a single BGP failure could bring down the whole partition. As known from major cloud providers, zonal distribution of workload enhances availability and fault tolerance. | ||
|
|
||
| ## Requirements to Achieve this Goal | ||
|
|
||
| To support explicit region and zone concepts in metal-stack, several functional and architectural requirements must be met. The following considerations focus primarily on the Kubernetes integration and cluster topology aspects: | ||
|
|
||
| - Proper spreading of worker nodes and control plane components across [multiple zones](https://kubernetes.io/docs/setup/best-practices/multiple-zones/) and regions must be possible. | ||
| - Nodes that belong to the same Kubernetes cluster must have the capability to communicate directly with each other, even if they are located in different partitions, provided that network configurations allow this communication using their respective Node CIDRs. | ||
| - It must be possible for nodes within a single Kubernetes cluster to use different Node CIDR ranges, depending on their partition or zone assignment. Major cloud providers use node groups to configure Node CIDRs differently. | ||
| - Zones must remain separate failure domains: for example, a failure in the EVPN control plane of one zone should not affect the others (no EVPN fate-sharing). | ||
|
|
||
| ## Criteria | ||
|
|
||
| - Number of hops for communication between worker nodes, to the internet, and to storage. | ||
|
Contributor
An introductory sentence is necessary. Which criteria are we talking about? |
||
|
|
||
| Storage resources must either be strictly located in a single partition or replicated across all partitions. This can be enforced using [`allowedTopologies`](https://kubernetes.io/docs/concepts/storage/storage-classes/#allowed-topologies) within a `StorageClass`. | ||
|
|
||
| An open design question remains regarding Pod and Service CIDRs, which we usually configure for native routing (using FRR peering with the CNI and with MetalLB for service exposure). In the case of zonal routing, this would imply that traffic inside the FRR peering range also needs to be routable across zonal partitions. Should overlay networks be allowed, or is it possible to depend on IPv6 to solve this issue? Further evaluation is needed to determine the optimal approach. | ||
|
|
||
| ## Proposals | ||
|
|
||
| **Proposal 1: Disjoint VNIs Across Partitions** | ||
|
|
||
|  | ||
|
|
||
| In this approach, each partition uses a distinct set of VNIs. An additional controller, most likely running on the exit switch, would be required to build and manage the corresponding route maps. | ||
|
|
||
| Each partition would maintain its own VRF. On the exit switch, routes from all VRFs associated with the same project would be imported to enable project-wide routing between partitions while maintaining isolation from other projects. | ||
|
|
||
| The firewall would need to participate in all VRFs of the cluster, ensuring consistent traffic filtering and policy enforcement across partitions. Additionally, a default route must be present within each VRF. | ||
|
|
||
| **Proposal 2: Multi-Site DCI** | ||
|
|
||
|  | ||
|
|
||
| In the second approach, the same VNIs are used across multiple partitions. This capability can be realized by leveraging features provided by the Enterprise Switch OS. | ||
|
|
||
| From a metal-stack perspective, each partition would still define separate node networks, but the same VRFs would be available in each partition. | ||
|
|
||
| To support this, the `metal-api` would need to be extended to allow identical VNIs across different networks and partitions, as long as they belong to the same project. | ||
|
|
||
| **Storage** | ||
|
|
||
| Storage aspects will likely be addressed in a dedicated MEP. However, some initial considerations are outlined here. | ||
|
|
||
|  | ||
|
|
||
| In the current architecture as illustrated above, a node accesses storage through the firewall. | ||
|
|
||
|  | ||
|
|
||
| One possible improvement would be to remove the dependency on the firewall for storage access. This could be achieved by configuring a route map on the leaf switch to establish a direct mapping between the tenant VRF and the storage VRF on a per-project basis. | ||
|
|
||
| **Proposal 3: Project-Wide Route-Leaking and Open DCI** | ||
|
|
||
| This is a mixture of Proposals 1 and 2, with disjoint VNIs across partitions. | ||
|
|
||
| In this approach, each partition uses a distinct set of VNIs. The `metal-core`, running on the leaf switches, would be required to build and manage route leaks: | ||
|
|
||
| - from certain private networks (e.g. all project networks, storage network) to the local VRF (only locally held at the leaf switches) | ||
| - from the local VRF to a DCI VRF (only propagated zone-wide) | ||
|
|
||
| The open DCI is a ring of exit switches that speak plain BGP (no EVPN routes, no VXLAN) to exchange the private supernetworks of the zones (note: prefix length is longer). The exit switches operate as VTEPs for the DCI VRF, and the setup does not depend on the Multi-Site DCI feature of Enterprise SONiC. | ||
|
|
||
| Notes: | ||
|
|
||
| - cross-zone traffic is transported efficiently, as the firewall is not in the path (fewer hops) | ||
| - this can also be used to provide worker nodes with a more efficient way to access storage systems (also not going through the firewall) | ||
|
|
||
| ## Operational Recommendations and Documentation Notes | ||
|
|
||
| Include a recommendation on the maximum practical distance between partitions within a single zone, particularly with regard to latency-sensitive components such as `etcd`. | ||
|
|
||
| ## Roadmap | ||
|
|
||
| The following tasks can be considered as next steps: | ||
|
|
||
| - Verify proposals in containerlab | ||
| - Research: Can FRR do the Multi-Site DCI Feature out-of-the-box? | ||
| - Create samples for a Gardener shoot spec and the Cluster API manifests | ||
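The README states that, with MEP-4, network prefixes may overlap in general but must stay disjoint within a single project. A minimal sketch of such a check, using only Python's standard `ipaddress` module, could look like the following. The function name `overlapping_prefixes` and the sample prefixes are hypothetical and are not part of metal-api or go-ipam.

```python
import ipaddress
from itertools import combinations


def overlapping_prefixes(project_prefixes):
    """Return all pairs of overlapping prefixes within one project.

    project_prefixes is an iterable of CIDR strings that belong to a
    single project (a hypothetical input shape, not a metal-api type).
    An empty result means the project's prefixes are disjoint.
    """
    nets = [ipaddress.ip_network(p) for p in project_prefixes]
    return [
        (str(a), str(b))
        for a, b in combinations(nets, 2)
        # Only compare prefixes of the same IP version.
        if a.version == b.version and a.overlaps(b)
    ]


# Two node networks of the same project in different partitions: disjoint.
project_a = ["10.0.0.0/22", "10.0.4.0/22"]
print(overlapping_prefixes(project_a))

# Adding a prefix inside 10.0.0.0/22 violates the per-project rule.
print(overlapping_prefixes(project_a + ["10.0.2.0/24"]))
```

The check is deliberately scoped to one project: the same prefix in a different project would simply never be passed to the same call.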
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| <mxfile host="65bd71144e"> | ||
| <diagram id="8gMl2hTIlcoxMkYUvRWJ" name="Page-1"> | ||
| <mxGraphModel dx="621" dy="454" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0"> | ||
| <root> | ||
| <mxCell id="0"/> | ||
| <mxCell id="1" parent="0"/> | ||
| <mxCell id="6" value="Partition 1" style="swimlane;whiteSpace=wrap;html=1;" parent="1" vertex="1"> | ||
| <mxGeometry x="120" y="40" width="240" height="240" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="2" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;VRF1&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/switch/Switch_48_port_L3.svg;" parent="6" vertex="1"> | ||
| <mxGeometry x="81" y="48" width="78" height="52.8" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="4" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;10.0.0.1/32&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/computer_and_terminals/Server_Desktop.svg;" parent="6" vertex="1"> | ||
| <mxGeometry x="98.69999999999999" y="160" width="42.599999999999994" height="54" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="10" style="edgeStyle=none;html=1;endArrow=none;endFill=0;" parent="6" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"> | ||
| <mxPoint x="120" y="120" as="sourcePoint"/> | ||
| <mxPoint x="120" y="164" as="targetPoint"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="7" value="Partition 2" style="swimlane;whiteSpace=wrap;html=1;" parent="1" vertex="1"> | ||
| <mxGeometry x="480" y="40" width="240" height="240" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="11" style="edgeStyle=none;html=1;endArrow=none;endFill=0;" parent="7" target="5" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"> | ||
| <mxPoint x="130" y="120" as="sourcePoint"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="3" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;VRF2&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/switch/Switch_48_port_L3.svg;" parent="7" vertex="1"> | ||
| <mxGeometry x="90" y="48" width="78" height="52.8" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="5" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;10.0.1.1/32&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/computer_and_terminals/Server_Desktop.svg;" parent="7" vertex="1"> | ||
| <mxGeometry x="107.70000000000005" y="160" width="42.599999999999994" height="54" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="9" style="edgeStyle=none;html=1;endArrow=none;endFill=0;rounded=0;curved=1;" parent="1" source="2" target="3" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"> | ||
| <Array as="points"> | ||
| <mxPoint x="420" y="60"/> | ||
| </Array> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="13" value="Route Maps&lt;div&gt;without NAT&lt;/div&gt;" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="9" vertex="1" connectable="0"> | ||
| <mxGeometry x="-0.0681" y="-19" relative="1" as="geometry"> | ||
| <mxPoint as="offset"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| </root> | ||
| </mxGraphModel> | ||
| </diagram> | ||
| </mxfile> |
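The diagram above matches Proposal 1: one VRF per partition, connected through route maps without NAT. As a rough sketch of what the additional controller would need to compute, the following groups VRFs by project and derives, for each VRF, the peer VRFs whose routes it would import. All names (`Vrf`, `import_plan`), project IDs, and VNI values are assumptions made for illustration; they are not taken from metal-stack.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vrf:
    name: str
    partition: str
    project: str
    vni: int  # hypothetical value; distinct per partition in Proposal 1


def import_plan(vrfs):
    """Map each VRF name to the sorted names of VRFs it imports routes from.

    Only VRFs of the same project import from each other, so routing is
    project-wide while other projects stay isolated.
    """
    by_project = defaultdict(list)
    for vrf in vrfs:
        by_project[vrf.project].append(vrf)

    plan = {}
    for members in by_project.values():
        for vrf in members:
            plan[vrf.name] = sorted(m.name for m in members if m.name != vrf.name)
    return plan


vrfs = [
    Vrf("VRF1", "partition-1", "project-a", 1001),
    Vrf("VRF2", "partition-2", "project-a", 2001),
    Vrf("VRF3", "partition-1", "project-b", 1002),
]
print(import_plan(vrfs))
```

In this sample, `VRF1` and `VRF2` import from each other because they share a project, while `VRF3` imports nothing.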
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| <mxfile host="65bd71144e"> | ||
| <diagram id="8gMl2hTIlcoxMkYUvRWJ" name="Page-1"> | ||
| <mxGraphModel dx="434" dy="318" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0"> | ||
| <root> | ||
| <mxCell id="0"/> | ||
| <mxCell id="1" parent="0"/> | ||
| <mxCell id="6" value="Partition 1" style="swimlane;whiteSpace=wrap;html=1;" parent="1" vertex="1"> | ||
| <mxGeometry x="120" y="40" width="240" height="240" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="2" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;VRF1&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/switch/Switch_48_port_L3.svg;" parent="6" vertex="1"> | ||
| <mxGeometry x="81" y="48" width="78" height="52.8" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="4" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;10.0.0.1/32&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/computer_and_terminals/Server_Desktop.svg;" parent="6" vertex="1"> | ||
| <mxGeometry x="98.69999999999999" y="160" width="42.599999999999994" height="54" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="10" style="edgeStyle=none;html=1;endArrow=none;endFill=0;" parent="6" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"> | ||
| <mxPoint x="120" y="120" as="sourcePoint"/> | ||
| <mxPoint x="120" y="164" as="targetPoint"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="7" value="Partition 2" style="swimlane;whiteSpace=wrap;html=1;" parent="1" vertex="1"> | ||
| <mxGeometry x="480" y="40" width="240" height="240" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="11" style="edgeStyle=none;html=1;endArrow=none;endFill=0;" parent="7" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"> | ||
| <mxPoint x="131" y="123" as="sourcePoint"/> | ||
| <mxPoint x="130.40298507462694" y="163" as="targetPoint"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="3" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;VRF1&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/switch/Switch_48_port_L3.svg;" parent="7" vertex="1"> | ||
| <mxGeometry x="90" y="48" width="78" height="52.8" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="5" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;10.0.1.1/32&lt;/font&gt;" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/computer_and_terminals/Server_Desktop.svg;" parent="7" vertex="1"> | ||
| <mxGeometry x="107.70000000000005" y="160" width="42.599999999999994" height="54" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="9" style="edgeStyle=none;html=1;endArrow=none;endFill=0;rounded=0;curved=1;" parent="1" source="2" target="3" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"> | ||
| <Array as="points"> | ||
| <mxPoint x="420" y="60"/> | ||
| </Array> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| </root> | ||
| </mxGraphModel> | ||
| </diagram> | ||
| </mxfile> |
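The diagram above matches Proposal 2, where the same VRF (and therefore the same VNI) appears in both partitions. The README says the `metal-api` would need to allow identical VNIs across networks and partitions as long as they belong to the same project. A minimal sketch of that rule, under the assumption that a network can be reduced to a VNI, a project, and a partition, might look like this. The names `Network` and `vni_allowed` are hypothetical and do not reflect the actual `metal-api` data model.

```python
from typing import NamedTuple


class Network(NamedTuple):
    vni: int
    project: str
    partition: str


def vni_allowed(existing, candidate):
    """Return True if the candidate network may use its VNI.

    A VNI may be reused across networks and partitions only when every
    existing network with that VNI belongs to the candidate's project.
    """
    return all(
        net.project == candidate.project
        for net in existing
        if net.vni == candidate.vni
    )


existing = [Network(1001, "project-a", "partition-1")]

# Same project, other partition: reuse is allowed under Proposal 2.
print(vni_allowed(existing, Network(1001, "project-a", "partition-2")))

# Different project: the VNI must not be shared.
print(vni_allowed(existing, Network(1001, "project-b", "partition-2")))
```

An unused VNI is always allowed, since no existing network constrains it.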
61 changes: 61 additions & 0 deletions
docs/contributing/01-Proposals/MEP19/storage-current.drawio
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| <mxfile host="65bd71144e"> | ||
| <diagram id="bnkaKnrv1tXkZOrYpwZu" name="Page-1"> | ||
| <mxGraphModel dx="1086" dy="795" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0"> | ||
| <root> | ||
| <mxCell id="0"/> | ||
| <mxCell id="1" parent="0"/> | ||
| <mxCell id="10" style="edgeStyle=none;html=1;endArrow=none;endFill=0;" parent="1" source="2" target="5" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="2" value="" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/switch/Switch_48_port_L3.svg;" parent="1" vertex="1"> | ||
| <mxGeometry x="200" y="80" width="78" height="52.8" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="8" style="edgeStyle=none;html=1;endArrow=none;endFill=0;" parent="1" source="3" target="4" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="9" value="Tenant VRF" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];fontSize=10;" parent="8" vertex="1" connectable="0"> | ||
| <mxGeometry x="-0.4018" y="-2" relative="1" as="geometry"> | ||
| <mxPoint x="2" y="9" as="offset"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="3" value="" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/computer_and_terminals/Server_Desktop.svg;" parent="1" vertex="1"> | ||
| <mxGeometry x="360" y="80" width="42.599999999999994" height="54" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="4" value="" style="image;points=[];aspect=fixed;html=1;align=center;shadow=0;dashed=0;image=img/lib/allied_telesis/computer_and_terminals/Server_Desktop.svg;" parent="1" vertex="1"> | ||
| <mxGeometry x="360" y="190" width="42.599999999999994" height="54" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="5" value="" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.database;whiteSpace=wrap;" parent="1" vertex="1"> | ||
| <mxGeometry x="224" y="160" width="30" height="40" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="6" style="edgeStyle=none;html=1;entryX=0.012;entryY=0.515;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;" parent="1" source="2" target="3" edge="1"> | ||
| <mxGeometry relative="1" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="7" value="Storage VRF" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];fontSize=10;" parent="6" vertex="1" connectable="0"> | ||
| <mxGeometry x="0.2721" y="2" relative="1" as="geometry"> | ||
| <mxPoint x="-11" y="1" as="offset"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="20" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;Firewall&lt;/font&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" parent="1" vertex="1"> | ||
| <mxGeometry x="395" y="90" width="60" height="30" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="21" value="&lt;font style=&quot;font-size: 10px;&quot;&gt;Worker&lt;/font&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" parent="1" vertex="1"> | ||
| <mxGeometry x="395" y="200" width="60" height="30" as="geometry"/> | ||
| </mxCell> | ||
| <mxCell id="24" value="" style="endArrow=classic;html=1;strokeColor=light-dark(#FF9933,#EDEDED);" parent="1" edge="1"> | ||
| <mxGeometry width="50" height="50" relative="1" as="geometry"> | ||
| <mxPoint x="462.6" y="215" as="sourcePoint"/> | ||
| <mxPoint x="280" y="60" as="targetPoint"/> | ||
| <Array as="points"> | ||
| <mxPoint x="463" y="60"/> | ||
| </Array> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| <mxCell id="31" value="Storage access" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];fontSize=8;" parent="24" vertex="1" connectable="0"> | ||
| <mxGeometry x="0.6001" y="-1" relative="1" as="geometry"> | ||
| <mxPoint x="13" as="offset"/> | ||
| </mxGeometry> | ||
| </mxCell> | ||
| </root> | ||
| </mxGraphModel> | ||
| </diagram> | ||
| </mxfile> |
Why is this required? In GCP this is not the case; node IPs are not in different CIDR ranges.