-
Notifications
You must be signed in to change notification settings - Fork 6
Multi-gateway & Failover #1144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-gateway & Failover #1144
Conversation
61eb9b4 to
427604c
Compare
3e5f0b3 to
52aee41
Compare
e51f49f to
65ecf36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine but we need a k8s implementation since #1146 cuts us over to k8s by default for vlab tests and we'll cut over shortly to agentless mode everywhere.
|
There are some clippy lints that ought to be addressed. Also, for gRPC we should update the fuzzers in gateway-proto for the new fields before this merges. I've added dont't merge to this PR until someone verifies that those fuzzers were added. |
65ecf36 to
9329e28
Compare
You probably mean fuzzers for k8s? |
No, there are fuzzer generators for gRPC in gateway-proto that should be updated. We can skip that if we roll in the k8s stuff here I suppose. |
82484d3 to
1f36e23
Compare
bcccd52 to
7ac3866
Compare
1f36e23 to
7b6868b
Compare
| #[test] | ||
| #[traced_test] | ||
| #[wrap(with_gw_name())] | ||
| #[allow(clippy::too_many_lines)] | ||
| fn test_full_config() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we call let _ = set_gw_name("test-gw"); at the beginning of function build_sample_config() in the same file instead, please? This would avoid adding the wrapping for each test.
Same for stateless NAT test (there's just one test at the moment, but I'm adding new ones).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_gw_name() is meant to be called just once. After the first call, it may fail by design, unless exactly the same name is used. The with_gw_name() fixture always sets the same name so that this never happens.
You need to add the wrapping only in those tests that directly or indirectly require the gateway name.
What is the advantage of calling set_get_name() vs wrapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage of calling
set_get_name()vs wrapping?
Technically none; It's just that we'd call it just one in build_sample_config() (I tried it, it worked) and wouldn't need to tag all tests individually. I don't really see the point in tagging all tests, it stands out when looking at the tests if the reader's not familiar with the gateway naming. If we call set_get_name() from build_sample_config(), it just becomes part of the config building function for these tests, period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to do that if you want. There may be other tests that require checking the name of the gateway aside from build_sample_config(). If those tests set a distinct name, they will be broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be other tests that require checking the name of the gateway aside from build_sample_config(). If those tests set a distinct name, they will be broken.
I'm not sure what you mean here. Using various names across various tests seems to work fine. For example, this builds and keeps the tests working:
diff
diff --git i/Cargo.lock w/Cargo.lock
index b456197a3ae7..28474aed2189 100644
--- i/Cargo.lock
+++ w/Cargo.lock
@@ -1498,6 +1498,7 @@ dependencies = [
"dataplane-concurrency",
"dataplane-config",
"dataplane-flow-info",
+ "dataplane-gwname",
"dataplane-lpm",
"dataplane-net",
"dataplane-pipeline",
diff --git i/nat/Cargo.toml w/nat/Cargo.toml
index d352c6597072..8a8e9e465423 100644
--- i/nat/Cargo.toml
+++ w/nat/Cargo.toml
@@ -29,6 +29,7 @@ tracing = { workspace = true }
[dev-dependencies]
# internal
+gwname = { workspace = true }
fixin = { workspace = true }
test-utils = { workspace = true }
lpm = { workspace = true, features = ["testing"] }
diff --git i/nat/src/stateful/test.rs w/nat/src/stateful/test.rs
index dd6425b6fa14..f348f9285a2b 100644
--- i/nat/src/stateful/test.rs
+++ w/nat/src/stateful/test.rs
@@ -17,6 +17,7 @@ mod tests {
use config::external::underlay::Underlay;
use config::internal::device::DeviceConfig;
+ use gwname::set_gw_name;
use config::internal::device::settings::DeviceSettings;
use config::internal::interfaces::interface::{IfVtepConfig, InterfaceConfig, InterfaceType};
use config::internal::routing::bgp::BgpConfig;
@@ -352,9 +353,9 @@ mod tests {
#[test]
#[traced_test]
- #[wrap(with_gw_name())]
#[allow(clippy::too_many_lines)]
fn test_full_config() {
+ let _ = set_gw_name("test-gw");
let mut config = build_sample_config(build_overlay_4vpcs());
config.validate().unwrap();
@@ -566,9 +567,9 @@ mod tests {
}
#[test]
- #[wrap(with_gw_name())]
#[traced_test]
fn test_full_config_unidirectional_nat() {
+ let _ = set_gw_name("test-kiwi");
let mut config = build_sample_config(build_overlay_2vpcs_unidirectional_nat());
config.validate().unwrap();
@@ -654,9 +655,9 @@ mod tests {
}
#[test]
- #[wrap(with_gw_name())]
#[traced_test]
fn test_full_config_no_nat() {
+ let _ = set_gw_name("test-banana");
let mut config = build_sample_config(build_overlay_2vpcs_no_nat());
config.validate().unwrap();
@@ -753,9 +754,9 @@ mod tests {
}
#[test]
- #[wrap(with_gw_name())]
#[traced_test]
fn test_icmp_echo_nat() {
+ let _ = set_gw_name("test-grape");
let mut config = build_sample_config(build_overlay_2vpcs());
config.validate().unwrap();
@@ -851,9 +852,9 @@ mod tests {
}
#[test]
- #[wrap(with_gw_name())]
#[traced_test]
fn test_icmp_echo_unidirectional_nat() {
+ let _ = set_gw_name("test-orange");
let mut config = build_sample_config(build_overlay_2vpcs_unidirectional_nat());
config.validate().unwrap();
@@ -1000,9 +1001,9 @@ mod tests {
}
#[test]
- #[wrap(with_gw_name())]
#[traced_test]
fn test_icmp_error_nat() {
+ let _ = set_gw_name("test-plum");
let mut config = build_sample_config(build_overlay_2vpcs());
config.validate().unwrap();
@@ -1244,12 +1245,12 @@ mod tests {
}
#[test]
- #[wrap(with_gw_name())]
#[allow(clippy::too_many_lines)]
fn test_full_config_unidirectional_nat_overlapping_destination() {
let tctl = get_trace_ctl();
let _ = tctl.setup_from_string("vpc-routing=debug,flow-lookup=debug,stateful-nat=debug");
+ let _ = set_gw_name("test-cherry");
let mut config =
build_sample_config(build_overlay_3vpcs_unidirectional_nat_overlapping_addr());
config.validate().unwrap();
@@ -1536,9 +1537,9 @@ mod tests {
#[test]
#[traced_test]
- #[wrap(with_gw_name())]
#[allow(clippy::too_many_lines)]
fn test_full_config_unidirectional_nat_overlapping_exposes_for_single_peering() {
+ let _ = set_gw_name("test-apple");
let mut config =
build_sample_config(build_overlay_2vpcs_unidirectional_nat_overlapping_exposes());
// Validation fails - We currently forbid multiple peerings between any pair of VPCs. We(But having it once in build_sample_config() is simpler of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're probably using nextest to run the tests. Can you try with cargo test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was using nextest. OK, with cargo test it does fail with the per-test names.
The following still works (we're calling the function multiple times across tests, but always with the same hardcoded name):
fn build_sample_config(overlay: Overlay) -> GwConfig {
set_gw_name("test-gw").unwrap();
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is expected. If you use the same name, set_gw_name() will not fail. But then you need to maintain the consistency of the name throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok you mean it has to be consistent across all tests in all crates with cargo test. I thought it was just within a given crate or module, which sounded pretty manageable for tests.
In that case I understand better (at last 😅) why you want to go with the tagging approach. Let's keep this, then.
However, can I suggest that we get a more verbose error when the gateway name is not set at validation time? Right now it's only internal error: entered unreachable code, but I can totally imagine folks adding more tests with a GwConfig but forgetting to tag their test functions, and in that case a hint message would be welcome (something like if you see this in tests, look at how "#[wrap(with_gw_name())]" is used elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will fix that.
7b6868b to
9e0f089
Compare
7c94777 to
bb0f4f7
Compare
ca9283d to
230afbc
Compare
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This does not change the behavior of the configuration, but will allow adding more functionality later on. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Extend the BGP Community type to be able to hold a community as a string. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This commits augments the configuration model and code with the
new grpc version that includes new configurations to
support HA with multiple gateways. This includes:
- grpc v0.19.0 and
- gateway v0.32.0 (k8s interface)
There was no way to make all these changes across multiple commits.
Summary of changes:
1) configuration includes a list of gateway groups. A gateway
group (identified by a name), represents a set of gateways. For
each member in the set, a gateway group contains:
- name of gateway
- priority within the group
- ipaddress of the gateway (loopback interface)
2) the configuration also includes a priority-to-community map.
This map indicates the BGP community to use network-wide
depending on nodes' priorities.
3) Note: the priority in a group is DECOUPLED from the priority
in the community table. Group priorities only express ordering;
i.e. their absolute value only matters in relation to other
priorities. They are simply used to order gateways in a group
in terms of preference or importance. That ordering (position)
is what determines the community with which prefix peerings
should be advertised.
4) Vpc peerings are augmented with the name of a gateway group.
From that gateway group, the name of the gateway and its
position within the group, and the community table, we
determine whether the gw should advertise a prefix and
with what community.
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Extend the route-map for advertising routes in a VPC so that routes get tagged with communities. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Some configuration objects related to addresses contain a mask length which is unnecessary. Add methods to attempt to convert those anyway, ignoring the mask. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Do not fail if no community has been specified. We relax this for the time being in order to test. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Remove unnecessary clone() call of hostname, which is probably a leftover sanity. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This is needed to transition to the new grpc model where no hostname comes from the config. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Deprecating those fields requires a new version of gateway-proto. Otherwise the fuzzing tests fail. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This is temporary as we may require an explicit prority. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
.. and adapt the code accordingly. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
9c6b110 to
38da8f8
Compare
We need a deterministic ordering of members in a group. Since priorities may match, break the ties with the ip address. Ordering is such that high prio wins and higher address wins. Notes: - we use the ip address to break the ties since it is ensured to be distinct. We may relax this condition if ever we want to allow some form of anycast. - the Ord/PartialOrd assume that the ip addresses can match. But that will not happen with the current restrictions. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
455d0f7 to
3ee0cb5
Compare
On top of #1126