Skip to content

Conversation

@Fredi-raspall
Copy link
Contributor

@Fredi-raspall Fredi-raspall commented Dec 15, 2025

On top of #1126

@Fredi-raspall Fredi-raspall requested a review from a team as a code owner December 15, 2025 18:04
@Fredi-raspall Fredi-raspall requested review from qmonnet and removed request for a team December 15, 2025 18:04
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch 3 times, most recently from 61eb9b4 to 427604c Compare December 15, 2025 22:07
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/gwname_cmdline branch from 3e5f0b3 to 52aee41 Compare December 16, 2025 11:16
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch 3 times, most recently from e51f49f to 65ecf36 Compare December 16, 2025 14:47
Copy link
Contributor

@mvachhar mvachhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine but we need a k8s implementation since #1146 cuts us over to k8s by default for vlab tests and we'll cut over shortly to agentless mode everywhere.

@mvachhar
Copy link
Contributor

There are some clippy lints that ought to be addressed. Also, for gRPC we should update the fuzzers in gateway-proto for the new fields before this merges. I've added dont't merge to this PR until someone verifies that those fuzzers were added.

@mvachhar mvachhar added the dont-merge Do not merge this Pull Request label Dec 16, 2025
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch from 65ecf36 to 9329e28 Compare December 16, 2025 15:04
@Fredi-raspall Fredi-raspall added dont-merge Do not merge this Pull Request and removed dont-merge Do not merge this Pull Request labels Dec 16, 2025
@Fredi-raspall
Copy link
Contributor Author

There are some clippy lints that ought to be addressed. Also, for gRPC we should update the fuzzers in gateway-proto for the new fields before this merges. I've added dont't merge to this PR until someone verifies that those fuzzers were added.

You probably mean fuzzers for k8s?

@Fredi-raspall
Copy link
Contributor Author

This looks fine but we need a k8s implementation since #1136 cuts us over to k8s by default for vlab tests and we'll cut over shortly to agentless mode everywhere.

You probably mean #1146 ?

@mvachhar
Copy link
Contributor

This looks fine but we need a k8s implementation since #1136 cuts us over to k8s by default for vlab tests and we'll cut over shortly to agentless mode everywhere.

You probably mean #1146 ?

Yep, fixed my comment.

@mvachhar
Copy link
Contributor

There are some clippy lints that ought to be addressed. Also, for gRPC we should update the fuzzers in gateway-proto for the new fields before this merges. I've added dont't merge to this PR until someone verifies that those fuzzers were added.

You probably mean fuzzers for k8s?

No, there are fuzzer generators for gRPC in gateway-proto that should be updated. We can skip that if we roll in the k8s stuff here I suppose.

@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch from 82484d3 to 1f36e23 Compare December 16, 2025 21:27
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/gwname_cmdline branch from bcccd52 to 7ac3866 Compare December 16, 2025 21:28
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch from 1f36e23 to 7b6868b Compare December 16, 2025 21:38
Base automatically changed from pr/fredi/gwname_cmdline to main December 16, 2025 23:14
Comment on lines 353 to 356
#[test]
#[traced_test]
#[wrap(with_gw_name())]
#[allow(clippy::too_many_lines)]
fn test_full_config() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call let _ = set_gw_name("test-gw"); at the beginning of function build_sample_config() in the same file instead, please? This would avoid adding the wrapping for each test.

Same for stateless NAT test (there's just one test at the moment, but I'm adding new ones).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_gw_name() is meant to be called just once. After the first call, it may fail by design, unless exactly the same name is used. The with_gw_name() fixture always sets the same name so that this never happens.
You need to add the wrapping only in those tests that directly or indirectly require the gateway name.
What is the advantage of calling set_get_name() vs wrapping?

Copy link
Member

@qmonnet qmonnet Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of calling set_get_name() vs wrapping?

Technically none; It's just that we'd call it just one in build_sample_config() (I tried it, it worked) and wouldn't need to tag all tests individually. I don't really see the point in tagging all tests, it stands out when looking at the tests if the reader's not familiar with the gateway naming. If we call set_get_name() from build_sample_config(), it just becomes part of the config building function for these tests, period.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to do that if you want. There may be other tests that require checking the name of the gateway aside from build_sample_config(). If those tests set a distinct name, they will be broken.

Copy link
Member

@qmonnet qmonnet Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be other tests that require checking the name of the gateway aside from build_sample_config(). If those tests set a distinct name, they will be broken.

I'm not sure what you mean here. Using various names across various tests seems to work fine. For example, this builds and keeps the tests working:

diff
diff --git i/Cargo.lock w/Cargo.lock
index b456197a3ae7..28474aed2189 100644
--- i/Cargo.lock
+++ w/Cargo.lock
@@ -1498,6 +1498,7 @@ dependencies = [
  "dataplane-concurrency",
  "dataplane-config",
  "dataplane-flow-info",
+ "dataplane-gwname",
  "dataplane-lpm",
  "dataplane-net",
  "dataplane-pipeline",
diff --git i/nat/Cargo.toml w/nat/Cargo.toml
index d352c6597072..8a8e9e465423 100644
--- i/nat/Cargo.toml
+++ w/nat/Cargo.toml
@@ -29,6 +29,7 @@ tracing = { workspace = true }
 
 [dev-dependencies]
 # internal
+gwname = { workspace = true }
 fixin = { workspace = true }
 test-utils = { workspace = true }
 lpm = { workspace = true, features = ["testing"] }
diff --git i/nat/src/stateful/test.rs w/nat/src/stateful/test.rs
index dd6425b6fa14..f348f9285a2b 100644
--- i/nat/src/stateful/test.rs
+++ w/nat/src/stateful/test.rs
@@ -17,6 +17,7 @@ mod tests {
     use config::external::underlay::Underlay;
     use config::internal::device::DeviceConfig;
 
+    use gwname::set_gw_name;
     use config::internal::device::settings::DeviceSettings;
     use config::internal::interfaces::interface::{IfVtepConfig, InterfaceConfig, InterfaceType};
     use config::internal::routing::bgp::BgpConfig;
@@ -352,9 +353,9 @@ mod tests {
 
     #[test]
     #[traced_test]
-    #[wrap(with_gw_name())]
     #[allow(clippy::too_many_lines)]
     fn test_full_config() {
+        let _ = set_gw_name("test-gw");
         let mut config = build_sample_config(build_overlay_4vpcs());
         config.validate().unwrap();
 
@@ -566,9 +567,9 @@ mod tests {
     }
 
     #[test]
-    #[wrap(with_gw_name())]
     #[traced_test]
     fn test_full_config_unidirectional_nat() {
+        let _ = set_gw_name("test-kiwi");
         let mut config = build_sample_config(build_overlay_2vpcs_unidirectional_nat());
         config.validate().unwrap();
 
@@ -654,9 +655,9 @@ mod tests {
     }
 
     #[test]
-    #[wrap(with_gw_name())]
     #[traced_test]
     fn test_full_config_no_nat() {
+        let _ = set_gw_name("test-banana");
         let mut config = build_sample_config(build_overlay_2vpcs_no_nat());
         config.validate().unwrap();
 
@@ -753,9 +754,9 @@ mod tests {
     }
 
     #[test]
-    #[wrap(with_gw_name())]
     #[traced_test]
     fn test_icmp_echo_nat() {
+        let _ = set_gw_name("test-grape");
         let mut config = build_sample_config(build_overlay_2vpcs());
         config.validate().unwrap();
 
@@ -851,9 +852,9 @@ mod tests {
     }
 
     #[test]
-    #[wrap(with_gw_name())]
     #[traced_test]
     fn test_icmp_echo_unidirectional_nat() {
+        let _ = set_gw_name("test-orange");
         let mut config = build_sample_config(build_overlay_2vpcs_unidirectional_nat());
         config.validate().unwrap();
 
@@ -1000,9 +1001,9 @@ mod tests {
     }
 
     #[test]
-    #[wrap(with_gw_name())]
     #[traced_test]
     fn test_icmp_error_nat() {
+        let _ = set_gw_name("test-plum");
         let mut config = build_sample_config(build_overlay_2vpcs());
         config.validate().unwrap();
 
@@ -1244,12 +1245,12 @@ mod tests {
     }
 
     #[test]
-    #[wrap(with_gw_name())]
     #[allow(clippy::too_many_lines)]
     fn test_full_config_unidirectional_nat_overlapping_destination() {
         let tctl = get_trace_ctl();
         let _ = tctl.setup_from_string("vpc-routing=debug,flow-lookup=debug,stateful-nat=debug");
 
+        let _ = set_gw_name("test-cherry");
         let mut config =
             build_sample_config(build_overlay_3vpcs_unidirectional_nat_overlapping_addr());
         config.validate().unwrap();
@@ -1536,9 +1537,9 @@ mod tests {
 
     #[test]
     #[traced_test]
-    #[wrap(with_gw_name())]
     #[allow(clippy::too_many_lines)]
     fn test_full_config_unidirectional_nat_overlapping_exposes_for_single_peering() {
+        let _ = set_gw_name("test-apple");
         let mut config =
             build_sample_config(build_overlay_2vpcs_unidirectional_nat_overlapping_exposes());
         // Validation fails - We currently forbid multiple peerings between any pair of VPCs. We

(But having it once in build_sample_config() is simpler of course)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably using nextest to run the tests. Can you try with cargo test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was using nextest. OK, with cargo test it does fail with the per-test names.

The following still works (we're calling the function multiple times across tests, but always with the same hardcoded name):

    fn build_sample_config(overlay: Overlay) -> GwConfig {
        set_gw_name("test-gw").unwrap();
        ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is expected. If you use the same name, set_gw_name() will not fail. But then you need to maintain the consistency of the name throughout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok you mean it has to be consistent across all tests in all crates with cargo test. I thought it was just within a given crate or module, which sounded pretty manageable for tests.

In that case I understand better (at last 😅) why you want to go with the tagging approach. Let's keep this, then.

However, can I suggest that we get a more verbose error when the gateway name is not set at validation time? Right now it's only internal error: entered unreachable code, but I can totally imagine folks adding more tests with a GwConfig but forgetting to tag their test functions, and in that case a hint message would be welcome (something like if you see this in tests, look at how "#[wrap(with_gw_name())]" is used elsewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will fix that.

@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch from 7b6868b to 9e0f089 Compare December 17, 2025 09:42
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch 2 times, most recently from 7c94777 to bb0f4f7 Compare December 18, 2025 10:42
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This does not change the behavior of the configuration, but will
allow adding more functionality later on.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Extend the BGP Community type to be able to hold a community as
a string.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This commits augments the configuration model and code with the
new grpc version that includes new configurations to
support HA with multiple gateways. This includes:
  - grpc v0.19.0 and
  - gateway v0.32.0 (k8s interface)

There was no way to make all these changes across multiple commits.

Summary of changes:
   1) configuration includes a list of gateway groups. A gateway
     group (identified by a name), represents a set of gateways. For
     each member in the set, a gateway group contains:
       - name of gateway
       - priority within the group
       - ipaddress of the gateway (loopback interface)
   2) the configuration also includes a priority-to-community map.
     This map indicates the BGP community to use network-wide
     depending on nodes' priorities.
   3) Note: the priority in a group is DECOUPLED from the priority
     in the community table. Group priorities only express ordering;
     i.e. their absolute value only matters in relation to other
     priorities. They are simply used to order gateways in a group
     in terms of preference or importance. That ordering (position)
     is what determines the community with which prefix peerings
     should be advertised.
   4) Vpc peerings are augmented with the name of a gateway group.
     From that gateway group, the name of the gateway and its
     position within the group, and the community table, we
     determine whether the gw should advertise a prefix and
     with what community.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Extend the route-map for advertising routes in a VPC so that
routes get tagged with communities.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Some configuration objects related to addresses contain a mask
length which is unnecessary. Add methods to attempt to convert
those anyway, ignoring the mask.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Do not fail if no community has been specified.
We relax this for the time being in order to test.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Remove unnecessary clone() call of hostname, which is
probably a leftover sanity.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This is needed to transition to the new grpc model where
no hostname comes from the config.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Deprecating those fields requires a new version of gateway-proto.
Otherwise the fuzzing tests fail.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
This is temporary as we may require an explicit prority.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
.. and adapt the code accordingly.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch from 9c6b110 to 38da8f8 Compare December 19, 2025 17:37
@Fredi-raspall Fredi-raspall added the ci:-upgrade Disable VLAB upgrade tests label Dec 19, 2025
We need a deterministic ordering of members in a group.
Since priorities may match, break the ties with the ip address.
Ordering is such that high prio wins and higher address wins.

Notes:
 - we use the ip address to break the ties since it is ensured to
   be distinct. We may relax this condition if ever we want to
   allow some form of anycast.
 - the Ord/PartialOrd assume that the ip addresses can match.
   But that will not happen with the current restrictions.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/multigw_failover branch from 455d0f7 to 3ee0cb5 Compare December 22, 2025 09:09
@Fredi-raspall Fredi-raspall removed the dont-merge Do not merge this Pull Request label Dec 22, 2025
@Fredi-raspall Fredi-raspall added this pull request to the merge queue Dec 22, 2025
Merged via the queue into main with commit cfda58b Dec 22, 2025
22 of 23 checks passed
@Fredi-raspall Fredi-raspall deleted the pr/fredi/multigw_failover branch December 22, 2025 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:-upgrade Disable VLAB upgrade tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants