@LukasPukenis
Collaborator

@LukasPukenis LukasPukenis commented Dec 5, 2025

Problem

Libtelio uses a counter-based mechanism when handling interface errors. The counter is used to ignore transient WireguardNT errors that occur around power transitions and usually go away on their own.

Current limitations:

  • Counter not reset after power-resume
  • Counter not aware of FeaturePolling::wireguard_polling_period_after_state_change
  • Hardcoded value of 10

These limitations cause libtelio to raise unnecessary critical failures that require a libtelio restart.

Solution

Increase the counter threshold value.
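
For readers unfamiliar with the mechanism, here is a minimal sketch of a consecutive-failure counter with a configurable threshold. It is not the actual libtelio code: `ErrorCounter`, `Verdict`, and `record` are hypothetical names, and both the reset-on-success behaviour and the `>=` comparison are assumptions made for illustration.

```rust
/// Outcome of feeding one polling result into the counter.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// The poll succeeded, or the failure is still treated as transient.
    Tolerated,
    /// The threshold was reached; the caller would raise a critical error.
    Critical,
}

/// Hypothetical consecutive-failure counter (not libtelio's real type).
pub struct ErrorCounter {
    consecutive_failures: u32,
    threshold: u32,
}

impl ErrorCounter {
    /// `threshold` is the number of consecutive failures that is
    /// considered non-transient.
    pub fn new(threshold: u32) -> Self {
        Self {
            consecutive_failures: 0,
            threshold,
        }
    }

    /// Record the result of one interface poll.
    pub fn record(&mut self, poll_ok: bool) -> Verdict {
        if poll_ok {
            // Assumption: any successful poll clears the failure streak.
            self.consecutive_failures = 0;
            return Verdict::Tolerated;
        }
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures >= self.threshold {
            Verdict::Critical
        } else {
            Verdict::Tolerated
        }
    }
}
```

Under these assumptions, `ErrorCounter::new(10)` reports `Critical` on the tenth consecutive failed poll, while a counter created with a larger threshold tolerates the same burst. The trade-off is that a permanently missing interface also takes longer to be reported.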

☑️ Definition of Done checklist

  • Commit history is clean (requirements)
  • README.md is updated
  • Functionality is covered by unit or integration tests

@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch 2 times, most recently from f9e65d3 to 4b9c0f6 Compare December 5, 2025 11:51
@LukasPukenis LukasPukenis changed the title Llt 6859 configurable adapter gone error parameters LLT-6859 configurable adapter-gone UAPI error parameters Dec 5, 2025
@LukasPukenis LukasPukenis changed the title LLT-6859 configurable adapter-gone UAPI error parameters LLT-6859 configurable adapter-gone error handler Dec 5, 2025
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 4b9c0f6 to 1c17d77 Compare December 5, 2025 12:09
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 1c17d77 to 2395810 Compare December 5, 2025 12:49
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 2395810 to 8cbe327 Compare December 5, 2025 12:50
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 8cbe327 to f581981 Compare December 5, 2025 14:51
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from f581981 to fa7b39c Compare December 5, 2025 14:54
@LukasPukenis LukasPukenis changed the title LLT-6859 configurable adapter-gone error handler LLT-6859 feature config for UAPI error handler Dec 5, 2025
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from fa7b39c to 5bc731b Compare December 9, 2025 14:20
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 5bc731b to 825b681 Compare December 10, 2025 12:51
@LukasPukenis LukasPukenis changed the title LLT-6859 feature config for UAPI error handler LLT-6859 Dec 10, 2025
@LukasPukenis LukasPukenis changed the title LLT-6859 LLT-6859 configurable UAPI error handler Dec 10, 2025
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 825b681 to 1049dc7 Compare December 10, 2025 13:25
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 1049dc7 to 5e76d91 Compare December 10, 2025 13:44
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 5e76d91 to 6c9662a Compare December 10, 2025 13:56
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 6c9662a to 45fe6a4 Compare December 10, 2025 13:58
@LukasPukenis LukasPukenis marked this pull request as ready for review December 12, 2025 07:18
@LukasPukenis LukasPukenis requested a review from a team as a code owner December 12, 2025 07:18
Contributor

@mathiaspeters mathiaspeters left a comment

+1

@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from cde43e3 to 853eb12 Compare December 12, 2025 12:40
Contributor

@stalowyjez stalowyjez left a comment

Looks rather ok, +1

@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 6906a0e to 9d4063f Compare December 19, 2025 12:15
@LukasPukenis
Collaborator Author

LukasPukenis commented Dec 19, 2025

@stalowyjez I've addressed your comments

@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 9d4063f to 3a76c6b Compare December 19, 2025 18:21
stalowyjez previously approved these changes Dec 22, 2025
Contributor

@stalowyjez stalowyjez left a comment

lgtm, +1

pub struct TUNHealthMonitor {
/// Minimum duration of a continuous UAPI failure period required to
/// classify the failure as non-transient
sustained_failure_threshold: Duration,
Contributor

🤔 This is a nitpick, but maybe it is worth storing here just FeatureInterfaceHealth instead of having a copy of the fields?

Collaborator Author

This code no longer exists

// than expected, it indicates that libtelio(or device) might have been suspended. Since
// WireguardNT produces transient errors around device suspension(just before and right after),
// we must filter out those errors or else we can easily exceed sustained_failure_threshold.
let suspend_detected =
Contributor

I am not sure about this part, as it seems likely to be flaky... 🤔 While our period is roughly 1s, there is no guarantee that suspension is the only possible "interruption".

Although I do not have a better suggestion now 🫤.

Maybe it is worth having a small design/brainstorm session to see if we can come up with something less sensitive to timing (2s by default)?

Collaborator Author

As discussed - I just increased the counter value so this is resolved :)

@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 6e68b3d to 0d8e9e5 Compare January 7, 2026 14:36
@LukasPukenis LukasPukenis changed the title LLT-6859 configurable UAPI error handler LLT-6859 adjust UAPI error handler Jan 7, 2026
@LukasPukenis
Collaborator Author

@Jauler @stalowyjez @mathiaspeters I discussed the current approach with @Jauler and we decided to take a simpler route: increase the counter threshold. The benefit of this approach is that it is deterministic while remaining very simple.

@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from 0d8e9e5 to f79344c Compare January 7, 2026 14:45
This reduces critical errors caused by transient wireguard-nt
UAPI failures at the cost of a longer detection time when the
interface is permanently gone.

Signed-off-by: Lukas Pukenis <lukas.pukenis@nordsec.com>
@LukasPukenis LukasPukenis force-pushed the LLT-6859_configurable_adapter_gone_error_parameters branch from f79344c to d9db805 Compare January 7, 2026 14:46
5 participants