Skip to content

Fix: Re-queue packet on send failure instead of dropping it#2141

Open
OverkillFPV wants to merge 3 commits intomeshcore-dev:devfrom
OverkillFPV:fix/requeue-packet-on-cad-timeout
Open

Fix: Re-queue packet on send failure instead of dropping it#2141
OverkillFPV wants to merge 3 commits intomeshcore-dev:devfrom
OverkillFPV:fix/requeue-packet-on-cad-timeout

Conversation

@OverkillFPV
Copy link

@OverkillFPV OverkillFPV commented Mar 24, 2026

When the CAD timeout's due to high traffic the packet is force sent after 4 seconds. And StartSendRaw fails due to the radio being busy the packet is dropped. This change changes it so that instead of dropping the packet entirely, it is requeued with a delay and retried. This is why some of my high traffic nodes with 20+ neighbours have become unreliable.

Copilot AI review requested due to automatic review settings March 24, 2026 07:35
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the Dispatcher TX failure behavior so that when startSendRaw() fails (e.g., radio busy after CAD timeout), the outbound packet is re-queued for a delayed retry instead of being released back to the pool (dropped).

Changes:

  • Re-queue the outbound packet on startSendRaw() failure rather than freeing it.
  • Add a retry delay before attempting further TX after a send-start failure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +335 to +337
_mgr->queueOutbound(outbound, 0, futureMillis(getCADFailRetryDelay()));
outbound = NULL;
next_tx_time = futureMillis(getCADFailRetryDelay());
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCADFailRetryDelay() is called twice here (once for queueOutbound and once for next_tx_time). Since getCADFailRetryDelay() can be non-deterministic (e.g., Mesh::getCADFailRetryDelay() uses RNG), the packet’s scheduled retry time and the global next_tx_time throttle can diverge unpredictably. Compute the retry delay once and reuse it for both scheduling and next_tx_time.

Suggested change
_mgr->queueOutbound(outbound, 0, futureMillis(getCADFailRetryDelay()));
outbound = NULL;
next_tx_time = futureMillis(getCADFailRetryDelay());
int retry_delay = getCADFailRetryDelay();
unsigned long retry_time = futureMillis(retry_delay);
_mgr->queueOutbound(outbound, 0, retry_time);
outbound = NULL;
next_tx_time = retry_time;

Copilot uses AI. Check for mistakes.
releasePacket(outbound); // return to pool

// re-queue instead of dropping so the packet gets another chance
_mgr->queueOutbound(outbound, 0, futureMillis(getCADFailRetryDelay()));
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-queuing with a hard-coded priority 0 can change ordering semantics: lower numeric values are treated as higher priority in PacketQueue::get(), so this elevates any packet that happened to fail startSendRaw() to highest priority on retry and may starve other queued traffic. Consider preserving the original outbound priority (e.g., by storing priority on the Packet or extending PacketManager::getNextOutbound to return both packet + priority) when re-queuing.

Copilot uses AI. Check for mistakes.
@@ -330,9 +330,11 @@ void Dispatcher::checkSend() {
MESH_DEBUG_PRINTLN("%s Dispatcher::loop(): ERROR: send start failed!", getLogDateTime());
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug string says Dispatcher::loop() but this code is inside Dispatcher::checkSend(), which makes logs misleading when diagnosing TX failures. Update the message to the correct function name.

Suggested change
MESH_DEBUG_PRINTLN("%s Dispatcher::loop(): ERROR: send start failed!", getLogDateTime());
MESH_DEBUG_PRINTLN("%s Dispatcher::checkSend(): ERROR: send start failed!", getLogDateTime());

Copilot uses AI. Check for mistakes.
@OverkillFPV OverkillFPV force-pushed the fix/requeue-packet-on-cad-timeout branch from 3f9190d to 9675823 Compare March 24, 2026 07:48
@OverkillFPV
Copy link
Author

I was also wondering about increasing the timeout to >4 seconds as for some nodes with 50 neighbours it's not too unheard of to have 4 seconds of traffic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants