DroneCAN: ISR-driven TX for STM32F7 bxCAN driver#11560
Open
daijoubu wants to merge 39 commits into
Open
Conversation
Drop-in replacement of STM32F7xx HAL driver and CMSIS Device headers from STM32CubeF7. Updates from v1.2.2 (2017) to v1.3.3 (2025), incorporating: - 8+ years of bug fixes and errata workarounds - SD card reliability improvements - I2C transmission stalled workaround - UART DMA race condition fixes - USB connection/disconnect handling - ETH receive process rework API verified compatible - no code changes required. Existing HAL config file at src/main/target/stm32f7xx_hal_conf.h remains unchanged. This update is applied on top of DroneCAN integration (PR iNavFlight#11313). Related: iNavFlight#11299 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change __FPU_PRESENT cmake definitions from =1 to =1U in cortex-m7.cmake and cortex-m4f.cmake. The device headers define it as 1U; GCC treats 1 and 1U as different token strings and warns on redefinition. The cmake flag must be kept (CMSIS DSP math library requires it). Fix ART_ACCLERATOR_ENABLE (misspelled) → ART_ACCELERATOR_ENABLE in stm32f7xx_hal_conf.h to avoid redefinition with the legacy alias added in HAL v1.3.3 (stm32_hal_legacy.h:4402). Both warnings repeated across all 528 STM32F7 compilation units and are now fully resolved. MATEKF765SE builds cleanly.
Two vendor files in STM32F7xx HAL v1.3.3 produce -Wunused-parameter warnings that fail CI builds with -DWARNINGS_AS_ERRORS=ON: - stm32f7xx_ll_rcc.h: inline function with unused LPTIMx parameter (fires in every translation unit that includes the header) - stm32f7xx_ll_rcc.c: LL_RCC_GetSPDIFRXClockFreq unused parameter Fix 1: Add SYSTEM_INCLUDE_DIRECTORIES support to add_stm32_executable and target_stm32 in stm32.cmake, and route STM32F7_INCLUDE_DIRS through it in stm32f7.cmake. This causes the HAL/CMSIS include dirs to be added with -isystem, suppressing warnings from vendor headers. Fix 2: Set COMPILE_OPTIONS "-Wno-unused-parameter" on stm32f7xx_ll_rcc.c via set_source_files_properties inside target_stm32f7xx(), scoped to the caller's directory where add_executable() runs.
HAL v1.3.3 changed HAL_CAN_Init() to OR the SyncJumpWidth, TimeSeg1, and TimeSeg2 fields directly into the BTR register, expecting them to be pre-shifted to their correct bit positions (TS1@16, TS2@20, SJW@24). The driver was passing raw 0-based counts (e.g. 6 for 7TQ), which the old HAL would shift internally. With HAL v1.3.3, these unshifted values landed in the BRP prescaler field, producing completely wrong bit timing that caused immediate bus errors and prevented all CAN transmission. Fix: shift the computed timing values to their BTR register positions using the CAN_BTR_TS1_Pos, CAN_BTR_TS2_Pos, and CAN_BTR_SJW_Pos constants before assigning to hcan1.Init.
…ame transfers processCanardTxQueue() was only called once at the top of the state before RX processing. Response frames queued by RX handlers (e.g. GetNodeInfo) were not drained until the next 2ms task cycle. With NodeStatus occupying one of three STM32 CAN TX mailboxes, only two response frames could be sent before the mailboxes filled, preventing multi-frame EOT from being transmitted. Add processCanardTxQueue() after the RX loop and after process1HzTasks() so any frames queued in the same cycle are sent immediately.
Updates CAN current/vbat sources, can_speed allowed values, DRONECAN GPS protocol, and OSD_REFRESH debug mode.
stm32cubef7_extract/cmsis_device_f7 and stm32cubef7_extract/stm32f7xx_hal_driver were tracked as gitlink entries (mode 160000) with an empty .gitmodules, causing CI to fail with "No url found for submodule path" during post-job cleanup. Removed embedded .git directories and re-added the HAL sources as regular tracked files (mode 100644).
… verbosity Remove LOG_DEBUG from canardSTM32Transmit() failure path — this will run in ISR callback context after the TX ISR migration and calling printf from an ISR is undefined behaviour. Collapse 5 intermediate canardSTM32ComputeTimings() debug calls to the single summary line already present at the end of the function. Remove per-frame success-path LOG_DEBUG calls in dronecan.c transfer handlers. These fire on every received GPS fix, NodeStatus and battery update and are too noisy for upstream submission.
Replace blocking HAL_CAN_AddTxMessage() polling in canardSTM32Transmit()
with a SPSC SW ring buffer (TX_QUEUE_SIZE=32) drained by CAN1_TX_IRQHandler
via HAL_CAN_TxMailbox{0,1,2}CompleteCallback().
Bootstrapping: when all 3 mailboxes are idle RQCP=0 so TMEIE alone won't
fire; seed the HW directly from the main-loop path when
GetTxMailboxesFreeLevel()==3. ISR handles all subsequent frames.
CAN_IT_TX_MAILBOX_EMPTY is activated on each enqueue and deactivated when
the queue drains, keeping the interrupt silent at idle.
Adds four debug visibility hooks to the STM32F7 CAN TX ISR driver: - canTxDropped counter: incremented when SW TX queue (depth 32) is full and canardSTM32Transmit() silently drops a frame. Cumulative across the session; exposed via canardProtocolStatus_t.tx_dropped. - ESR register fields in canardProtocolStatus_t: tec (TX error counter), rec (RX error counter), lec (last error code) read from hcan1.Instance->ESR. Reads BusOff/ErrorPassive atomically with the ESR snapshot. - canardSTM32GetTxQueueFillLevel(): returns current SW queue depth as (head - tail + TX_QUEUE_SIZE) % TX_QUEUE_SIZE. - CLI 'dronecan' command (USE_DRONECAN guard): prints all of the above plus RX buffer fill level in a single readable block. Useful for polling during long-running hardware validation tests without needing the configurator or MSP.
- Add NVIC_PRIO_CAN = 4 to nvic.h; use it for CAN1_RX0 and CAN1_TX IRQs (was 0, violating NVIC_PRIO_MAX=1 and risking preemption of motor timers at priority 3) - Add canTxQueueHWM / canRxBufferHWM tracking; expose via canardProtocolStatus_t and dronecan CLI command as "N / 32 (hwm: N)" for both TX queue and RX buffer - Include drivers/nvic.h in canard_stm32f7xx_driver.c
…pace - Replace direct canTxQueuePop() in main-loop seed path with canTxDrainQueue() called under __disable_irq()/__enable_irq(), making the ISR the sole consumer of the SW TX queue at all times - Only call ActivateNotification if queue is non-empty after seeding, eliminating the spurious ISR invocation on single-frame bursts - Add forward declaration for canTxDrainQueue() - Strip trailing whitespace throughout file
Add __DMB() in canTxQueuePush() between writing frame data and advancing head, and in canTxQueuePop() between reading head and reading frame data. Prevents the M7 write buffer from reordering the head store before the frame data store is globally visible.
Enable TransmitFifoPriority (TXFP) so the peripheral transmits mailboxes in request order rather than mailbox-number order. Without TXFP, loading the next frame into mailbox 0 after it completes causes it to transmit before frames still pending in mailboxes 1 and 2, corrupting the toggle bit sequence of multi-frame UAVCAN transfers. With TXFP enabled, restore the while loop in canTxDrainQueue to fill all free mailboxes per ISR callback for full pipeline utilization. Also fix remaining code review findings: - Check HAL_CAN_AddTxMessage return value; count failures as dropped frames - Add CanTxQueue_t struct tag to match RxBuffer_t convention - Remove blank line artifact at top of onTransferReceived body
- Add volatile to RxBuffer writeIndex/readIndex: written in ISR and read in main loop, missing volatile allows compiler to cache stale values causing silent receive stalls - Peek-then-pop in canTxDrainQueue: advance tail only after HAL_CAN_ AddTxMessage succeeds so a HAL rejection never silently loses a frame - Move ActivateNotification inside critical section: eliminates race where ISR could drain queue and deactivate between __enable_irq and the ActivateNotification call, leaving a spurious notification armed - Snapshot tail before advancing head in canTxQueuePush: prevents ISR advancing tail between the HWM read and head write, which understated the high water mark - Remove dead canTxQueuePop (inlined into peek-then-pop drain logic) - Remove dead RxFrame_t rxMsg global - Fix 3-space indent on first if in canardSTM32Transmit - Fix missing space in ErrorPassive CLI output - Add missing newline at EOF
…eNotification error handling - Fix REC register extraction: ESR bits 7:0 are error flags, REC lives at bits 31:24 (CAN_ESR_REC_Pos=24). Previous code always read near- zero instead of the actual receive error counter. - Remove canTxDropped increment from ISR drain path: a HAL_CAN_ AddTxMessage failure leaves tail unadvanced so the frame is retried on the next ISR — it is not dropped. Incrementing the drop counter here gave wrong semantics and created a dual-writer race on the volatile uint16_t counter. - Check ActivateNotification return in seed path: failure means the TX ISR will never fire for in-flight completions, stalling the queue with no indication. Return 0 on failure so the DroneCAN task retries.
…e access Replace direct struct field access in canTxDrainQueue with two new queue functions: canTxQueuePeek() returns a const pointer to the tail frame (including the DMB barrier) without consuming it, and canTxQueueConsume() advances the tail after the HW accepts the frame. This keeps queue invariants — barrier placement, index arithmetic, and the peek-then-consume ordering — inside the queue API rather than spread across canTxDrainQueue.
…, flag CLI denominators - Fix ESR bit-range annotation on rec field in header: was ESR[7:0] (error flags), correct position is ESR[31:24] per CAN_ESR_REC_Pos=24. Extraction code was already correct; wrong comment risked a future maintainer reverting it. - Add comment explaining why TxMailboxAbortCallback is not wired: AutoRetransmission=ENABLE prevents ALST; TERR leaves the frame in the queue (tail not advanced) so the next Transmit call re-seeds the HW. - Add coupling comments on hardcoded /32 denominators in CLI output so they stay in sync if TX_QUEUE_SIZE or RX_BUFFER_SIZE ever changes.
Add @brief/@param/@RetVal doc blocks to all previously undocumented functions in canard_stm32f7xx_driver.c. Fix stale FDCAN1/hfdcan copy-paste in the canardSTM32CAN1_Init block — this is bxCAN, not FDCAN.
Group functions by subsystem in all three files to aid navigation: F7 driver: Initialization → RX subsystem → TX subsystem → Diagnostics. Removes forward declarations (each function now defined before use). H7 driver: Initialization → RX → TX → Diagnostics. Same section layout as F7 for consistency. Also adds missing canardSTM32GetTxQueueFillLevel stub (declared in shared header, never implemented for H7 — caused undefined-reference link error on MATEKH743). dronecan.c: Message handlers → Libcanard callbacks → Message senders → TX/periodic processing → Public API. No logic changed. Both MATEKF765SE and MATEKH743 build clean.
…ions Public API (dronecanInit, dronecanUpdate) at the top, internals below: TX/periodic processing → message senders → libcanard callbacks → message handlers. Add forward declarations for all internal functions. Mark all internal functions static.
- canardSTM32Transmit: return 1 after successful canTxQueuePush regardless of ActivateNotification result — prevents libcanard from re-pushing the same frame and producing duplicate CAN transmissions that corrupt the UAVCAN transfer-ID toggle sequence - handle_GetNodeInfo: replace printf() with LOG_DEBUG() — printf has no output backend in firmware and risks a fault or linker error - rxBufferPushFrame/rxBufferPopFrame: change return type uint8_t → int8_t so the -1 error sentinel is type-correct
Issue 4 — AbortCallback comment: previous text incorrectly described TERR as a synchronous AddTxMessage failure and misattributed the abort path suppression. Correct explanation: AutoRetransmission=ENABLE causes the hardware to retransmit automatically on arbitration loss (ALST), so the software abort path is never triggered; bus-off is handled by the state machine in dronecanUpdate(). Issue 5 — critical section: replace __disable_irq()/__enable_irq() with ATOMIC_BLOCK(NVIC_PRIO_CAN) from build/atomic.h. This raises BASEPRI to mask only the CAN TX ISR (priority 4 and below), allowing higher-priority IRQs such as the gyro timer (priority 3) to continue firing during the mailbox-seeding window. Also simplifies the two return-1 paths into one.
- Fix 7: Clarify SJW=3 comment in canardSTM32ComputeTimings - Fix 8: Remove commented-out CubeMX scaffold from canardSTM32CAN1_Init - Fix 9: Convert tab indentation to 4-space in canardSTM32Recieve (F7) - Fix 10: Remove USER CODE BEGIN/END sentinel comments from H7 FDCAN1_Init - Fix 11: Replace Unicode section separators with ASCII dashes in all three files
… and 12) - Fix 6: Clarify hardcoded 32 denominators in CLI status output — add comments naming the matching TX_QUEUE_SIZE / RX_BUFFER_SIZE constants - Fix 12: Remove WHAT-comment "// Transmitting" from processCanardTxQueue
F7 driver: - Add __DMB() before writeIndex advance in rxBufferPushFrame (ISR→main memory ordering: ensures frame data visible before consumer sees new index) - Add __DMB() before memcpy in rxBufferPopFrame (acquire barrier: ensures ISR's writes are visible before main loop reads frame data) - Check canardSTM32ComputeTimings() return value; return error if no valid timing solution found for the requested bitrate - Mark rxBufferPushFrame, rxBufferPopFrame, rxBufferNumMessages static H7 driver: - Fix prescaler upper bound: 1024 → 512 (FDCAN NBTP.NBRP is 9-bit) - Zero-initialize canardProtocolStatus_t in GetProtocolStatus so TEC/REC/LEC/queue fields are not garbage on H7 builds - Remove always-true DataLength guard in canardSTM32Transmit; libcanard constrains data_len to 0-8 so the else branch was unreachable
- Mark canard and memory_pool static in dronecan.c (internal state, no reason for external linkage) - Fix spelling: "incremeneted" -> "incremented" in dronecan.c - Clarify H7 FDCAN_FILTER_DUAL comment: the specific filter is a no-op because ConfigGlobalFilter accepts all non-matching frames
Both F7 (max_quanta_per_bit=18) and H7 (max_quanta_per_bit=17) differ from the reference paper maximum of 17. Mark both for revisit alongside SJW tuning — SJW=3 was set conservatively and should be validated against actual crystal performance on the board.
Per INAV coding standards — dead debug logging should be deleted, not left commented out.
The raw sjw=3 value is written to the HAL without the N-1 offset that BS1/BS2 correctly apply, so the hardware actually runs 4TQ SJW not 3TQ. Mark with TODO for the planned SJW/timing tuning pass.
F7 driver: - Fix false-full race in canTxQueuePush: read tail fresh at the full-check so a concurrent ISR drain cannot cause a spurious TX drop; move snapshot after the push for HWM calculation only H7 driver: - Replace ErrorCode/!= 1 timing check with if (!canardSTM32ComputeTimings()) to match F7 pattern and remove fragile bool-stored-as-int16_t - Fix @RetVal for canardSTM32CAN1_Init: documents CANARD_OK (0) on success, not 1 as previously stated - Remove stray USER CODE BEGIN FDCAN1_Init 2 CubeMX sentinel
- Set optional_field_flags to OPTIONAL_FIELD_FLAG_VCS_COMMIT in handle_GetNodeInfo: the field is a bitmask indicating which optional fields are valid, not a storage field for patch version - Remove stale TODO comment from handle_GetNodeInfo - Drop '/ 32' denominator from CLI TX queue and RX buffer lines: value is misleading on H7 (no software queue exists); fill level alone is sufficient on both targets
- Remove rhetorical question comments from GPIO init in F7 and H7 drivers (AF selection is target-specific by design; cmake selects the right driver) - Remove "Wheres the data?" comment from canardSTM32Recieve in F7 driver - Replace "unsure about this one" on H7 TX header fields with factual notes: ESI_ACTIVE (not error-passive), NO_TX_EVENTS (no FIFO allocated), MessageMarker=0 (unused without TX events) - Normalize tab/space mixed indentation in dronecanInit in dronecan.c
- Fix F7 canardSTM32CAN1_Init @RetVal: CANARD_OK is 0, not 1 - Enable AutoRetransmission on H7 to match F7; without it frames lost to bus arbitration are silently dropped, breaking multi-frame transfers - Add volatile qualifier to RX buffer function parameters so the compiler cannot cache index reads across ISR/main-loop boundary - Rename canardSTM32Recieve -> canardSTM32Receive (fix persistent misspelling in header, F7/H7/SITL drivers, and dronecan.c call site)
…N drivers - Correct file header names in canard_stm32f7xx_driver.c and canard_stm32h7xx_driver.c - Remove leading space before #ifdef CAN1_STANDBY in both drivers - Replace all tabs with spaces in dronecan.c (RX loop, 1Hz service block, processCanardTxQueue body)
…neCAN drivers
- Add canardSTM32GetTxQueueFillLevel() stub to canard_sitl_driver.c to fix
linker error when building SITL target with USE_DRONECAN defined
- Add memset in sitlCANGetStatsStub and sitlCANGetStatsSocketCAN to zero all
canardProtocolStatus_t fields, preventing stack garbage in CLI output
- Fix stale comment in processCanardTxQueue: "TX FIFO full" -> "SW TX queue full"
- Remove redundant LOG_DEBUG("Battery Info") from onTransferReceived
- Fix 6-space indent to 4-space in CAN1_RX0_IRQHandler
…ership docs - rxBufferPopFrame: snapshot readIndex before __DMB() so the barrier precedes the data dereference, matching the push-side store-release pattern - canardSTM32Transmit: seed HW mailboxes when any mailbox is free (> 0) rather than only when all three are idle (== 3), preventing a permanent TX stall when HAL_CAN_AddTxMessage fails with the SW queue non-empty - Add counter ownership comments: canTxDropped/canTxQueueHWM written by main loop, canRxBufferHWM written by ISR; all are single-instruction atomic reads - H7 AutoRetransmission=ENABLE: add detailed comment explaining frame ordering guarantee (FDCAN TX FIFO is strictly ordered) and the TODO to evaluate whether DISABLE would be safer for the H7 polling model
Contributor
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Contributor
Author
|
Issue with F4 builds on #11514 has been fixed. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Migrates the STM32F7 bxCAN TX implementation from polling/blocking to ISR-driven transmission using a 32-deep SPSC ring queue. Also fixes several latent bugs found in H7 and SITL drivers during review.
Changes
canardSTM32GetTxQueueFillLevelstub, improved diagnosticscanardSTM32GetTxQueueFillLevelstub (was causing linker error), fixed uninitialized status fieldscanardSTM32Receive(was "Recieve")optional_field_flagsmisuse, made canard/memory_pool static, removed printf from ISR contextdronecancommand showing TX/RX queue stats and protocol statusTesting
ISR-driven TX verified queuing and transmitting frames correctly. SPSC ring buffer (32-deep) operating with no overruns. CLI
dronecancommand confirmed showing live queue fill levels and protocol status. No regression in RX or frame parsing.F4 build failure was a pre-existing issue in #11514 (
__FPU_PRESENTredefined incmake/stm32f4.cmake). The sameSYSTEM_INCLUDE_DIRECTORIESfix applied tocmake/stm32f7.cmake(commit 37e6b23) needs to be applied tocmake/stm32f4.cmake. This has been resolved in #11514.Known Deferred Items (Non-Blocking)
max_quanta_per_bit=18on F7 vs paper max of 17 (H7 uses 17) — TODO comment in placehandle_NodeStatus/handle_GNSSRCTMStream— scaffolding/silent discard, follow-up neededNone of these block merging.