@mennovf mennovf commented Nov 20, 2025


Summary

Addresses this issue: #17299

The Kconfig option CONFIG_IOB_THROTTLE limits the number of IOBs that TCP (and UDP) receives can allocate, so that receives do not starve sends. The distinction is made via a 'throttle' argument passed to the iob_*alloc functions, which are wrapped by net_iob*alloc. Previously the udp/tcp_wrbuffer_write functions incorrectly allocated with throttle=true, effectively making the IOB_THROTTLE option useless.

This patch modifies the calls in udp/tcp_wrbuffer_write to allocate unthrottled.

There were also several locations in the receive path that incorrectly allocated unthrottled.
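
To make the role of the throttle argument concrete, here is a toy model of a buffer pool with a reserve. It is a sketch, not the NuttX implementation: `toy_pool`, `toy_alloc` and `toy_reserve_after_flood` are hypothetical names, and the only behavior assumed is the one described above, namely that a throttled caller is refused once only the last `throttle` buffers remain, while an unthrottled caller may take any free buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the NuttX IOB API. */

struct toy_pool
{
  int nfree;     /* Buffers currently free */
  int throttle;  /* Buffers reserved for unthrottled callers */
};

/* Take one buffer. A throttled caller may not dip into the reserve. */

static bool toy_alloc(struct toy_pool *pool, bool throttled)
{
  int usable = throttled ? pool->nfree - pool->throttle : pool->nfree;

  if (usable <= 0)
    {
      return false;
    }

  pool->nfree--;
  assert(pool->nfree >= 0);
  return true;
}

/* Allocate with throttle=true until refused, then report how many
 * buffers are still free for unthrottled callers.
 */

static int toy_reserve_after_flood(int ntotal, int throttle)
{
  struct toy_pool pool =
  {
    ntotal, throttle
  };

  while (toy_alloc(&pool, true))
    {
    }

  return pool.nfree;
}
```

In this model, a pool of 1024 buffers with a throttle of 128 (the values used in the Testing section) still has 128 buffers free after a flood of throttled requests, whereas a throttle of 0 leaves none.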

Impact

This should not have an effect during normal operation. This change is only in effect when CONFIG_IOB_THROTTLE > 0 and there's high reception load. In that case the system should keep operating and not deadlock on a sendto() call on a blocking socket without timeout.

Testing

I ran iperf -s -B 10.0.1.2 -u & in the simulator target with the other end iperf -c -b 100M ... on the host machine while periodically executing cat /proc/iobinfo to check the state of the IOB MM. The sim target is compiled with CONFIG_IOB_THROTTLE=128

When compiled from master, I get (worst-case):

nsh> cat /proc/iobinfo   3.01-   6.02 sec   29767500 Bytes   79.12 Mbits/sec

    ntotal     nfree     nwait nthrottle
      1024         5         0         0
nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
      1024         5         0         0
nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
      1024         5         0         0
nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
      1024         5         0         0

Here you can see that nfree < CONFIG_IOB_THROTTLE(=128) even though only the receive path is exercised.

After the changes the worst-case is:

nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
      1024      1024         0       896
nsh> cat /proc/iobinfo   3.01-   6.02 sec   26019000 Bytes   69.15 Mbits/sec

    ntotal     nfree     nwait nthrottle
      1024       134         0         6
nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
      1024       134         0         6

So nfree > CONFIG_IOB_THROTTLE.
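
The manual check above could be automated along these lines. This is a sketch under stated assumptions: the input is exactly one header line followed by one line of four integers, as in the transcripts above, and `iobinfo_sample`, `parse_iobinfo` and `reserve_violated` are hypothetical helpers rather than anything provided by NuttX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the four columns shown by /proc/iobinfo. */

struct iobinfo_sample
{
  int ntotal;
  int nfree;
  int nwait;
  int nthrottle;
};

/* Parse one header line plus one value line.
 * Returns 0 on success, -1 on malformed input.
 */

static int parse_iobinfo(const char *text, struct iobinfo_sample *out)
{
  const char *values;

  assert(text != NULL && out != NULL);

  /* Skip the header line ("ntotal nfree nwait nthrottle") */

  values = strchr(text, '\n');
  if (values == NULL)
    {
      return -1;
    }

  if (sscanf(values + 1, "%d %d %d %d", &out->ntotal, &out->nfree,
             &out->nwait, &out->nthrottle) != 4)
    {
      return -1;
    }

  return 0;
}

/* Returns 1 if fewer than `throttle` buffers are free, i.e. the
 * reserve has been eaten into.
 */

static int reserve_violated(const struct iobinfo_sample *s, int throttle)
{
  return s->nfree < throttle;
}
```

Applied to the samples above with a throttle of 128, the master sample (nfree = 5) counts as a violation and the patched sample (nfree = 134) does not. In both patched samples nthrottle equals nfree - 128, which suggests, but does not prove, that the column reports how many buffers a throttled caller could still take.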

There are still several places in other drivers that don't respect the throttle parameter e.g. drivers/wireless/ieee802154/xbee/xbee.c:265

@github-actions github-actions bot added the Area: Documentation, Area: Networking, Arch: risc-v, Arch: simulator, Arch: xtensa, Area: USB, and Size: S labels Nov 20, 2025
@fdcavalcanti
Contributor

This looks very promising! Thanks @mennovf.
Sadly I won't be able to test this until next week :(

@zhhyu7
Contributor

zhhyu7 commented Nov 20, 2025

In practical scenarios, unless the application deliberately stops reading packets, the IOBs held in readahead will always be consumed eventually, and iob_alloc_committed can ensure that a sending thread that was waiting obtains an IOB. If the sending direction is unthrottled, then during TCP sending the write queue can fill up with IOBs. The driver is then unable to receive any new packet (especially TCP ACKs), the buffers in the write queue are never released, and the entire IP protocol stack hangs. This is a fatal problem. So for the overall robustness of the IP protocol stack, I recommend using throttled IOBs for the sending direction and unthrottled IOBs for the receiving direction. If you agree with this view, we should modify the description in Kconfig. This is my understanding of the issue, offered for your reference.
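
The starvation argument can be sketched with a toy model in which one direction holds buffers until it is refused and the other direction then asks for a single buffer. Everything here is an assumption made for illustration: `model_alloc` and `victim_survives` are hypothetical names, and the model ignores timeouts, buffers being freed, and per-socket buffer limits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, hypothetical names: a free count plus a reserve that
 * only unthrottled callers may use.
 */

static bool model_alloc(int *nfree, int throttle, bool throttled)
{
  int usable = throttled ? *nfree - throttle : *nfree;

  if (usable <= 0)
    {
      return false;
    }

  (*nfree)--;
  return true;
}

/* One direction (the "flooder") keeps allocating until it is refused
 * and never frees.  Then the other direction (the "victim") asks for
 * one buffer.  Returns true if the victim still gets it.
 */

static bool victim_survives(int ntotal, int throttle,
                            bool flooder_throttled, bool victim_throttled)
{
  int nfree = ntotal;

  assert(throttle >= 0 && throttle <= ntotal);

  while (model_alloc(&nfree, throttle, flooder_throttled))
    {
    }

  return model_alloc(&nfree, throttle, victim_throttled);
}
```

With a full write queue as the flooder and an incoming ACK as the victim, `victim_survives(160, 24, false, true)` (sends unthrottled, receives throttled) is false, while `victim_survives(160, 24, true, false)` (sends throttled, receives unthrottled) is true. Swapping the roles gives the mirror-image result: whichever direction is unthrottled can starve the throttled one in this model.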

@mennovf mennovf force-pushed the fix/net-wrbuffer-alloc-unthrottled branch from ae4686d to 8cbab7d Compare November 20, 2025 16:27
@mennovf
Author

mennovf commented Nov 20, 2025

Yes, that's the dual of the problem I'm experiencing. I think the essential issue is that neither write nor read should be able to starve the other's memory, so compiling with either NET_SEND_BUFSIZE or NET_RECV_BUFSIZE too small to contain a packet is just not correct. Both the receive and transmit end should always have enough memory to hold at least one packet to ensure progress.
This issue is exacerbated by other net resources allocating from the same IOB pool.

@linguini1
Contributor

Does this close/resolve #17299 or is it just one piece of the puzzle?

@acassis
Contributor

acassis commented Nov 21, 2025

@mennovf suggestion: since you faced these issues with net iob throttle, maybe you could include a small section at Documentation/components/net/netdriver.rst talking about this feature, to give an overview for someone willing to use it, along with the implications of disabling it and/or read-ahead. Maybe add some testing examples at https://nuttx.apache.org/docs/latest/guides/testingtcpip.html

This is just a suggestion, because NuttX documentation is very shy! So all opportunities we have to improve it we need to use. :-)

…r sends,

throttle for receives.

The Kconfig option CONFIG_IOB_THROTTLE is used to limit the amount allocated
by TCP (and UDP) receives as to not starve sends. This distinction is made
via a 'throttle' argument passed to the iob_*alloc functions, which are wrapped
by net_iob*alloc. Previously the udp/tcp_wrbuffer_write functions incorrectly
allocate with throttle=true, effectively making the IOB_THROTTLE option useless.

This patch modifies the calls in udp/tcp_wrbuffer_write to allocate unthrottled,
and fixes an unthrottled allocation in the TCP receive path.

There were also several locations in the receive path that incorrectly
allocated unthrottled.

Signed-off-by: Menno Vanfrachem <mennovanfrachem@hotmail.com>

Modify receives to allocate throttled
@mennovf mennovf force-pushed the fix/net-wrbuffer-alloc-unthrottled branch from 8cbab7d to a0dc172 Compare November 21, 2025 12:33
@xiaoxiang781216
Contributor

Yes that's the dual problem of what I'm experiencing. I think the essential issue is that neither write nor read should be able to starve the other's memory, so compiling with either NET_SEND_BUFSIZE or NET_RECV_BUFSIZE too small to contain a packet is just not correct. Both the receive and transmit end should always have enough memory to hold at least one packet to ensure progress. This issue is exacerbated with other net resources allocating from the same iob pool.

@mennovf with your change, how do we handle the case described by @zhhyu7?

If the sending direction is unthrottled, during the TCP sending process, if the buffer in the write queue is full of IOB, the driver will be unable to receive any new packet (especially TCP-ACK), and the buffer in the write queue will never be released, causing the entire IP protocol stack to hang. This is a fatal problem.

@xiaoxiang781216 xiaoxiang781216 linked an issue Nov 21, 2025 that may be closed by this pull request
@fdcavalcanti
Contributor

fdcavalcanti commented Nov 25, 2025

Tried running the IPERF test and it works at first, but then it fails on iob_add_queue.c as shown below.

nsh> iperf -s &
iperf [9:100]
nsh>      IP: 192.168.0.127

 mode=tcp-server sip=192.168.0.127:5001,dip=0.0.0.0:5001, interval=3, time=0
accept: 192.168.0.125:39578

           Interval         Transfer         Bandwidth

   0.00-   3.01 sec    2874800 Bytes    7.64 Mbits/sec
   3.01-   6.02 sec    1865880 Bytes    4.96 Mbits/sec
   6.02-   9.03 sec    2638220 Bytes    7.01 Mbits/sec
   9.03-  12.04 sec    2486380 Bytes    6.61 Mbits/sec
dump_assert_info: Current Version: NuttX  10.4.0 a0dc172c62 Nov 25 2025 10:34:44 risc-v
dump_assert_info: Assertion failed iobq->qh_tail: at file: iob/iob_add_queue.c:73 task: wifi process: Kernel 0x4080924a

edit: it is also failing on master. I think it is still starving the IOBs, but I'm not sure.

@mennovf
Author

mennovf commented Nov 27, 2025

@mennovf with your change how to handle the case described by @zhhyu7 :

If the sending direction is unthrottled, during the TCP sending process, if the buffer in the write queue is full of IOB, the driver will be unable to receive any new packet (especially TCP-ACK), and the buffer in the write queue will never be released, causing the entire IP protocol stack to hang. This is a fatal problem.

Yes, the TCP sending will be blocked. Note that this is already an issue if _IOB_THROTTLE=0, regardless of whether sending or receiving is marked "throttled".
There is a more fundamental issue with the current networking stack: ideally each socket's send & receive end should allocate some minimum amount of memory on creation that's used to ensure forward progress.

@fdcavalcanti I can't reproduce it. What relevant options are you using?

I also seem to have run into another bug using iperf, where the socket doesn't get cleaned up properly on Ctrl-C.

@fdcavalcanti
Contributor

@mennovf with your change how to handle the case described by @zhhyu7 :
If the sending direction is unthrottled, during the TCP sending process, if the buffer in the write queue is full of IOB, the driver will be unable to receive any new packet (especially TCP-ACK), and the buffer in the write queue will never be released, causing the entire IP protocol stack to hang. This is a fatal problem.

Yes, the TCP sending will be blocked. Note that this is already an issue if _IOB_THROTTLE=0, regardless whether sending or receiving is marked "throttled". As I said, this is a more fundamental issue with the current networking stack. Ideally each socket's send & receive end should allocate some minimum amount of memory on creation that's used to ensure forward progress.

@fdcavalcanti I can't reproduce it. What relevant options are you using?

I did seemingly run into another bug using iperf where the socket doesn't seem to get cleaned up properly on ctr-c.

I use the default defconfig on esp32c6-devkits:wifi.
Some relevant options are:

CONFIG_IOB_BUFSIZE=128
CONFIG_IOB_NBUFFERS=160
CONFIG_IOB_THROTTLE=24
CONFIG_NETUTILS_IPERF=y
CONFIG_NET_BROADCAST=y
CONFIG_NET_ETH_PKTSIZE=1514
CONFIG_NET_ICMP_SOCKET=y
CONFIG_NET_TCP=y
CONFIG_NET_TCP_DELAYED_ACK=y
CONFIG_NET_TCP_KEEPALIVE=y
CONFIG_NET_TCP_WRITE_BUFFERS=y
CONFIG_NET_UDP=y
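
For a sense of scale, the options above can be turned into a rough buffer budget. The sketch below assumes that every byte of each IOB carries frame data, ignoring any per-buffer or link-layer header space, so the real requirement may be higher; `iobs_for` and `frames_in` are hypothetical helper names.

```c
#include <assert.h>

/* Number of fixed-size buffers needed to hold `nbytes`, rounded up. */

static int iobs_for(int nbytes, int iob_bufsize)
{
  assert(nbytes >= 0 && iob_bufsize > 0);
  return (nbytes + iob_bufsize - 1) / iob_bufsize;
}

/* Number of whole frames of `pktsize` bytes that fit in `nbuffers`
 * buffers of `iob_bufsize` bytes each.
 */

static int frames_in(int nbuffers, int pktsize, int iob_bufsize)
{
  assert(nbuffers >= 0 && pktsize > 0);
  return nbuffers / iobs_for(pktsize, iob_bufsize);
}
```

With CONFIG_NET_ETH_PKTSIZE=1514 and CONFIG_IOB_BUFSIZE=128 this gives 12 buffers per full-size frame. Under these simplifying assumptions, CONFIG_IOB_THROTTLE=24 reserves room for 2 such frames, and the whole pool of 160 buffers holds 13.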

@sastel

sastel commented Dec 17, 2025

I've seen this IOB deadlock issue too. I've been working around it by simply increasing the number of buffers available, but that isn't a good solution. Are you guys planning to move this pull request ahead, or are you parking it for now?

@mennovf
Author

mennovf commented Dec 17, 2025

I think the THROTTLE option is a dead-end due to the TCP remarks. IMO the memory management needs an overhaul. In the meantime I can resolve my issue by fixing RECV/SEND_BUFSIZE in a fork and correctly configuring the system with these parameters.

@fdcavalcanti
Contributor

I think the THROTTLE option is a dead-end due to the TCP remarks. IMO the memory management needs an overhaul. In the meantime I can resolve my issue by fixing RECV/SEND_BUFSIZE in a fork and correctly configuring the system with these parameters.

Can you show an example of this approach?

@PetervdPerk-NXP
Contributor

I recently upgraded from NuttX 10.3.0 to NuttX 12.12.0 and encountered this issue, which wasn’t a problem before. Previously, I was running with CONFIG_IOB_THROTTLE=0 without any issues.

@xiaoxiang781216 @acassis Could you clarify why the network stack appears to have regressed to the point where deadlocks are possible? This is quite concerning.

@azerupi
Contributor

azerupi commented Jan 8, 2026

@PetervdPerk-NXP this was already an issue with NuttX 10.3.0, but it seems to manifest only under certain conditions / network pressure. This ticket & PR were opened by @mennovf because we are hitting this issue with current PX4 for our custom board. See PX4/PX4-Autopilot#25956

NuttX 12.12 might make the conditions to trigger this easier, but it was definitely an issue before.

@PetervdPerk-NXP
Contributor

PetervdPerk-NXP commented Jan 8, 2026

@PetervdPerk-NXP this was already an issue with NuttX 10.3.0 but it seems like it only manifest under certain conditions / network pressure. This ticket & PR were opened by @mennovf because we are hitting this issue with current PX4 for our custom board. See PX4/PX4-Autopilot#25956

NuttX 12.12 might make the conditions to trigger this easier, but it was definitely an issue before.

Sure, but with 10.3.0 the IOB pool could at least fill up without deadlocking this easily. Right now, the first time the IOB pool fills up under TCP traffic I get a deadlock. This is most likely a logic/state-machine problem.

NuttX 10.3.0 working fine with this config

CONFIG_IOB_NBUFFERS=24
CONFIG_IOB_BUFSIZE=196
CONFIG_IOB_THROTTLE=0

NuttX 12.12.0 config needed to avoid deadlocks

CONFIG_IOB_NBUFFERS=256
CONFIG_IOB_BUFSIZE=256
CONFIG_IOB_THROTTLE=0

Overall, the throughput appears quite inconsistent, even though the CPU load remains relatively low at around 6% from the thread generating the TCP data. I understand that @xiaoxiang781216 and his team have been working on rewriting the NET/IOB stack for several years, but I was expecting performance improvements rather than a regression.

Edit: a change in the NuttX 12.12.0 default settings seems to make this easier to reproduce.
By default the WRBCHAINS are dynamically allocated/freed, which is quite costly and causes these latency spikes.
With the settings below, network behavior gets closer to NuttX 10.3.0. Nevertheless, once the IOB pool fills up the network dies, which was less likely to happen on NuttX 10.3.0.

CONFIG_NET_IPFORWARD_ALLOC_STRUCT=0
CONFIG_NET_TCP_ALLOC_WRBCHAINS=0
CONFIG_NET_UDP_ALLOC_WRBCHAINS=0

@xiaoxiang781216
Contributor

I recently upgraded from NuttX 10.3.0 to NuttX 12.12.0 and encountered this issue, which wasn’t a problem before. Previously, I was running with CONFIG_IOB_THROTTLE=0 without any issues.

@xiaoxiang781216 @acassis Could you clarify why the network stack appears to have regressed to the point where deadlocks are possible? This is quite concerning.

Could you share the hardware/defconfig and the repro steps? If we have the hardware, @zhhyu7 could help to identify the root cause.

@zhhyu7
Contributor

zhhyu7 commented Jan 9, 2026

Surely but with 10.3.0 atleast IOB could get full and not deadlock this easily, right now the first occurence of a full with IOB with TCP traffic yields a deadlock for me, This is most likely more logical/state machine problem.

Hi @PetervdPerk-NXP, the protocol stack can get stuck when the transmit queue fills up the IOB pool: the driver then fails to allocate an IOB, TCP cannot process the TCP ACK, and the TCP write buffers are therefore never released. This problem should have always existed, especially when the total amount of IOB memory is small, such as less than 16k. Another scenario that can cause the stack to get stuck is when the application never reads the packets held in the stack's readahead. The probability of this second scenario can be reduced by limiting CONFIG_NET_RECV_BUFSIZE.

Let's focus on the first scenario and how to avoid it.
If CONFIG_IOB_THROTTLE=0, restricting CONFIG_NET_SEND_BUFSIZE to less than CONFIG_IOB_NBUFFERS * CONFIG_IOB_BUFSIZE / (the number of TCP sockets the application may create) largely alleviates the problem. However, since IOBs may not always fully utilize their buffers, some risk remains.
If CONFIG_IOB_THROTTLE>0, we had better allocate IOBs with throttle=true in the TX direction and throttle=false in the RX direction, which is not quite consistent with the current allocation logic. If this suggestion is approved, a patch needs to be submitted to modify the IOB allocation logic accordingly. In this case there is no need to impose overly strict restrictions on CONFIG_NET_SEND_BUFSIZE, but scenarios where the application does not read readahead for a long time still need to be avoided. Therefore, it is best to also impose an appropriate restriction on CONFIG_NET_RECV_BUFSIZE, which should completely avoid the protocol stack getting stuck.
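
As a worked example of the bound for the CONFIG_IOB_THROTTLE=0 case, the sketch below divides the total IOB memory by the number of TCP sockets. The function name `send_bufsize_bound` is hypothetical, and the socket count of 4 used in the figures afterwards is an assumption chosen purely for illustration.

```c
#include <assert.h>

/* Bound from the formula above: total IOB bytes divided by the number
 * of TCP sockets.  CONFIG_NET_SEND_BUFSIZE should stay below this
 * value.  Hypothetical helper, not a NuttX function.
 */

static long send_bufsize_bound(long nbuffers, long iob_bufsize,
                               long ntcp_sockets)
{
  assert(nbuffers > 0 && iob_bufsize > 0 && ntcp_sockets > 0);
  return nbuffers * iob_bufsize / ntcp_sockets;
}
```

For the two pool configurations posted earlier in this thread and an assumed 4 TCP sockets, the bound is 1176 bytes for 24 buffers of 196 bytes (4704 bytes in total) and 16384 bytes for 256 buffers of 256 bytes (65536 bytes in total). As noted above, IOBs are not always filled completely, so some margin below the bound is advisable.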

Optimization of performance requires further specialized analysis and solution design.

On different products, it is necessary to choose reasonable values for CONFIG_IOB_NBUFFERS, CONFIG_IOB_BUFSIZE, CONFIG_NET_SEND_BUFSIZE, CONFIG_NET_RECV_BUFSIZE, CONFIG_IOB_THROTTLE, etc., taking the available memory into account, so that the protocol stack works efficiently and robustly.

@zhhyu7
Contributor

zhhyu7 commented Jan 19, 2026

I pushed an enhanced patch for this scenario: #18011


Development

Successfully merging this pull request may close these issues.

[HELP] CONFIG_IOB_THROTTLE does not do what it claims
