A guest which chooses to spin up many ill-fated TCP connection attempts (#985, #990) runs into another bottleneck: the `RouteCache`, which sits between OPTE and the underlay network. This table is small (512 entries), and although it does have an eviction policy, what we've seen in practice is that this traffic pattern causes contention and thrashing: each of these packets invariably misses in the table, finds it full, and evicts a possibly-good entry.
Copying from https://github.com/oxidecomputer/customer-support/issues/1188#issuecomment-4497160655, we need to:
- Bump the table size,
- Disable eviction, and go straight to a raw (uncached) lookup when the table is at capacity,
- Shrink the write lock scope to cover only insertion, so that readers aren't blocked while one thread is querying illumos itself.
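The three changes above can be sketched together as a small Rust cache. This is a hypothetical, userland-only illustration, not OPTE's actual `RouteCache`: the `NextHop` type, the `lookup` closure standing in for the illumos route query, and the use of std's `RwLock` over a `HashMap` are all assumptions made so the sketch runs on its own, and entry expiry is omitted entirely.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;
use std::sync::RwLock;

/// Hypothetical next-hop record; the real cache stores whatever OPTE needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NextHop {
    pub port: u8,
}

/// Hypothetical fixed-capacity route cache that never evicts.
/// `F` stands in for the (slow) illumos route query.
pub struct RouteCache<F>
where
    F: Fn(Ipv6Addr) -> Option<NextHop>,
{
    entries: RwLock<HashMap<Ipv6Addr, NextHop>>,
    capacity: usize,
    lookup: F,
}

impl<F> RouteCache<F>
where
    F: Fn(Ipv6Addr) -> Option<NextHop>,
{
    pub fn new(capacity: usize, lookup: F) -> Self {
        Self {
            entries: RwLock::new(HashMap::with_capacity(capacity)),
            capacity,
            lookup,
        }
    }

    pub fn route(&self, dst: Ipv6Addr) -> Option<NextHop> {
        // Fast path: a hit needs only the shared read lock.
        {
            let entries = self.entries.read().unwrap();
            if let Some(hop) = entries.get(&dst) {
                return Some(*hop);
            }
        } // read guard dropped here

        // Slow path: query the OS with no lock held, so readers are not
        // blocked behind this (possibly slow) query.
        let hop = (self.lookup)(dst)?;

        // The write lock covers only the insertion.
        let mut entries = self.entries.write().unwrap();
        if entries.len() < self.capacity {
            entries.insert(dst, hop);
        }
        // Table full: evict nothing, just hand back the uncached answer.
        Some(hop)
    }

    pub fn len(&self) -> usize {
        self.entries.read().unwrap().len()
    }
}
```

One consequence of running the query outside the lock is that two threads missing on the same destination may both query illumos; the later insert simply overwrites the earlier one with the same answer (or is skipped if the table has since filled). That duplicated work is the price of never holding the write lock across the query.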
Combined, these should ensure that actually-established flows eventually end up represented in the cache, and that we can sustain a higher rate of new active connections per second.
#539 will properly solve this issue and give us a far smaller routing table (because it will not scale with the number of active connections), but will require additional interfaces in the kernel to realise. The path described here is a shorter-term fix while we determine what those interfaces should look like.