
Conversation


@pl752 pl752 commented Dec 11, 2025

I want to propose a set of changes aimed at improving performance, which I have implemented and used for some time in my (private) projects.
The main goal of these changes is to significantly reduce heap allocations by using stack allocations and an array pool, and by avoiding unnecessary allocations in the first place.
I have created a topic on the mailing list.
I would appreciate opinions and help with testing. I have used these changes for a while without any anomalies, though I haven't run thorough tests against all server versions (I am using a Firebird 3 server). The changes should not alter observable behavior.
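The general pattern applied throughout can be sketched roughly like this (my own illustration, not the driver's actual code; the helper name and the 256-char threshold are made up for the example): small buffers go on the stack via `stackalloc`, larger ones are rented from `ArrayPool<T>.Shared` instead of being heap-allocated per call.

```csharp
using System;
using System.Buffers;
using System.Text;

static class BufferHelper
{
    // Hypothetical helper illustrating the pattern: decode bytes to a string
    // using a stack buffer for small payloads and a pooled array otherwise,
    // so the only heap allocation left is the resulting string itself.
    public static string GetString(Encoding encoding, ReadOnlySpan<byte> payload)
    {
        int maxChars = encoding.GetMaxCharCount(payload.Length);
        char[]? rented = null;
        // Threshold is illustrative; real code would pick it based on profiling.
        Span<char> chars = maxChars <= 256
            ? stackalloc char[256]
            : (rented = ArrayPool<char>.Shared.Rent(maxChars));
        try
        {
            int written = encoding.GetChars(payload, chars);
            return new string(chars[..written]);
        }
        finally
        {
            if (rented is not null)
                ArrayPool<char>.Shared.Return(rented);
        }
    }
}
```

The conditional-expression form is needed because a `stackalloc` inside an `if` block cannot be assigned to a `Span<char>` declared outside that block.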

@niekschoemaker (Contributor)

Personally, most of the changes seem to make sense, but I would argue that the Auth part becomes way too complex with these changes (and I'm also not sure how often that code even runs; I suppose it runs once per connection, so it's probably not too hot a path).

The other parts do seem to make sense, especially the ReaderWriter optimizations, as those run for each query.

Did you, however, happen to run the benchmarks against this to see what actual difference it makes to performance?


pl752 commented Dec 12, 2025

Unfortunately, I haven't gotten around to running benchmarks yet. However, the changes resulted in a significant reduction in CPU time and allocations in application profiling runs. I will try to perform more thorough benchmarks and correctness tests soon, when I have some free time.


pl752 commented Dec 12, 2025

I also agree that the auth part is a case of over-optimization and can be omitted. I simply applied the same change pattern to everything that allocates temporary buffers and that I happened to notice, so optimizations for things that run once per session/connection aren't necessary.


pl752 commented Dec 12, 2025

Upd: I have run the Perf project I found in the solution (I don't know how representative it is). The speed difference is pretty negligible, but a clear reduction in allocations can be observed:

BenchmarkDotNet v0.15.8, Windows 10 (10.0.19044.6691/21H2/November2021Update)
AMD Ryzen 7 5800H with Radeon Graphics 3.20GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 10.0.101
  [Host]  : .NET 8.0.22 (8.0.22, 8.0.2225.52707), X64 RyuJIT x86-64-v3
  NuGet   : .NET 8.0.22 (8.0.22, 8.0.2225.52707), X64 RyuJIT x86-64-v3
  Project : .NET 8.0.22 (8.0.22, 8.0.2225.52707), X64 RyuJIT x86-64-v3

Jit=RyuJit  Platform=X64  Toolchain=.NET 8.0
WarmupCount=3

| Method  | Job     | BuildConfiguration | DataType             | Count | Mean        | Error     | StdDev    | Ratio | Gen0    | Allocated | Alloc Ratio |
|-------- |-------- |------------------- |--------------------- |------ |------------:|----------:|----------:|------:|--------:|----------:|------------:|
| Execute | NuGet   | ReleaseNuGet       | bigint               | 100   | 20,322.3 us | 212.53 us | 188.40 us |  1.00 | 31.2500 |  307.4 KB |        1.00 |
| Execute | Project | Release            | bigint               | 100   | 20,160.8 us | 175.47 us | 146.52 us |  0.99 |       - | 237.61 KB |        0.77 |
|         |         |                    |                      |       |             |           |           |       |         |           |             |
| Fetch   | NuGet   | ReleaseNuGet       | bigint               | 100   |    482.7 us |   4.17 us |   3.90 us |  1.00 |  6.8359 |  56.64 KB |        1.00 |
| Fetch   | Project | Release            | bigint               | 100   |    484.2 us |   3.33 us |   2.78 us |  1.00 |  4.8828 |  40.35 KB |        0.71 |
|         |         |                    |                      |       |             |           |           |       |         |           |             |
| Execute | NuGet   | ReleaseNuGet       | varch(...) utf8 [30] | 100   | 20,406.9 us | 217.86 us | 193.12 us |  1.00 | 31.2500 | 311.34 KB |        1.00 |
| Execute | Project | Release            | varch(...) utf8 [30] | 100   | 20,251.5 us | 118.63 us | 110.97 us |  0.99 |       - | 238.43 KB |        0.77 |
|         |         |                    |                      |       |             |           |           |       |         |           |             |
| Fetch   | NuGet   | ReleaseNuGet       | varch(...) utf8 [30] | 100   |    490.7 us |   3.71 us |   3.47 us |  1.00 |  6.8359 |  60.51 KB |        1.00 |
| Fetch   | Project | Release            | varch(...) utf8 [30] | 100   |    494.8 us |   6.60 us |   5.85 us |  1.01 |  4.8828 |   41.1 KB |        0.68 |

// * Hints *
Outliers
  CommandBenchmark.Execute: NuGet   -> 1 outlier  was  removed (21.28 ms)
  CommandBenchmark.Execute: Project -> 2 outliers were removed (20.81 ms, 21.15 ms)
  CommandBenchmark.Fetch: Project   -> 2 outliers were removed (499.44 us, 507.61 us)
  CommandBenchmark.Execute: NuGet   -> 1 outlier  was  removed (21.71 ms)
  CommandBenchmark.Fetch: Project   -> 1 outlier  was  removed (528.00 us)

Firebird 3 was used; the disk is an OEM Samsung 2 TB NVMe (PM9A1, a.k.a. the OEM 980 Pro), with 32 GB of dual-channel DDR4 RAM at 3200 MT/s (JEDEC timings).


pl752 commented Dec 12, 2025

Upd2: I ran the test suite against Firebird 3 (not embedded), so it still needs further testing with other versions (especially embedded mode and batch operations on modern Firebird). There was an issue with boolean reading because `_smallbuffer` was used both for reading the useful bytes and for the padding (which doesn't affect types that don't get padded). After the fix, a small reduction in test run time was observed (24.1 → 23.5 minutes, though without repeatability checks) and no changes in the passed/failed/skipped counts were noticed.
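For context, fixed-size values in the Firebird wire protocol are padded to 4-byte boundaries. The failure mode described above can be sketched like this (names and structure are hypothetical, not the driver's actual code): when one scratch buffer serves both the value bytes and the pad bytes, the fix is to decode the value before the buffer is reused for the pad read.

```csharp
using System;
using System.IO;

static class PaddedReader
{
    // Hypothetical sketch: a 1-byte boolean padded to a 4-byte boundary on
    // the wire. Reading the pad into the same scratch buffer would clobber
    // the value byte, so the value is decoded before the buffer is reused.
    public static bool ReadBoolean(Stream stream, byte[] scratch)
    {
        stream.ReadExactly(scratch, 0, 1);   // the value byte
        bool value = scratch[0] != 0;        // decode before reuse...
        stream.ReadExactly(scratch, 0, 3);   // ...then consume the 3 pad bytes
        return value;
    }
}
```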


pl752 commented Dec 12, 2025

Upd3: performed the tests with the embedded engine; all passed.


pl752 commented Dec 12, 2025

Upd4:
TL;DR: I wrote some benchmarks specific to my (unfortunately private) solution's queries. Changes in query execution timing are sometimes hard to register because the Firebird 3 engine is the main bottleneck in the test scenarios, even under ideal conditions (localhost with a fast CPU and NVMe). However, query creation/preparation seems to have benefited significantly, string operations got a massive boost from the rune-conversion rework, and positive side effects in memory use and local CPU time can be observed.

Benchmark results:

//Update multiple: Optimized (local_opt2)
| Method                                      | UpdateRows | Mean        | Error     | StdDev    | Gen0    | Allocated |
|-------------------------------------------- |----------- |------------:|----------:|----------:|--------:|----------:|
| Update_MainDeliveryById_Merge_RollbackAsync | 25         |  1,695.5 us |  33.42 us |  39.78 us |  3.9063 |  42.78 KB |
| Update_MainDeliveryById_Merge_RollbackAsync | 1000       | 42,000.3 us | 481.56 us | 426.89 us | 83.3333 | 867.56 KB |


//Update multiple: Original (master)
| Method                                      | UpdateRows | Mean        | Error     | StdDev    | Gen0    | Allocated |
|-------------------------------------------- |----------- |------------:|----------:|----------:|--------:|----------:|
| Update_MainDeliveryById_Merge_RollbackAsync | 25         |  1,704.9 us |  33.17 us |  52.62 us |  3.9063 |  46.98 KB |
| Update_MainDeliveryById_Merge_RollbackAsync | 1000       | 42,416.1 us | 634.02 us | 593.06 us | 83.3333 |  985.1 KB |


//Single insert/upsert: Optimized
| Method                                      | Rows      | Mean        | Error     | StdDev    | Gen0    | Allocated |
|-------------------------------------------- |---------- |------------:|----------:|----------:|--------:|----------:|
| Select_LoadWBSellerAccountsAsync            | -         |    717.2 us |  14.08 us |  14.46 us |  1.9531 |  29.26 KB |
| Insert_Upsert_WbDocCache_RollbackAsync      | -         |    708.6 us |  12.61 us |  11.18 us |  3.9063 |  33.58 KB |


//Single insert/upsert: Original
| Method                                      | Rows      | Mean        | Error     | StdDev    | Gen0    | Allocated |
|-------------------------------------------- |---------- |------------:|----------:|----------:|--------:|----------:|
| Select_LoadWBSellerAccountsAsync            | -         |    741.5 us |  14.50 us |  18.86 us |  3.9063 |   33.5 KB |
| Insert_Upsert_WbDocCache_RollbackAsync      | -         |    724.0 us |  13.94 us |  18.13 us |  3.9063 |  37.22 KB |


//Select multiple mixed (3 int, 1 literal char string): Optimized
| Method                              | Rows   | Mean           | Error        | StdDev        | Gen0       | Gen1      | Allocated    |
|------------------------------------ |------- |---------------:|-------------:|--------------:|-----------:|----------:|-------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |       741.1 us |     19.82 us |      57.83 us |          - |         - |     47.52 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |     3,507.2 us |     66.56 us |      81.74 us |          - |         - |     421.2 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |    31,362.5 us |  5,421.76 us |  15,986.17 us |          - |         - |   4078.15 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |   321,426.4 us | 17,596.63 us |  51,884.06 us |  4000.0000 |         - |  40710.48 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 3,394,208.6 us | 67,371.50 us | 193,301.47 us | 49000.0000 | 9000.0000 | 407078.14 KB |


//Select multiple mixed (3 int, 1 literal char string): Original
//	(Yes: 1.09x to >2x in speed, 10x in allocation volume,
//	and, when profiling, an actual ~100x difference in allocate/free event counts)
| Method                              | Rows   | Mean         | Error      | StdDev      | Median       | Gen0        | Gen1        | Allocated     |
|------------------------------------ |------- |-------------:|-----------:|------------:|-------------:|------------:|------------:|--------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |     1.611 ms |  0.0506 ms |   0.1453 ms |     1.604 ms |           - |           - |     457.27 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |    11.138 ms |  0.1882 ms |   0.1760 ms |    11.129 ms |           - |           - |    4511.55 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |    34.017 ms |  4.9421 ms |  14.4162 ms |    25.360 ms |   5000.0000 |   1000.0000 |   44988.63 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |   346.085 ms | 23.2037 ms |  68.4167 ms |   337.300 ms |  55000.0000 |  11000.0000 |  449544.78 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 3,709.593 ms | 73.9280 ms | 194.7560 ms | 3,695.932 ms | 550000.0000 | 110000.0000 | 4494710.22 KB |

//Select multiple int only (3 int): Optimized
| Method                              | Rows    | Mean           | Error       | StdDev      | Gen0       | Gen1      | Allocated    |
|------------------------------------ |-------- |---------------:|------------:|------------:|-----------:|----------:|-------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10      |       376.7 us |    17.69 us |    50.19 us |          - |         - |     11.73 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100     |     1,102.8 us |    54.87 us |   160.06 us |          - |         - |     63.99 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000    |     4,537.8 us |   689.68 us | 2,033.53 us |          - |         - |    497.61 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000   |    18,131.8 us |   137.98 us |   115.22 us |          - |         - |   4927.88 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000  |   176,431.1 us |   956.29 us |   798.54 us |  6000.0000 |         - |  49230.36 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000000 | 1,743,465.8 us | 7,326.00 us | 6,494.31 us | 60000.0000 | 6000.0000 | 497846.96 KB |

//Select multiple int only (3 int): Original
| Method                              | Rows    | Mean           | Error       | StdDev      | Median         | Gen0       | Gen1      | Allocated    |
|------------------------------------ |-------- |---------------:|------------:|------------:|---------------:|-----------:|----------:|-------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10      |       357.9 us |     9.24 us |    25.44 us |       355.4 us |          - |         - |     12.89 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100     |     1,182.7 us |    49.32 us |   142.29 us |     1,158.6 us |          - |         - |     70.78 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000    |     4,541.1 us |   718.80 us | 2,119.41 us |     3,485.7 us |          - |         - |    561.27 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000   |    18,280.0 us |   340.21 us |   454.17 us |    18,246.9 us |          - |         - |   5561.08 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000  |   173,885.9 us |   916.01 us |   764.91 us |   173,896.6 us |  6000.0000 |         - |  55558.87 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000000 | 1,745,630.3 us | 4,109.23 us | 3,642.73 us | 1,745,432.2 us | 68000.0000 | 7000.0000 | 561128.53 KB |

It was a bit tricky to obtain measurements that actually show the improvements, but some interesting observations can be made.
The main explanation for the small timing improvements: although my benchmarks do pretty much nothing besides opening a connection, opening a configured transaction, creating queries, filling in parameters, preparing when run multiple times in a row, executing/reading, mapping the selected fields to a single structure instance (to minimize noise), rolling back the transaction, and closing the connection, the DB engine uses a whole CPU core while the application thread idles most of the time.
String reading, however, benefited heavily: the optimizations reduced the total number of allocated objects 10-100x. The original rune char enumerator allocated every (!) rune as a separate char array, producing tens of millions of short-lived char[1] and char[2] objects, while the new methods avoid allocation as much as possible. The situation was made worse by the original rune-counting method, which performed a full enumeration, creating all of those char arrays only to count them without ever using the char data itself. Restricting allocations to the final buffers and strings saves a lot of CPU time, since heap allocation even in .NET is not a cheap operation, and during string conversion the client library actually becomes the bottleneck instead of the engine.
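The direction of that rework can be sketched like this (my own illustration, not the driver's actual code): modern .NET exposes `System.Text.Rune`, which lets you walk scalar values over a char span without materializing each rune as its own array.

```csharp
using System;
using System.Text;

static class RuneOps
{
    // Allocation-free rune counting over a char span: each scalar value is
    // decoded in place via Rune.DecodeFromUtf16, so no per-rune char[] is
    // ever created (unlike enumerating runes as individual char arrays).
    public static int CountRunes(ReadOnlySpan<char> text)
    {
        int count = 0;
        int i = 0;
        while (i < text.Length)
        {
            Rune.DecodeFromUtf16(text[i..], out _, out int consumed);
            i += consumed;
            count++;
        }
        return count;
    }
}
```

`string.EnumerateRunes()` gives the same allocation-free traversal when the rune values themselves are needed.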
The 10x difference in allocated memory when working with strings comes from each char[1]/char[2] array carrying not just 2-4 bytes of useful raw data, but also 0-6 bytes of padding (in some cases) and 8-16 bytes of array object header (effectively a pointer to the data plus a length), and that's before counting type and GC bookkeeping data.
Queries over small numbers of rows usually showed larger percentage improvements (1-4% vs. 9-100+%), which I attribute to the better string processing aiding the query and parameter preparation phase.
The timings are not the whole story, as the changes have some beneficial side effects. Fewer allocations naturally mean the GC runs less often, and stackalloc is essentially free (it is not a complex allocator call, just a tiny `sub esp, size` ... `add esp, size`). There is also a reduction in CPU time that is visible even without a profiler: I could clearly see the main thread using 2-3% of total CPU (5-6% during selects with char columns), while the optimized version consumed only 1-3%. In theory this means that on low-end client systems, or when the application uses the thread pool heavily, the DB reading task occupies its thread for less time, leaving more time for other tasks when the pool is exhausted and its queue is in use, and for other programs on loaded machines.
The lack of proper benchmark/test coverage is because this rework started as a small experiment out of curiosity, after I noticed that the Firebird client was a top 1-2 consumer of CPU time in my application. The experiment turned out well enough that I decided the contribution might be useful for other developers and their solutions, so I reached out with this proposal.
