Inline object allocators in RyuJIT by EgorBo · Pull Request #125475 · dotnet/runtime

EgorBo · 2026-03-12T04:01:44Z

Just an AI-driven experiment to inline allocators in codegen.
My first attempt was to move them to a managed helper, but that was not inlineable (due to nogc requirement) so we couldn't benefit from constant-folded obj->BaseSize or constant-folded length checks for array allocators. Also, its GetAllocContext TLS access was inefficient. Also, we shouldn't expand allocators early anyway.

static object Foo()
{
    return new Program();
}

; Foo
       push     rdi
       push     rsi
       push     rbx
       mov      rbx, 0x7FFC642063E8      ; Program
       mov      rsi, qword ptr GS:[0x0058]
       mov      rdi, 0x7FFCC34D7818
       mov      edi, dword ptr [rdi]
       mov      rsi, qword ptr [rsi+8*rdi]
       lea      rsi, [rsi+0x30]
       mov      edi, dword ptr [rbx+0x04]
       mov      rax, qword ptr [rsi+0x08]
       lea      rdi, [rax+rdi]
       cmp      rdi, qword ptr [rsi]
       ja       SHORT G_M29399_IG04
       mov      qword ptr [rax], rbx
       mov      qword ptr [rsi+0x08], rdi
       jmp      SHORT G_M29399_IG05
G_M29399_IG04:
       mov      rcx, rbx
       call     CORINFO_HELP_NEWSFAST
G_M29399_IG05:
       nop      
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
; Total bytes of code 80

dotnet-policy-service · 2026-03-12T04:03:01Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR adds a new JIT/EE interface to expose platform-specific TLS access details for the thread-local allocation context and uses that information to keep GT_ALLOCOBJ through morphing so xarch codegen can emit an inline bump-pointer allocation fast path (with a helper-call slow path).

Changes:

Introduces getObjectAllocContextInfo / CORINFO_OBJECT_ALLOC_CONTEXT_INFO across the JIT/EE boundary (plus SuperPMI + NativeAOT plumbing).
Adds EE support to describe how to locate t_runtime_thread_locals (allocation context) via TLS on supported platforms.
Implements xarch codegen/LSRA handling to generate inline allocation for GT_ALLOCOBJ (currently Windows-focused, guarded by config/opts).

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/coreclr/vm/threadstatics.h	Declares a new EE helper to provide TLS access metadata for the alloc context.
src/coreclr/vm/threadstatics.cpp	Implements TLS metadata discovery for `t_runtime_thread_locals` on multiple targets.
src/coreclr/vm/jitinterface.cpp	Exposes `CEEInfo::getObjectAllocContextInfo` and gates support based on runtime configuration.
src/coreclr/vm/amd64/asmhelpers.S	Adds descriptor helpers for TLS access to `t_runtime_thread_locals` (Apple TLVP + ELF TLSGD).
src/coreclr/tools/superpmi/superpmi/icorjitinfo.cpp	Records/replays the new JIT/EE query in SuperPMI.
src/coreclr/tools/superpmi/superpmi-shim-simple/icorjitinfo_generated.cpp	Forwards the new API in the simple shim.
src/coreclr/tools/superpmi/superpmi-shim-counter/icorjitinfo_generated.cpp	Counts/intercepts the new API in the counter shim.
src/coreclr/tools/superpmi/superpmi-shim-collector/icorjitinfo.cpp	Records/replays the new API in the collector shim.
src/coreclr/tools/superpmi/superpmi-shared/methodcontext.h	Adds packets + record/replay declarations for alloc-context info.
src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp	Implements record/dump/replay for alloc-context info.
src/coreclr/tools/superpmi/superpmi-shared/lwmlist.h	Adds LWM entry for alloc-context info.
src/coreclr/tools/superpmi/superpmi-shared/agnostic.h	Defines the agnostic record type for alloc-context info.
src/coreclr/tools/aot/jitinterface/jitinterface_generated.h	Adds callback + wrapper method for `getObjectAllocContextInfo`.
src/coreclr/tools/Common/JitInterface/ThunkGenerator/ThunkInput.txt	Extends thunk generation inputs with the new API + struct.
src/coreclr/tools/Common/JitInterface/CorInfoTypes.cs	Adds managed definition of `CORINFO_OBJECT_ALLOC_CONTEXT_INFO`.
src/coreclr/tools/Common/JitInterface/CorInfoImpl_generated.cs	Adds unmanaged callback plumbing for `getObjectAllocContextInfo`.
src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs	Provides a stub implementation for NativeAOT/crossgen2.
src/coreclr/jit/objectalloc.cpp	Keeps `GT_ALLOCOBJ` for codegen-side expansion under certain conditions.
src/coreclr/jit/lsraxarch.cpp	Adds LSRA build logic for `GT_ALLOCOBJ` with internal temps + call kill/defs.
src/coreclr/jit/lsrabuild.cpp	Adds kill-set modeling for `GT_ALLOCOBJ` as `NEWSFAST`.
src/coreclr/jit/lower.cpp	Adds a lowering case placeholder for `GT_ALLOCOBJ` on xarch.
src/coreclr/jit/jitconfigvalues.h	Introduces `JitInlineAllocFast` config knob.
src/coreclr/jit/gtlist.h	Allows `GT_ALLOCOBJ` in LIR for codegen-side handling.
src/coreclr/jit/compiler.h	Caches `CORINFO_OBJECT_ALLOC_CONTEXT_INFO` in the compiler instance.
src/coreclr/jit/codegenxarch.cpp	Implements inline fast-path allocation codegen for `GT_ALLOCOBJ` (+ slow-path helper call).
src/coreclr/jit/codegen.h	Declares `genCodeForAllocObj`.
src/coreclr/jit/ICorJitInfo_wrapper_generated.hpp	Adds wrapper forwarding for the new API.
src/coreclr/jit/ICorJitInfo_names_generated.h	Adds the API name for logging/profiling hooks.
src/coreclr/inc/jiteeversionguid.h	Updates JIT/EE version GUID due to interface shape change.
src/coreclr/inc/icorjitinfoimpl_generated.h	Adds the new virtual method to the generated EE impl interface.
src/coreclr/inc/corinfo.h	Defines `CORINFO_OBJECT_ALLOC_CONTEXT_INFO` + adds new `ICorStaticInfo` method.

src/coreclr/jit/codegenxarch.cpp

src/coreclr/inc/corinfo.h

…inline-allocators

EgorBo · 2026-03-12T11:29:42Z

@EgorBot -windows_x64 -windows_arm64

using BenchmarkDotNet.Attributes;

public class Bench
{
   [Benchmark]
   public Bench CreateInstance() => new Bench();
}

Copilot

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

src/coreclr/jit/codegenxarch.cpp

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs

EgorBo · 2026-03-12T13:56:24Z

@EgorBot -linux_x64

using BenchmarkDotNet.Attributes;

public class Bench
{
   [Benchmark]
   public Bench CreateInstance() => new Bench();
}

jkotas · 2026-03-12T14:24:36Z

src/coreclr/jit/codegenxarch.cpp

+    // ---- Bump allocation (non-GC-interruptible) ----
+    GetEmitter()->emitDisableGC();
+
+    // Size not known at JIT time — read from MethodTable.m_BaseSize at runtime.


Why is that? If you are trying to boil the ocean to try to save a nano second for an allocation, you can also embed the size into the code as a constant.

Why is that? If you are trying to boil the ocean to try to save a nano second for an allocation, you can also embed the size into the code as a constant.

Well, that was one of the goals for this change (even have it in the description), but initially just wanted to see if it works as is

jkotas · 2026-03-12T14:26:56Z

src/coreclr/jit/codegenxarch.cpp

+    // Size not known at JIT time — read from MethodTable.m_BaseSize at runtime.
+    emit->emitIns_R_AR(INS_mov, EA_4BYTE, tmpReg, mtReg, (int)allocInfo->methodTableBaseSizeOffset);
+    emit->emitIns_R_AR(INS_mov, EA_PTRSIZE, dstReg, allocCtxReg, (int)allocInfo->allocPtrFieldOffset);
+    emit->emitIns_R_ARX(INS_lea, EA_PTRSIZE, tmpReg, dstReg, tmpReg, 1, 0);


Different from what the allocation helpers do. This has a potential integer overflow security bug. You are assuming that allocPtr + size won't overflow.

jkotas · 2026-03-12T14:28:45Z

I would be curious whether this improves (and not regresses) anything real. The code bloat is not free and it won't show up in microbenchmarks.

Copilot

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 9 comments.

src/coreclr/jit/codegenxarch.cpp

src/coreclr/jit/objectalloc.cpp

src/coreclr/jit/jitconfigvalues.h

src/coreclr/vm/threadstatics.cpp

src/coreclr/jit/codegenxarch.cpp

src/coreclr/jit/objectalloc.cpp

src/coreclr/vm/threadstatics.cpp

EgorBo · 2026-03-12T14:42:08Z

I would be curious whether this improves (and not regresses) anything real. The code bloat is not free and it won't show up in microbenchmarks.

I'd expect it to do so. e.g. in OrchardCMS (the most real world becnhmark that we currently have in OSS) reports obj->BaseSize access as the hottest part of the app (exclusive time), presumably, due to contention.

(it completely disappeared when I added a new version of the allocator that accepted the precomputed size)

Also, constant folded size checks for array allocations + call overhead

jkotas · 2026-03-12T14:51:00Z

reports obj->BaseSize access as the hottest part of the app (exclusive time), presumably, due to contention.

This assumes that the MethodTable won't be accessed by anything else...

Have we studied this pattern in detail? When we have seen expensive memory accesses like this, it was often from having frequently written to data next the read-only data (MethodTable in this case).

Copilot

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

src/coreclr/inc/corinfo.h

Copilot · 2026-03-12T16:58:05Z

src/coreclr/jit/codegenxarch.cpp

+        // Linux x64: call __tls_get_addr. Save arg registers on the stack.
+        // Always push an even number of 8-byte values for 16-byte stack alignment.
+        emit->emitIns_R(INS_push, EA_PTRSIZE, mtReg);
+        if (isArray)
+        {
+            emit->emitIns_R(INS_push, EA_PTRSIZE, lenReg);
+        }
+        else
+        {
+            emit->emitIns_R_I(INS_sub, EA_PTRSIZE, REG_SPBASE, 8);
+        }


On the non-Windows AMD64 path this saves argument registers using push/pop (and sub/add rsp, 8) around the __tls_get_addr call. On x64 the JIT generally assumes RSP is stable after the prolog for unwind/GC stack walking; these dynamic SP adjustments are not reflected in unwind data or CodeGen stack tracking and can break EH/stack walking. Prefer saving/restoring the args in volatile registers (e.g., move MT/len into scratch regs before clobbering REG_ARG_0) or into fixed spill slots in the existing frame instead of adjusting RSP.

Copilot · 2026-03-12T16:58:05Z

src/coreclr/jit/gentree.cpp

+                case GT_ALLOCOBJ:
+                    if (op1->AsAllocObj()->gtNewHelper != op2->AsAllocObj()->gtNewHelper ||
+                        op1->AsAllocObj()->gtAllocObjClsHnd != op2->AsAllocObj()->gtAllocObjClsHnd)
+                    {
+                        return false;
+                    }
+                    break;


GenTree::Compare for GT_ALLOCOBJ only compares gtNewHelper and gtAllocObjClsHnd, but GenTreeAllocObj also carries gtHelperHasSideEffects and (under FEATURE_READYTORUN) gtEntryPoint. If any of these can differ, treating nodes as equal can lead to incorrect structural comparisons (e.g., in CSE/value numbering or debug asserts). Consider including gtHelperHasSideEffects and the ReadyToRun entrypoint lookup in the comparison (or document why they are irrelevant/invariant).

- Remove NEWARR_1_VC/PTR inline expansion from objectalloc.cpp and codegen - Remove constant-size folding at codegen time (no more JIT-time MT reads) - Remove objectMethodTableOffset field (MT pointer always at offset 0) - Remove arrayLengthOffset, arrayBaseSize, methodTableComponentSizeOffset - Remove threadVarsSection (macOS not supported for inline alloc) - Add tlsRootOffset field for Linux ARM64 (pre-computed tpidr_el0 offset) - Add genInlineAllocCall for ARM64 in codegenarmarch.cpp - Windows: x18 (TEB) + TLS array + index + offset pattern - Linux: mrs tpidr_el0 + pre-computed offset (no function call needed) - macOS: not supported (set supported=false) - Add GetRuntimeThreadLocalsVariableOffset assembly stub for ARM64 Linux - Update CORINFO_OBJECT_ALLOC_CONTEXT_INFO in corinfo.h, CorInfoTypes.cs - Update SuperPMI agnostic struct and methodcontext rec/dmp/rep - Update jitinterface.cpp and threadstatics.cpp for new struct layout - Change TARGET_AMD64 guards to TARGET_AMD64 || TARGET_ARM64 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated no new comments.

inline allocators

7a2404f

Copilot AI review requested due to automatic review settings March 12, 2026 04:01

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 12, 2026

dotnet-policy-service bot assigned EgorBo Mar 12, 2026

Merge branch 'main' into inline-allocators

7f15705

Copilot started reviewing on behalf of EgorBo March 12, 2026 04:08 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

src/coreclr/inc/corinfo.h Outdated Show resolved Hide resolved

EgorBo added 2 commits March 12, 2026 05:17

feedback

87795ee

Merge branch 'inline-allocators' of github.com:EgorBo/runtime-1 into …

0f46330

…inline-allocators

EgorBot mentioned this pull request Mar 12, 2026

Benchmarks for dotnet/runtime#125475 (for @EgorBo) EgorBot/Benchmarks#32

Open

fix

8915267

Copilot AI review requested due to automatic review settings March 12, 2026 13:05

Copilot started reviewing on behalf of EgorBo March 12, 2026 13:07 View session

enable for linux

ae052b0

Copilot AI reviewed Mar 12, 2026

View reviewed changes

src/coreclr/jit/codegenxarch.cpp Show resolved Hide resolved

src/coreclr/jit/codegenxarch.cpp Show resolved Hide resolved

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs Show resolved Hide resolved

feedback

e54f5d2

EgorBot mentioned this pull request Mar 12, 2026

Benchmarks for dotnet/runtime#125475 (for @EgorBo) EgorBot/Benchmarks#33

Open

jkotas reviewed Mar 12, 2026

View reviewed changes

add array allocators

6da6690

Copilot AI review requested due to automatic review settings March 12, 2026 14:28

Copilot started reviewing on behalf of EgorBo March 12, 2026 14:30 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

EgorBo added 2 commits March 12, 2026 15:55

feedback

347f5b7

ci fix

49a621b

Copilot AI review requested due to automatic review settings March 12, 2026 16:42

Copilot started reviewing on behalf of EgorBo March 12, 2026 16:45 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

EgorBo and others added 2 commits March 12, 2026 20:15

fix arm64 ci

a1c59b9

Copilot AI review requested due to automatic review settings March 12, 2026 21:22

Copilot started reviewing on behalf of EgorBo March 12, 2026 21:26 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

build-analysis bot mentioned this pull request Mar 13, 2026

LibraryImportGenerator.Unit.Tests crashing on linux-x64 mono interpreter #100800

Open

Conversation

EgorBo commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dotnet-policy-service bot commented Mar 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

EgorBo commented Mar 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

EgorBo commented Mar 12, 2026

Uh oh!

jkotas Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

EgorBo Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

jkotas Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

jkotas commented Mar 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

EgorBo commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jkotas commented Mar 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

EgorBo commented Mar 12, 2026 •

edited

Loading

EgorBo commented Mar 12, 2026 •

edited

Loading