Inline object allocators in RyuJIT #125475

Draft
EgorBo wants to merge 12 commits into dotnet:main from EgorBo:inline-allocators

Conversation

Member

EgorBo commented Mar 12, 2026

Just an AI-driven experiment to inline allocators in codegen.
My first attempt was to move them to a managed helper, but that was not inlineable (due to the no-GC requirement), so we couldn't benefit from a constant-folded obj->BaseSize or from constant-folded length checks for array allocators. Its GetAllocContext TLS access was also inefficient, and we shouldn't expand allocators early anyway.

static object Foo()
{
    return new Program();
}
; Foo
       push     rdi
       push     rsi
       push     rbx
       mov      rbx, 0x7FFC642063E8      ; Program
       mov      rsi, qword ptr GS:[0x0058]
       mov      rdi, 0x7FFCC34D7818
       mov      edi, dword ptr [rdi]
       mov      rsi, qword ptr [rsi+8*rdi]
       lea      rsi, [rsi+0x30]
       mov      edi, dword ptr [rbx+0x04]
       mov      rax, qword ptr [rsi+0x08]
       lea      rdi, [rax+rdi]
       cmp      rdi, qword ptr [rsi]
       ja       SHORT G_M29399_IG04
       mov      qword ptr [rax], rbx
       mov      qword ptr [rsi+0x08], rdi
       jmp      SHORT G_M29399_IG05
G_M29399_IG04:
       mov      rcx, rbx
       call     CORINFO_HELP_NEWSFAST
G_M29399_IG05:
       nop      
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
; Total bytes of code 80
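
For readers less used to reading JIT disassembly, the fast path in the listing above can be modeled roughly as follows in C++. This is only a sketch: `AllocContext`, `MethodTableStub`, `SlowPathAlloc` and `AllocObject` are hypothetical names, the field offsets are inferred from the listing rather than taken from the runtime headers, and the real slow path is the `CORINFO_HELP_NEWSFAST` call.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the per-thread allocation context. The field
// order follows the listing ([rsi] = limit, [rsi+0x08] = pointer); it is an
// assumption, not the runtime's actual definition.
struct AllocContext
{
    std::uintptr_t alloc_limit; // end of the thread-local allocation window
    std::uintptr_t alloc_ptr;   // address of the next free byte
};

// Hypothetical stand-in for a MethodTable; only the base size matters here.
struct MethodTableStub
{
    std::uint32_t flags;
    std::uint32_t base_size; // the listing reads this from [rbx+0x04]
};

// Placeholder for the helper call; this sketch simply reports failure.
inline void* SlowPathAlloc(const MethodTableStub*)
{
    return nullptr;
}

inline void* AllocObject(AllocContext& ctx, const MethodTableStub* mt)
{
    const std::uintptr_t obj = ctx.alloc_ptr;
    // Same unchecked addition as the 'lea rdi, [rax+rdi]' in the listing.
    const std::uintptr_t end = obj + mt->base_size;
    if (end > ctx.alloc_limit)
    {
        return SlowPathAlloc(mt); // not enough room: fall back to the helper
    }
    void* result = reinterpret_cast<void*>(obj);
    std::memcpy(result, &mt, sizeof(mt)); // store the MethodTable pointer at offset 0
    ctx.alloc_ptr = end;                  // bump the allocation pointer
    return result;
}
```

The model leaves out the TLS lookup that produces the context address and the non-GC-interruptible region that the generated code needs around the bump.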

Copilot AI review requested due to automatic review settings March 12, 2026 04:01
github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 12, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new JIT/EE interface to expose platform-specific TLS access details for the thread-local allocation context and uses that information to keep GT_ALLOCOBJ through morphing so xarch codegen can emit an inline bump-pointer allocation fast path (with a helper-call slow path).

Changes:

  • Introduces getObjectAllocContextInfo / CORINFO_OBJECT_ALLOC_CONTEXT_INFO across the JIT/EE boundary (plus SuperPMI + NativeAOT plumbing).
  • Adds EE support to describe how to locate t_runtime_thread_locals (allocation context) via TLS on supported platforms.
  • Implements xarch codegen/LSRA handling to generate inline allocation for GT_ALLOCOBJ (currently Windows-focused, guarded by config/opts).

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/coreclr/vm/threadstatics.h | Declares a new EE helper to provide TLS access metadata for the alloc context. |
| src/coreclr/vm/threadstatics.cpp | Implements TLS metadata discovery for t_runtime_thread_locals on multiple targets. |
| src/coreclr/vm/jitinterface.cpp | Exposes CEEInfo::getObjectAllocContextInfo and gates support based on runtime configuration. |
| src/coreclr/vm/amd64/asmhelpers.S | Adds descriptor helpers for TLS access to t_runtime_thread_locals (Apple TLVP + ELF TLSGD). |
| src/coreclr/tools/superpmi/superpmi/icorjitinfo.cpp | Records/replays the new JIT/EE query in SuperPMI. |
| src/coreclr/tools/superpmi/superpmi-shim-simple/icorjitinfo_generated.cpp | Forwards the new API in the simple shim. |
| src/coreclr/tools/superpmi/superpmi-shim-counter/icorjitinfo_generated.cpp | Counts/intercepts the new API in the counter shim. |
| src/coreclr/tools/superpmi/superpmi-shim-collector/icorjitinfo.cpp | Records/replays the new API in the collector shim. |
| src/coreclr/tools/superpmi/superpmi-shared/methodcontext.h | Adds packets + record/replay declarations for alloc-context info. |
| src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp | Implements record/dump/replay for alloc-context info. |
| src/coreclr/tools/superpmi/superpmi-shared/lwmlist.h | Adds LWM entry for alloc-context info. |
| src/coreclr/tools/superpmi/superpmi-shared/agnostic.h | Defines the agnostic record type for alloc-context info. |
| src/coreclr/tools/aot/jitinterface/jitinterface_generated.h | Adds callback + wrapper method for getObjectAllocContextInfo. |
| src/coreclr/tools/Common/JitInterface/ThunkGenerator/ThunkInput.txt | Extends thunk generation inputs with the new API + struct. |
| src/coreclr/tools/Common/JitInterface/CorInfoTypes.cs | Adds managed definition of CORINFO_OBJECT_ALLOC_CONTEXT_INFO. |
| src/coreclr/tools/Common/JitInterface/CorInfoImpl_generated.cs | Adds unmanaged callback plumbing for getObjectAllocContextInfo. |
| src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs | Provides a stub implementation for NativeAOT/crossgen2. |
| src/coreclr/jit/objectalloc.cpp | Keeps GT_ALLOCOBJ for codegen-side expansion under certain conditions. |
| src/coreclr/jit/lsraxarch.cpp | Adds LSRA build logic for GT_ALLOCOBJ with internal temps + call kill/defs. |
| src/coreclr/jit/lsrabuild.cpp | Adds kill-set modeling for GT_ALLOCOBJ as NEWSFAST. |
| src/coreclr/jit/lower.cpp | Adds a lowering case placeholder for GT_ALLOCOBJ on xarch. |
| src/coreclr/jit/jitconfigvalues.h | Introduces JitInlineAllocFast config knob. |
| src/coreclr/jit/gtlist.h | Allows GT_ALLOCOBJ in LIR for codegen-side handling. |
| src/coreclr/jit/compiler.h | Caches CORINFO_OBJECT_ALLOC_CONTEXT_INFO in the compiler instance. |
| src/coreclr/jit/codegenxarch.cpp | Implements inline fast-path allocation codegen for GT_ALLOCOBJ (+ slow-path helper call). |
| src/coreclr/jit/codegen.h | Declares genCodeForAllocObj. |
| src/coreclr/jit/ICorJitInfo_wrapper_generated.hpp | Adds wrapper forwarding for the new API. |
| src/coreclr/jit/ICorJitInfo_names_generated.h | Adds the API name for logging/profiling hooks. |
| src/coreclr/inc/jiteeversionguid.h | Updates JIT/EE version GUID due to interface shape change. |
| src/coreclr/inc/icorjitinfoimpl_generated.h | Adds the new virtual method to the generated EE impl interface. |
| src/coreclr/inc/corinfo.h | Defines CORINFO_OBJECT_ALLOC_CONTEXT_INFO + adds new ICorStaticInfo method. |

Member Author

EgorBo commented Mar 12, 2026

@EgorBot -windows_x64 -windows_arm64

using BenchmarkDotNet.Attributes;

public class Bench
{
   [Benchmark]
   public Bench CreateInstance() => new Bench();
}

Copilot AI review requested due to automatic review settings March 12, 2026 13:05
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

Member Author

EgorBo commented Mar 12, 2026

@EgorBot -linux_x64

using BenchmarkDotNet.Attributes;

public class Bench
{
   [Benchmark]
   public Bench CreateInstance() => new Bench();
}

// ---- Bump allocation (non-GC-interruptible) ----
GetEmitter()->emitDisableGC();

// Size not known at JIT time — read from MethodTable.m_BaseSize at runtime.
Member

Why is that? If you are trying to boil the ocean to save a nanosecond per allocation, you can also embed the size into the code as a constant.

Member Author

> Why is that? If you are trying to boil the ocean to save a nanosecond per allocation, you can also embed the size into the code as a constant.

Well, that was one of the goals for this change (it's even in the description), but initially I just wanted to see if it works as is.

// Size not known at JIT time — read from MethodTable.m_BaseSize at runtime.
emit->emitIns_R_AR(INS_mov, EA_4BYTE, tmpReg, mtReg, (int)allocInfo->methodTableBaseSizeOffset);
emit->emitIns_R_AR(INS_mov, EA_PTRSIZE, dstReg, allocCtxReg, (int)allocInfo->allocPtrFieldOffset);
emit->emitIns_R_ARX(INS_lea, EA_PTRSIZE, tmpReg, dstReg, tmpReg, 1, 0);
Member

This is different from what the allocation helpers do, and it has a potential integer-overflow security bug: you are assuming that allocPtr + size won't overflow.
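
The overflow concern can be illustrated with a small C++ sketch. The function names are hypothetical, and the thread does not say that the existing helpers use exactly this form; it is just one common way to write a bounds check that cannot wrap around.

```cpp
#include <cstdint>

// Naive check: for a very large size, alloc_ptr + size wraps around and the
// sum can end up below the limit, so the check passes when it should not.
inline bool FitsNaive(std::uintptr_t alloc_ptr, std::uintptr_t alloc_limit, std::uintptr_t size)
{
    return alloc_ptr + size <= alloc_limit;
}

// Overflow-safe variant: compare the size against the remaining space.
// Assumes alloc_ptr <= alloc_limit, so the subtraction cannot wrap.
inline bool FitsChecked(std::uintptr_t alloc_ptr, std::uintptr_t alloc_limit, std::uintptr_t size)
{
    return size <= alloc_limit - alloc_ptr;
}
```

For ordinary sizes both functions agree; they differ only when the addition wraps, which is exactly the case the review comment is about.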

Member

jkotas commented Mar 12, 2026

I would be curious whether this improves (and does not regress) anything real. The code bloat is not free and it won't show up in microbenchmarks.

Copilot AI review requested due to automatic review settings March 12, 2026 14:28
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 9 comments.

Member Author

EgorBo commented Mar 12, 2026

> I would be curious whether this improves (and does not regress) anything real. The code bloat is not free and it won't show up in microbenchmarks.

I'd expect it to do so. E.g. OrchardCMS (the most real-world benchmark that we currently have in OSS) reports the obj->BaseSize access as the hottest part of the app (exclusive time), presumably due to contention.

[two images]

(it completely disappeared when I added a new version of the allocator that accepted the precomputed size)

Other expected wins: constant-folded size checks for array allocations, and no call overhead.

Member

jkotas commented Mar 12, 2026

> reports obj->BaseSize access as the hottest part of the app (exclusive time), presumably, due to contention.

This assumes that the MethodTable won't be accessed by anything else...

Have we studied this pattern in detail? When we have seen expensive memory accesses like this, it was often caused by frequently written data sitting next to the read-only data (MethodTable in this case).
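
The pattern described here is commonly called false sharing. A minimal C++ sketch of the layout problem follows; the struct names, the field names and the 64-byte cache-line size are all assumptions made for illustration and do not describe the actual MethodTable layout.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64; // assumed cache-line size

// Read-mostly field placed right next to a frequently written one: every
// write to hot_counter invalidates the line that also holds base_size.
struct Packed
{
    std::uint32_t base_size;   // read-only after initialization
    std::uint64_t hot_counter; // written frequently by other threads
};

// Same fields, but the written field is pushed onto its own cache line.
struct Separated
{
    std::uint32_t base_size;
    alignas(kCacheLine) std::uint64_t hot_counter;
};

// True when two byte offsets fall into the same cache line, assuming the
// enclosing object itself starts on a cache-line boundary.
constexpr bool SameCacheLine(std::size_t a, std::size_t b)
{
    return a / kCacheLine == b / kCacheLine;
}
```

If something like this were the cause of the slow obj->BaseSize read, embedding the size as a constant would hide the symptom for allocations while other readers of the same line would still pay for it.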

Copilot AI review requested due to automatic review settings March 12, 2026 16:42
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

Comment on lines +1841 to +1851
// Linux x64: call __tls_get_addr. Save arg registers on the stack.
// Always push an even number of 8-byte values for 16-byte stack alignment.
emit->emitIns_R(INS_push, EA_PTRSIZE, mtReg);
if (isArray)
{
    emit->emitIns_R(INS_push, EA_PTRSIZE, lenReg);
}
else
{
    emit->emitIns_R_I(INS_sub, EA_PTRSIZE, REG_SPBASE, 8);
}
Copilot AI Mar 12, 2026

On the non-Windows AMD64 path this saves argument registers using push/pop (and sub/add rsp, 8) around the __tls_get_addr call. On x64 the JIT generally assumes RSP is stable after the prolog for unwind/GC stack walking; these dynamic SP adjustments are not reflected in unwind data or CodeGen stack tracking and can break EH/stack walking. Prefer saving/restoring the args in volatile registers (e.g., move MT/len into scratch regs before clobbering REG_ARG_0) or into fixed spill slots in the existing frame instead of adjusting RSP.

Comment on lines +2849 to +2855
case GT_ALLOCOBJ:
    if (op1->AsAllocObj()->gtNewHelper != op2->AsAllocObj()->gtNewHelper ||
        op1->AsAllocObj()->gtAllocObjClsHnd != op2->AsAllocObj()->gtAllocObjClsHnd)
    {
        return false;
    }
    break;
Copilot AI Mar 12, 2026

GenTree::Compare for GT_ALLOCOBJ only compares gtNewHelper and gtAllocObjClsHnd, but GenTreeAllocObj also carries gtHelperHasSideEffects and (under FEATURE_READYTORUN) gtEntryPoint. If any of these can differ, treating nodes as equal can lead to incorrect structural comparisons (e.g., in CSE/value numbering or debug asserts). Consider including gtHelperHasSideEffects and the ReadyToRun entrypoint lookup in the comparison (or document why they are irrelevant/invariant).
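
The suggestion amounts to comparing every field that can distinguish two nodes. A minimal C++ sketch of that idea is shown below; `AllocObjNode` is a hypothetical, simplified stand-in and not the real `GenTreeAllocObj`, and whether the extra fields can actually differ in practice is the open question raised in the comment.

```cpp
#include <tuple>

// Hypothetical, simplified stand-in for an allocation node. The real
// GenTreeAllocObj has more state and different field types.
struct AllocObjNode
{
    unsigned    new_helper;              // which allocation helper is used
    const void* class_handle;            // class being allocated
    bool        helper_has_side_effects; // whether the helper may have side effects
    const void* entry_point;             // ReadyToRun entry point, if any
};

// Structural equality over every field, so that a later-added field is
// harder to forget than in a hand-written chain of comparisons.
inline bool SameAllocObj(const AllocObjNode& a, const AllocObjNode& b)
{
    return std::tie(a.new_helper, a.class_handle, a.helper_has_side_effects, a.entry_point) ==
           std::tie(b.new_helper, b.class_handle, b.helper_has_side_effects, b.entry_point);
}
```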

EgorBo and others added 2 commits March 12, 2026 20:15
- Remove NEWARR_1_VC/PTR inline expansion from objectalloc.cpp and codegen
- Remove constant-size folding at codegen time (no more JIT-time MT reads)
- Remove objectMethodTableOffset field (MT pointer always at offset 0)
- Remove arrayLengthOffset, arrayBaseSize, methodTableComponentSizeOffset
- Remove threadVarsSection (macOS not supported for inline alloc)
- Add tlsRootOffset field for Linux ARM64 (pre-computed tpidr_el0 offset)
- Add genInlineAllocCall for ARM64 in codegenarmarch.cpp
  - Windows: x18 (TEB) + TLS array + index + offset pattern
  - Linux: mrs tpidr_el0 + pre-computed offset (no function call needed)
  - macOS: not supported (set supported=false)
- Add GetRuntimeThreadLocalsVariableOffset assembly stub for ARM64 Linux
- Update CORINFO_OBJECT_ALLOC_CONTEXT_INFO in corinfo.h, CorInfoTypes.cs
- Update SuperPMI agnostic struct and methodcontext rec/dmp/rep
- Update jitinterface.cpp and threadstatics.cpp for new struct layout
- Change TARGET_AMD64 guards to TARGET_AMD64 || TARGET_ARM64

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
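
The "TLS array + index + offset" pattern named in the commit message for Windows can be modeled with plain arrays. This C++ sketch is an illustration only: `ResolveAllocContext` and its parameters are hypothetical, and on real hardware the array pointer comes from the thread's TEB and the index from the module's TLS index rather than from function arguments.

```cpp
#include <cstdint>

// Models the three steps of the pattern:
//   1. tls_array  : per-thread array of TLS block addresses
//   2. tls_index  : slot in that array belonging to the runtime module
//   3. var_offset : offset of the thread-local variable inside that block
inline std::uintptr_t ResolveAllocContext(const std::uintptr_t* tls_array,
                                          std::uint32_t         tls_index,
                                          std::uintptr_t        var_offset)
{
    const std::uintptr_t module_tls_block = tls_array[tls_index];
    return module_tls_block + var_offset;
}
```

The Linux ARM64 variant described in the same commit message skips the array lookup and adds a pre-computed offset directly to the thread pointer.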
Copilot AI review requested due to automatic review settings March 12, 2026 21:22
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated no new comments.

3 participants