
Eliminate unnecessary vtable slot backpatching during non-final tier transitions #125359

Open
davidwrighton wants to merge 11 commits into dotnet:main from davidwrighton:investigate_tiering_issue

Conversation

@davidwrighton
Member

Summary

This PR eliminates unnecessary vtable slot backpatching during non-final compilation tiers (Tier0, Tier0Instrumented), significantly reducing lock contention and entry point oscillations in the tiered compilation pipeline.

Problem

PR #124664 ("Eliminate forwarder stubs by reusing method precodes for call counting indirection") was merged but subsequently reverted by #125285 after it caused a 12-13% regression on the Orchard benchmark. Investigation with CLR logging revealed the root cause:

  • With forwarder stubs eliminated, vtable slots point directly to precodes
  • During non-final tier transitions, SetCodeEntryPoint eagerly backpatched vtable slots via DoBackpatch()
  • Each DoBackpatch() call acquires MethodDescBackpatchInfoTracker lock (RtlEnterCriticalSection)
  • This caused 335 TryBackpatchEntryPointSlots calls and 91 slot writes during test execution
  • Additionally, entry point oscillations occurred (281 backpatch attempts) as methods bounced between code versions

Solution

The fix keeps vtable slots pointing to the precode (temporary entry point) during non-final tiers, deferring backpatching until the final tier:

Non-final tiers (Tier0, Tier0Instrumented):

  • Redirect precode target to compiled code via SetTargetInterlocked() (atomic, no lock needed)
  • Do NOT modify GetMethodEntryPoint() (vtable slot stays at temporary entry point)
  • DoBackpatch() short-circuits when GetMethodEntryPoint() == GetTemporaryEntryPoint(), preventing slot recording

Final tier (Tier1):

  • Set vtable slot to final code via SetMethodEntryPoint()
  • Reset precode to prestub via ResetTargetInterlocked()
  • Subsequent calls through unpatched vtable slots flow through prestub → DoBackpatch(), which lazily discovers, records, and patches each slot

Results

  Metric                        Before (forwarder stubs)   After optimization
  TryBackpatchEntryPointSlots   335                        0
  BackpatchSlots                335                        0
  Slot writes                   91                         0
  Entry point oscillations      281                        0

All vtable slot patching now happens lazily at final tier via DoBackpatch().

Changes

Core optimization (commits 3-4)

  • callcounting.cpp: 8 paths modified to use SetBackpatchableEntryPoint helper
  • codeversion.cpp: Initial publish path modified for backpatchable methods
  • tieredcompilation.cpp: Tiering delay activation path modified
  • prestub.cpp: Updated comments describing the new design invariant

Logging (commits 1, 5)

  • Added LOG calls in SetCodeEntryPoint, TryBackpatchEntryPointSlots, Backpatch_Locked (method.cpp, methoddescbackpatchinfo.cpp)
  • These use LF_TIEREDCOMPILATION at LL_INFO10000

Refactoring (commit 6)

  • Extracted MethodDesc::SetBackpatchableEntryPoint() helper consolidating 10 duplicated code paths into one method
  • Net reduction of ~82 lines

Thread Safety

  • Non-final tier paths: Use SetTargetInterlocked() which is atomic — no lock required
  • Final tier paths: Called under MethodDescBackpatchInfoTracker lock (asserted by SetMethodEntryPoint)
  • HasNativeCode(): Independent of GetMethodEntryPoint() — checks NativeCodeVersion, unaffected by keeping vtable slot at temporary entry point

Testing

  • Custom vtable backpatch test with abstract base class, derived types, factory method, and hot loops exercising the full TieredPGO pipeline
  • Verified with CLR logging that the full lifecycle is visible (non-final redirects + final tier patching)

davidwrighton and others added 8 commits March 9, 2026 14:22
Add a new Copilot skill that guides running any coreclr or libraries
test with the CoreCLR diagnostic logging subsystem enabled. The skill
documents all available log facilities (LF_*), extended facilities
(LF2_*), log levels, and output control environment variables, and
provides step-by-step instructions for capturing log files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add logging to the vtable slot backpatching code path so that each
entry point update is visible in CLR logs when LF_TIEREDCOMPILATION
is enabled. The new LOG calls cover:

- MethodDesc::SetCodeEntryPoint: logs which path is taken
  (BackpatchSlots, Precode, StableEntryPoint)
- MethodDesc::TryBackpatchEntryPointSlots: logs entry point
  transitions and early-exit reasons
- MethodDescBackpatchInfoTracker::Backpatch_Locked: logs the method
  and count of slots backpatched
- EntryPointSlots::Backpatch_Locked: logs each individual slot write
  with slot type (Normal, Vtable, Executable, ExecutableRel32)

All logging uses LF_TIEREDCOMPILATION at LL_INFO10000, matching
existing tiered compilation log patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
For backpatchable methods, avoid calling SetCodeEntryPoint (which
eagerly backpatches all recorded vtable slots) during non-final tier
transitions. Instead, use SetMethodEntryPoint to update the method
entry point and SetTargetInterlocked to redirect the precode to the
new code, keeping vtable slots pointing to the precode.

For final tier activation, reset the precode to prestub so that calls
through vtable slots trigger DoBackpatch() for lazy slot discovery,
recording, and patching to the final tier code.

This eliminates the PreStub lock contention that caused a 12-13%
regression in PR dotnet#124664, while preserving the forwarder stub
elimination benefits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
During non-final tiers, keep GetMethodEntryPoint() at the temporary
entry point (precode) instead of setting it to native code. This causes
DoBackpatch() to short-circuit early, preventing vtable slot recording
and patching during non-final tier transitions. The precode target is
still redirected to native code or call counting stubs via
SetTargetInterlocked, so calls through the precode work correctly.

When the final tier is activated, SetMethodEntryPoint() is set to the
final code and the precode is reset to prestub. The next call through
a vtable slot triggers DoBackpatch(), which records and patches the slot
to point directly to the final tier code. Subsequent calls bypass the
precode entirely.

Key changes:
- PublishVersionedMethodCode: skip TrySetInitialCodeEntryPointForVersionableMethod
  for non-final tiers with call counting; just redirect the precode target
- CallCountingManager::SetCodeEntryPoint: remove SetMethodEntryPoint and
  BackpatchToResetEntryPointSlots calls for non-final tier backpatchable methods
- TrySetCodeEntryPointAndRecordMethodForCallCounting: remove SetMethodEntryPoint
- CompleteCallCounting: remove SetMethodEntryPoint for non-final tiers
- DoBackpatch comments: updated to describe the new invariant

Results (TryBackpatchEntryPointSlots / BackpatchSlots / slot writes):
- Baseline (forwarder stubs):    335 / 91 / -
- Previous optimization:         281 / 0  / 60
- After this change:             0   / 0  / 0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The existing LOG calls in MethodDesc::SetCodeEntryPoint only cover
non-backpatchable methods. Backpatchable methods bypass SetCodeEntryPoint
and go through dedicated paths in CallCountingManager::SetCodeEntryPoint,
CompleteCallCounting, PublishVersionableCodeIfNecessary, and
TrySetCodeEntryPointAndRecordMethodForCallCounting. Without logging in
these paths, the vtable slot lifecycle for backpatchable methods is
invisible in checked build logs.

Add LOG calls at all 10 vtable slot backpatch paths with descriptive
labels: InitialPublish-NonFinal, CallCountingStub, ThresholdReached,
NonFinal, FinalTier, FinalTier-NonDefault, SameVersion-NonFinal,
DiffVersion-FinalTier, DiffVersion-NonFinal, and DelayActivation-NonFinal.

All use LF_TIEREDCOMPILATION at LL_INFO10000, consistent with the
existing LOG calls in method.cpp.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aths

Consolidate 10 duplicated vtable slot backpatch code patterns across
callcounting.cpp, codeversion.cpp, and tieredcompilation.cpp into a
single MethodDesc::SetBackpatchableEntryPoint helper method.

The helper encapsulates the two core patterns:
- Final tier: SetMethodEntryPoint(code) + ResetTargetInterlocked()
- Non-final tier: SetTargetInterlocked(target, fOnlyRedirectFromPrestub)

Includes centralized LOG calls and an assert to prevent invalid
parameter combinations (isFinalTier + fOnlyRedirectFromPrestub).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…_issue

# Conflicts:
#	src/coreclr/vm/callcounting.cpp
Copilot AI review requested due to automatic review settings March 10, 2026 03:11
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR changes tiered compilation and call counting for backpatchable (vtable-slot) methods to avoid eagerly backpatching vtable slots during non-final tiers, deferring slot patching to the final tier to reduce lock contention and entry point oscillations.

Changes:

  • Introduces MethodDesc::SetBackpatchableEntryPoint() and updates tiering/call-counting/version-publish paths to redirect the temporary-entrypoint precode during non-final tiers instead of backpatching vtable slots.
  • Updates the final-tier activation path for backpatchable methods to set the owning vtable slot to final code and reset the precode to prestub to enable lazy DoBackpatch() patching.
  • Adds additional tiered-compilation logging and a new “coreclr-logging-test” skill doc for capturing CoreCLR logs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/coreclr/vm/tieredcompilation.cpp Uses SetBackpatchableEntryPoint for backpatchable methods when recording for call counting during tiering delay.
src/coreclr/vm/prestub.cpp Updates DoBackpatch() commentary to describe the new non-final vs final tier behavior/invariant.
src/coreclr/vm/methoddescbackpatchinfo.cpp Adds high-verbosity logging for slot backpatch operations and counts patched slots.
src/coreclr/vm/method.hpp Declares MethodDesc::SetBackpatchableEntryPoint().
src/coreclr/vm/method.cpp Implements SetBackpatchableEntryPoint() and adds logging in entrypoint/backpatch code paths.
src/coreclr/vm/codeversion.cpp Updates initial publish path to redirect precode (no slot backpatch) for call-counting-first-call + backpatchable methods.
src/coreclr/vm/callcounting.h Updates call counting design documentation to reflect precode-based indirection for backpatchable methods.
src/coreclr/vm/callcounting.cpp Removes forwarder-stub tracking and switches multiple paths to use SetBackpatchableEntryPoint().
.github/skills/coreclr-logging-test/SKILL.md Adds documentation for running tests with CLR logging enabled and collecting logs.

Comment on lines +22 to +31
- For tiered methods that do not normally use a MethodDesc precode as the stable entry point (virtual and interface methods
when slot backpatching is enabled), the method's own temporary-entrypoint precode is redirected to the call counting stub,
and vtable slots are reset to point to the temporary entry point. This ensures calls flow through temporary-entrypoint
precode -> call counting stub -> native code, and the call counting stub can be safely deleted since vtable slots don't
point to it directly. GetMethodEntryPoint() is kept at the native code entry point (not the temporary entry point) so that
DoBackpatch() can still record new vtable slots after the precode reverts to prestub. During call counting, there is a
bounded window where new vtable slots may not be recorded because the precode target is the call counting stub rather than
the prestub. These slots are corrected once call counting stubs are deleted and the precode reverts to prestub.
When call counting ends, the precode is always reset to prestub (never to native code), preserving the invariant
that new vtable slots can be discovered and recorded by DoBackpatch().
Copilot AI Mar 10, 2026

The updated call counting overview still states that for backpatchable methods GetMethodEntryPoint() is kept at the native-code entry point and that vtable slots are reset when call counting starts. With the new non-final-tier design (vtable slots kept at the temporary entry point, and slot recording/backpatching deferred until final tier), this description appears inconsistent with the current behavior and the updated invariants described in prestub.cpp. Please update this comment block to reflect the new scheme (temporary entry point stays as the method entry point during non-final tiers; precode target is redirected; backpatching/slot discovery happens lazily at final tier).

…pilation lifecycle

Funcptr precodes for backpatchable methods are now registered in the entry point
slot backpatching table at creation time, replacing the ad-hoc funcptr precode
lookup and patching that was previously done in TryBackpatchEntryPointSlots.

This ensures:
- During non-final tiers, funcptr precode targets point to the method's precode
  (temporary entry point), so calls flow through the same path as vtable calls
- At final tier, SetBackpatchableEntryPoint calls Backpatch_Locked to update all
  registered slots (including funcptr precodes) to the final code
- During rejit, BackpatchToResetEntryPointSlots resets funcptr targets via the
  backpatching table, ensuring proper re-discovery through the prestub

The change adds Precode::GetTargetSlot() and StubPrecode::GetTargetSlot() to
expose the writable target field address for registration as a SlotType_Normal
entry point slot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 00:27
@davidwrighton davidwrighton force-pushed the investigate_tiering_issue branch from 284fd87 to 0c178ff Compare March 11, 2026 00:27
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/coreclr/vm/method.cpp:3195

  • TryBackpatchEntryPointSlots was changed from FORCEINLINE to out-of-line. This method is on the tiered compilation/backpatching path and may be performance-sensitive; consider keeping it FORCEINLINE in non-logging builds (e.g., guard logging with #ifdef LOGGING) to avoid an unintended regression.
bool MethodDesc::TryBackpatchEntryPointSlots(
    PCODE entryPoint,
    bool isPrestubEntryPoint,
    bool onlyFromPrestubEntryPoint)
{

The overview comment in callcounting.h described the old behavior where
GetMethodEntryPoint() was kept at native code and vtable slots were
actively reset during call counting. Updated to match the current design
where GetMethodEntryPoint() stays at the temporary entry point during
non-final tiers (causing DoBackpatch to short-circuit), and lazy slot
discovery/recording happens only at final tier.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 20, 2026 21:25
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Comment on lines 1853 to +1863
bool mayHaveEntryPointSlotsToBackpatch2 = pMethodDesc->MayHaveEntryPointSlotsToBackpatch();
if (handleCallCountingForFirstCall && mayHaveEntryPointSlotsToBackpatch2)
{
    pMethodDesc->SetBackpatchableEntryPoint(pCode, false /* isFinalTier */, TRUE /* fOnlyRedirectFromPrestub */);
}
else
{
    MethodDescBackpatchInfoTracker::ConditionalLockHolder slotBackpatchLockHolder2(
        mayHaveEntryPointSlotsToBackpatch2);
    pMethodDesc->TrySetInitialCodeEntryPointForVersionableMethod(pCode, mayHaveEntryPointSlotsToBackpatch2);
}
Copilot AI Mar 20, 2026

CodeVersionManager::PublishVersionableCodeIfNecessary now calls MethodDesc::SetBackpatchableEntryPoint(...), but that method is only declared under #ifdef FEATURE_TIERED_COMPILATION (see method.hpp). In builds where FEATURE_TIERED_COMPILATION is disabled, this call will not compile even though the runtime logic keeps handleCallCountingForFirstCall false. Consider wrapping this call in #ifdef FEATURE_TIERED_COMPILATION (or using the existing TrySetInitialCodeEntryPointForVersionableMethod path when tiering is unavailable).

Comment on lines +141 to +152
// Promote methods to Tier1 via hot loop, then wait for background compilation
Console.WriteLine("=== Promoting methods to Tier1 via hot loop ===");
SuppressOutput = true;
for (int i = 0; i < 200; i++)
{
    TriggerDirectInlining();
    CallMethodWithoutInlining();
    TriggerInliningChain();
}
SuppressOutput = false;
Thread.Sleep(500);

Copilot AI Mar 20, 2026

The tier-promotion wait uses a fixed Thread.Sleep(500). On slower/loaded CI machines Tier1 background compilation may not complete within 500ms, making this test prone to flakiness. Consider polling with a longer overall timeout (sleeping in small increments and proceeding once Tier1 is observed / the expected behavior occurs), or increasing the wait time and making it configurable via an env var.

Comment on lines +260 to +276
// Run the tiered rejit test (TC enabled, with CLR logging)
string logFile = Path.Combine(Path.GetTempPath(), "rejit_tiering.log");
Console.WriteLine($"CLR tiering log file: {logFile}");
result = ProfilerTestRunner.Run(profileePath: System.Reflection.Assembly.GetExecutingAssembly().Location,
                                testName: "ReJITWithTiering",
                                profilerClsid: ReJitProfilerGuid,
                                profileeArguments: "Tiered",
                                envVars: new Dictionary<string, string>
                                {
                                    { "DOTNET_TieredCompilation", "1" },
                                    { "DOTNET_REJIT_TIERED_MODE", "1" },
                                    { "DOTNET_LogEnable", "1" },
                                    { "DOTNET_LogFacility", "0x00400000" },
                                    { "DOTNET_LogLevel", "7" },
                                    { "DOTNET_LogToFile", "1" },
                                    { "DOTNET_LogFile", logFile }
                                });
Copilot AI Mar 20, 2026

The CLR log file path is hard-coded to a fixed temp filename (rejit_tiering.log). When tests run concurrently on the same machine, multiple runs can race to overwrite the same file (or fail to open it, depending on sharing mode), and it also leaves behind a predictable temp artifact. Consider generating a unique filename (e.g., include PID/GUID) and/or setting DOTNET_LogWithPid=1 (and optionally deleting the file at the end of the run if it’s only for debugging).

Extend the existing profiler rejit test to also exercise ReJIT with
tiered compilation enabled. The new test runs two phases:

Phase 1 (Non-final tier): Methods are at Tier0 when rejit fires.
Verifies that the profiler can replace the function body and that
revert restores the original code.

Phase 2 (Final tier): Methods are promoted to Tier1 via a hot loop,
then rejit fires again. Verifies function body replacement works
correctly when the target method has already been optimized.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@davidwrighton davidwrighton force-pushed the investigate_tiering_issue branch from 8301603 to 357b6c3 Compare March 20, 2026 21:34
@davidwrighton davidwrighton marked this pull request as ready for review March 20, 2026 21:36
Copilot AI review requested due to automatic review settings March 20, 2026 21:36
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comment on lines 3339 to 3341
GetPrecode()->ResetTargetInterlocked();
}
}
Copilot AI Mar 20, 2026

ResetCodeEntryPoint() resets the precode target for versionable-with-precode methods (see GetPrecode()->ResetTargetInterlocked() here), but for backpatchable methods it returns immediately after BackpatchToResetEntryPointSlots() without resetting the temporary-entrypoint precode’s target back to the prestub. With this PR, non-final tiers can redirect that precode to native code / call-counting stubs, so a later reset (e.g., ReJIT activating an uncompiled version) could leave the precode pointing at stale code and prevent calls from flowing through the prestub for compilation/backpatch discovery. Consider also resetting the temporary-entrypoint precode target to prestub in the backpatchable branch.

envVars: new Dictionary<string, string>
{
    { "DOTNET_TieredCompilation", "1" },
    { "DOTNET_REJIT_TIERED_MODE", "1" }
Copilot AI Mar 20, 2026
The reason will be displayed to describe this comment to others. Learn more.

The tiered ReJIT run inherits the current process environment (ProfilerTestRunner copies all env vars), but this run only forces DOTNET_TieredCompilation=1. If the test runs under configurations that set DOTNET_JitStress and/or DOTNET_JITMinOpts, tiering/promotion behavior can become nondeterministic and the Tier1 phase may not behave as expected. Consider explicitly setting DOTNET_JitStress=0 and DOTNET_JITMinOpts=0 (and any other relevant jitstress knobs) in this envVars dictionary to stabilize the tiered test while still leaving tiered compilation enabled.

Suggested change
- { "DOTNET_REJIT_TIERED_MODE", "1" }
+ { "DOTNET_REJIT_TIERED_MODE", "1" },
+ { "DOTNET_JitStress", "0" },
+ { "DOTNET_JITMinOpts", "0" }

2 participants