Skip to content

Conversation

@davidwrighton
Copy link
Member

We have logic which loads a typedef, and if the typedef is loaded then we assume the method can be loaded via lookup in the MethodDef/FieldDef token map tables. HOWEVER, what this boils down to is we do a load from the TypeDef table, and if that succeeds we will do a load from the FieldDef/MethodDef token map tables, but that second load is done without an explicit memory barrier. Unfortunately, through the wonder of memory ordering behavior, its legal (and not actually all that uncommon) for the CPU to do the load from the FieldDef/MethodDef tables BEFORE it actually loads from the TypeDef table. So what can happen is that the fielddef table read can happen, then the typedef table read can happen, then the CPU does the if check to see if its ok to do the fielddef table read, then we use the value from the fielddef table read which happened earlier.

This fix should fix that problem by inserting a memory barrier if there is ever a read of NULL from either of those tables, and retrying. Reads of NULL are significantly rarer than reads with values filled in, so this fix should not cause any meaningful performance issues.

Fixes #120754

…ef token tables

We have logic which loads a typedef, and if the typedef is loaded then we assume the method can be loaded via lookup in the MethodDef/FieldDef token map tables. HOWEVER, what this boils down to is we do a load from the TypeDef table, and if that succeeds we will do a load from the FieldDef/MethodDef token map tables, but that second load is done without an explicit memory barrier. Unfortunately, through the wonder of memory ordering behavior, its legal (and not actually all that uncommon) for the CPU to do the load from the FieldDef/MethodDef tables BEFORE it actually loads from the TypeDef table. So what can happen is that the fielddef table read can happen, then the typedef table read can happen, then the CPU does the if check to see if its ok to do the fielddef table read, then we use the value from the fielddef table read which happened earlier.

This fix should fix that problem by inserting a memory barrier if there is ever a read of NULL from either of those tables, and retrying. Reads of NULL are significantly rarer than reads with values filled in, so this fix should not cause any meaningful performance issues.

Fixes dotnet#120754
Copilot AI review requested due to automatic review settings January 23, 2026 19:18
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 23, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes a critical memory ordering issue in the MethodDef and FieldDef token lookup functions that could cause sporadic MissingMethodException errors. The root cause is that these lookup maps are populated before the TypeDef table entry is stored, and due to CPU memory reordering, a read from these maps could happen before the TypeDef check, causing a null value to be used even when the entry was actually populated.

Changes:

  • Added memory barrier protection in LookupMethodDef to prevent CPU reordering when NULL is read
  • Moved LookupFieldDef from header/cpp to inline file and added the same memory barrier protection
  • Unified DAC and non-DAC implementations of LookupFieldDef into a single implementation

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/coreclr/vm/ceeload.inl Added NULL-check with memory barrier retry logic to LookupMethodDef; moved LookupFieldDef implementation here with same memory barrier fix
src/coreclr/vm/ceeload.h Removed conditional compilation guards and inline implementation of LookupFieldDef, keeping only forward declaration
src/coreclr/vm/ceeload.cpp Removed DACCESS_COMPILE-specific LookupFieldDef implementation

@davidwrighton davidwrighton added area-VM-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 23, 2026
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

// dependency instead of an explicit barrier here. However, the access pattern between
// m_TypeDefToMethodTableMap and m_MethodDefToDescMap/m_FieldDefToDescMap is such that a
// data dependency is not sufficient to ensure that the MethodTable is visible when we
// access the MethodDesc/FieldDesc. Since those loads are independent. So we use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// access the MethodDesc/FieldDesc. Since those loads are independent. So we use
// access the MethodDesc/FieldDesc. Since those loads are independent, we use

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines +54 to +60
// LookupMap's hold pointers to data which is immutable, so normally we could use a data
// dependency instead of an explicit barrier here. However, the access pattern between
// m_TypeDefToMethodTableMap and m_MethodDefToDescMap/m_FieldDefToDescMap is such that a
// data dependency is not sufficient to ensure that the MethodTable is visible when we
// access the MethodDesc/FieldDesc. Since those loads are independent. So we use
// VolatileLoad here to ensure proper ordering.
TADDR value = VolatileLoad(pValue);
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix only applies to the LookupMap<SIZE_T> template specialization, but the maps mentioned in the comment (m_TypeDefToMethodTableMap, m_MethodDefToDescMap, and m_FieldDefToDescMap) are actually LookupMap<PTR_MethodTable>, LookupMap<PTR_MethodDesc>, and LookupMap<PTR_FieldDesc> respectively. These use the generic LookupMap template at line 16, which still uses VolatileLoadWithoutBarrier. The memory ordering fix needs to be applied to the generic template as well, otherwise the race condition described in the PR is not actually fixed.

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +59
// access the MethodDesc/FieldDesc. Since those loads are independent. So we use
// VolatileLoad here to ensure proper ordering.
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence "Since those loads are independent." is incomplete. It should be connected to the previous sentence or completed with a clause explaining the consequence. Consider: "However, the access pattern between m_TypeDefToMethodTableMap and m_MethodDefToDescMap/m_FieldDefToDescMap is such that a data dependency is not sufficient to ensure that the MethodTable is visible when we access the MethodDesc/FieldDesc, since those loads are independent."

Suggested change
// access the MethodDesc/FieldDesc. Since those loads are independent. So we use
// VolatileLoad here to ensure proper ordering.
// access the MethodDesc/FieldDesc, because those loads are independent. Therefore, we
// use VolatileLoad here to ensure proper ordering.

Copilot uses AI. Check for mistakes.
// data dependency is not sufficient to ensure that the MethodTable is visible when we
// access the MethodDesc/FieldDesc. Since those loads are independent. So we use
// VolatileLoad here to ensure proper ordering.
TADDR value = VolatileLoad(pValue);
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description states the fix will insert "a memory barrier if there is ever a read of NULL from either of those tables, and retrying", but the actual implementation unconditionally adds a memory barrier via VolatileLoad without any NULL check or retry logic. This discrepancy suggests either the PR description is inaccurate or the implementation is incomplete.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants