Skip to content

Conversation

@michaelnebel
Copy link
Contributor

@michaelnebel michaelnebel commented Oct 3, 2025

It turns out that the C# extractor does some symmetric location extraction, which leads to bloated TRAP and subsequently also impacts TRAP import performance, as the importer needs to do tuple de-duplication.

The issue can best be explained by a partial class example. That is, assume we have a partial class declaration C scattered in two files file1.cs and file2.cs.

<file1.cs>
public partial class C { }

<file2.cs>
public partial class C { }

The extractor will encounter the type C twice during extraction and extract its location(s). However, prior to this change the extractor, extracts all the locations of C on each encounter (and then uses TRAP import de-duplication to "solve" the problem). Instead we could rely on the symmetry and only extract the location from the current context (the context is given by the file currently being extracted). As a consequence we only want to extract the location in file1.cs when processing file1.cs and the location in file2.cs when extracting file2.cs.
At first glance, this might look like a corner-case - and for traced extraction this is also the case. However, for buildless (BMN) extraction we cram all code from a repository into a single compilation, which can cause significant overlaps in the locations reported by Roslyn. The larger the repository, the higher the chance for overlaps and the higher the impact we get from this change.

Furthermore, this is a pre-requisite, if we want to switch to using * IDs for location extraction.

The last DCA executions shows that
For traced extraction

  • Performance and results are stable.

For buildless extraction.

  • Analysis time and results are stable.
  • Reduction in TRAP for all projects, but for mono/mono we get a reduction of around 30%.
  • E2E time for mono/mono is reduced by 30% (and 12% for dotnet/roslyn) and stable (within the statistical uncertainty) for the remaining DCA projects.

@github-actions github-actions bot added the C# label Oct 3, 2025
@michaelnebel michaelnebel force-pushed the csharp/reducetyplocationtuples branch from 6cb2561 to 6149608 Compare October 6, 2025 14:41
@michaelnebel michaelnebel marked this pull request as ready for review October 7, 2025 12:25
@michaelnebel michaelnebel requested a review from a team as a code owner October 7, 2025 12:25
Copilot AI review requested due to automatic review settings October 7, 2025 12:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes location extraction for named types by reducing redundant location tuple extraction. Previously, when a partial class was declared across multiple files, the extractor would extract all locations for each encounter, leading to bloated TRAP data. Now, location extraction respects the current context, only extracting locations relevant to the file being processed.

  • Improved location extraction efficiency by filtering source locations to the current context
  • Added context-aware location filtering for type parameters and constructors
  • Enhanced test coverage for multi-file partial classes and location handling

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

Show a summary per file
File Description
csharp/extractor/Semmle.Extraction.CSharp/Extractor/Context.cs Added IsLocationInContext method to check if location belongs to current source tree
csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/NamedType.cs Modified GetLocations to filter source locations by context and made method non-static
csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/TypeParameter.cs Added context check before extracting type parameter locations
csharp/extractor/Semmle.Extraction.CSharp/Entities/Constructor.cs Improved implicit constructor location reporting with source location filtering
csharp/ql/lib/semmle/code/csharp/Type.qll Added location override for nested types to use unbound declaration locations
csharp/ql/test/library-tests/partial/*.cs Added test files for partial classes across multiple files
csharp/ql/test/library-tests/locations/*.cs Added test files for location handling with inheritance and partial classes
Various .expected files Updated test expectations to reflect improved location extraction

@michaelnebel michaelnebel requested a review from hvitved October 7, 2025 12:25
@michaelnebel michaelnebel changed the title C#: Further reduce location tuple extraction. C#: Reduce location tuple extraction for named types. Oct 7, 2025
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Friday Win 🎉

@michaelnebel michaelnebel merged commit ea4d475 into github:main Oct 7, 2025
25 checks passed
@michaelnebel michaelnebel deleted the csharp/reducetyplocationtuples branch October 7, 2025 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants