-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C#: Reduce location tuple extraction for named types. #20581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C#: Reduce location tuple extraction for named types. #20581
Conversation
…ting location (if any).
…plicit constructor.
6cb2561 to
6149608
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes location extraction for named types by reducing redundant location tuple extraction. Previously, when a partial class was declared across multiple files, the extractor would extract all locations for each encounter, leading to bloated TRAP data. Now, location extraction respects the current context, only extracting locations relevant to the file being processed.
- Improved location extraction efficiency by filtering source locations to the current context
- Added context-aware location filtering for type parameters and constructors
- Enhanced test coverage for multi-file partial classes and location handling
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| csharp/extractor/Semmle.Extraction.CSharp/Extractor/Context.cs | Added IsLocationInContext method to check if location belongs to current source tree |
| csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/NamedType.cs | Modified GetLocations to filter source locations by context and made method non-static |
| csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/TypeParameter.cs | Added context check before extracting type parameter locations |
| csharp/extractor/Semmle.Extraction.CSharp/Entities/Constructor.cs | Improved implicit constructor location reporting with source location filtering |
| csharp/ql/lib/semmle/code/csharp/Type.qll | Added location override for nested types to use unbound declaration locations |
| csharp/ql/test/library-tests/partial/*.cs | Added test files for partial classes across multiple files |
| csharp/ql/test/library-tests/locations/*.cs | Added test files for location handling with inheritance and partial classes |
| Various .expected files | Updated test expectations to reflect improved location extraction |
hvitved
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Friday Win 🎉
It turns out that the C# extractor does some symmetric location extraction, which leads to bloated TRAP and subsequently also impacts TRAP import performance, as the importer needs to do tuple de-duplication.
The issue can best be explained by a partial class example. That is, assume we have a partial class declaration
Cscattered in two filesfile1.csandfile2.cs.The extractor will encounter the type
Ctwice during extraction and extract its location(s). However, prior to this change the extractor, extracts all the locations ofCon each encounter (and then uses TRAP import de-duplication to "solve" the problem). Instead we could rely on the symmetry and only extract the location from the current context (the context is given by the file currently being extracted). As a consequence we only want to extract the location infile1.cswhen processingfile1.csand the location infile2.cswhen extractingfile2.cs.At first glance, this might look like a corner-case - and for traced extraction this is also the case. However, for buildless (BMN) extraction we cram all code from a repository into a single compilation, which can cause significant overlaps in the locations reported by Roslyn. The larger the repository, the higher the chance for overlaps and the higher the impact we get from this change.
Furthermore, this is a pre-requisite, if we want to switch to using
*IDs for location extraction.The last DCA executions shows that
For traced extraction
For buildless extraction.
mono/monowe get a reduction of around 30%.mono/monois reduced by 30% (and 12% fordotnet/roslyn) and stable (within the statistical uncertainty) for the remaining DCA projects.