[Shift-Left] Auto-detect many typedefs and manual remaps by walking the AST before PInvokeGenerator runs #2240

Draft
jevansaks wants to merge 12 commits into main from user/jevansa/win32metadata-scraper

jevansaks commented Mar 31, 2026

Summary

Adds Win32MetadataScraper — a new tool that hosts ClangSharp.PInvokeGenerator v17.0.1 as a library and auto-discovers typedef-tag remappings from the AST in a single parse pass.

How it works

  1. Parses the translation unit using our own CXIndex (same Clang parse as stock CLI)
  2. Walks the AST to find TypedefDecl → TagType pairs (e.g., _BLUETOOTH_ADDRESS → BLUETOOTH_ADDRESS)
  3. Filters through an exclusion list (remapExclusions.rsp) and heuristic rules
  4. Merges auto-discovered remaps with manual --remap entries (manual wins on conflict)
  5. Creates PInvokeGenerator with the full merged config
  6. Calls GenerateBindings on the same TranslationUnit — single parse, single pass
  7. Writes auto-remaps to a .remaps sidecar file collected by ScrapeHeaders

Per ClangSharp owner guidance: the translation unit can be walked freely before passing to GenerateBindings. No reflection or private field access needed.
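
The filter-and-merge portion of the steps above (3 and 4) can be sketched as follows. This is a minimal Python illustration, not the tool's C# code: `merge_remaps`, its parameters, and the `_EXAMPLE_TAG` entries are hypothetical, and it assumes exclusions are matched on the tag name.

```python
from typing import Dict, Iterable, Set


def merge_remaps(
    auto_remaps: Dict[str, str],
    manual_remaps: Dict[str, str],
    exclusions: Iterable[str] = (),
) -> Dict[str, str]:
    """Merge auto-discovered tag -> typedef remaps with manual ones.

    Assumption: exclusions are keyed by tag name. Excluded tags and
    identity remaps are dropped from the auto set, then manual entries
    are applied last so that they win on conflict.
    """
    excluded: Set[str] = set(exclusions)
    merged = {
        tag: name
        for tag, name in auto_remaps.items()
        if tag not in excluded and tag != name
    }
    merged.update(manual_remaps)  # manual wins on conflict
    return merged


# Hypothetical inputs: one clean auto remap, one excluded tag, one conflict.
auto = {
    "_BLUETOOTH_ADDRESS": "BLUETOOTH_ADDRESS",
    "IUnknown": "IUnknownAlias",
    "_EXAMPLE_TAG": "EXAMPLE_AUTO",
}
manual = {"_EXAMPLE_TAG": "EXAMPLE_MANUAL"}
merged = merge_remaps(auto, manual, exclusions=["IUnknown"])
```

Applying the manual entries last keeps the precedence rule in one place instead of checking for conflicts entry by entry.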

Key design points

  • NuGet library references: ClangSharp/Interop/PInvokeGenerator 17.0.1, type-safe API
  • Native libs from tool store: libclang.dll/libClangSharp.dll are loaded from the already-installed dotnet tool
  • Graceful fallback: if Win32MetadataScraper.dll is not found, ScrapeHeaders uses the stock ClangSharpPInvokeGenerator
  • Exclusion list: ~95 types that must never be auto-remapped (IUnknown, LARGE_INTEGER→long, opaque handles, etc.)
  • Auto-remaps RSP: included in @(ScraperRsp) before manual settings so manual overrides win

Verification

| Check | Result |
| --- | --- |
| Winmd size | 24,328,704 bytes ✅ (matches baseline) |
| MetadataUtils.Tests | 17/17 ✅ |
| Windows.Win32.Tests | 13/13 ✅ |
| ClangSharpSourceToWinmdTests | 6/6 ✅ |
| Auto-discovered remaps | ~12,000 entries |

Files changed

| File | Change |
| --- | --- |
| sources/Win32MetadataScraper/Program.cs | New: single-pass scraper tool |
| sources/Win32MetadataScraper/Win32MetadataScraper.csproj | New: references ClangSharp 17.0.1 NuGet packages |
| sources/GeneratorSdk/tools/assets/scraper/remapExclusions.rsp | New: exclusion list |
| sources/GeneratorSdk/MetadataTasks/ScrapeHeaders.cs | Modified: launches scraper, reads .remaps, writes auto RSP |
| sources/GeneratorSdk/sdk/sdk.targets | Modified: includes auto-remaps RSP |
| BuildTools/BuildTools.proj | Modified: adds Win32MetadataScraper to build |

jevansaks and others added 3 commits March 31, 2026 10:35
Hosts ClangSharp.PInvokeGenerator v17.0.1 as a library (NuGet package
references). Parses the translation unit, walks the AST to discover
typedef-tag remappings, merges them with manual --remap entries, then
runs PInvokeGenerator.GenerateBindings on the same TranslationUnit in
a single parse pass.

- Win32MetadataScraper: new console app tool
- remapExclusions.rsp: exclusion list for unsafe auto-remaps
- ScrapeHeaders.cs: launches scraper, collects .remaps, writes auto RSP
- sdk.targets: includes auto-remaps RSP in @(ScraperRsp)
- Graceful fallback to stock ClangSharpPInvokeGenerator if scraper not found

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… ptr fixups

- Rewrite AST walker to collect ALL typedefs per tag, disambiguate
  multi-typedef tags automatically (no exclusion list needed)
- Add UnwrapType helper for ElaboratedType/AttributedType/ParenType
  sugar types that were hiding typedef-tag and fn-ptr relationships
- Auto-discover function pointer typedef pairs from AST: detect
  FunctionProtoType typedefs and their PointerType aliases, generate
  --remap, --exclude, and --reducePointerLevel entries automatically
- Remove 12,197 of 12,705 manual --remap entries from scraper.settings.rsp
  (508 genuinely manual entries remain: semantic primitives, pointer
  remaps, nested qualifieds, uppercase renames)
- Remove 160 of 184 entries from functionPointerFixups.json
  (24 remain: 6 alreadyPointer, 13 name-only, 5 edge cases)
- Delete remapExclusions.rsp (no longer needed)
- Zero conflicts: auto-discovery never picks wrong name
- Winmd output identical to baseline (24,328,704 bytes)
- All 36 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
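
The sugar peeling that the commit above attributes to UnwrapType can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the C# helper itself: `TypeNode` is a hypothetical stand-in for a Clang type, and only the three sugar kinds named in the commit message are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeNode:
    """Toy stand-in for a Clang type; `inner` is the wrapped type, if any."""
    kind: str
    name: str = ""
    inner: Optional["TypeNode"] = None


# Sugar kinds named in the commit message; they wrap another type
# without introducing a new declaration of their own.
SUGAR_KINDS = {"ElaboratedType", "AttributedType", "ParenType"}


def unwrap_type(t: TypeNode) -> TypeNode:
    """Peel sugar wrappers until a non-sugar type is reached."""
    while t.kind in SUGAR_KINDS and t.inner is not None:
        t = t.inner
    return t


# A record hidden behind two layers of sugar.
wrapped = TypeNode(
    "ElaboratedType",
    inner=TypeNode(
        "ParenType",
        inner=TypeNode("RecordType", name="_BLUETOOTH_ADDRESS"),
    ),
)
core = unwrap_type(wrapped)
```

Without the peeling step, a walker that only inspects the outermost kind would see `ElaboratedType` here and miss the underlying record.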
Extract WalkTranslationUnit, ResolveTagRemaps, FilterTagRemaps,
ResolveFunctionPointerFixups, and UnwrapType from Program.cs into a
public RemapDiscovery class that can be tested independently.

Add Win32MetadataScraperTests project with 22 xUnit tests covering:
- Tag-typedef discovery: _Uppercase, _lowercase, tag prefix, lowercase
  tag, trailing underscore, enum typedef patterns
- Multi-typedef disambiguation: stripped prefix match, ambiguous skip,
  manual hint resolution
- Semantic override protection: manual remap takes priority
- Built-in type filtering: _GUID excluded
- Identity remap filtering: same tag/typedef skipped
- Function pointer discovery: two-step LP/P/PFN patterns, direct fn
  ptr, calling convention (AttributedType unwrap), non-standard naming
  skip, typedef alias pattern, ambiguous pointer targets skip
- UnwrapType: ElaboratedType and AttributedType sugar peeling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
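
The multi-typedef disambiguation cases listed in the tests above (stripped prefix match, ambiguous skip, manual hint resolution) might look roughly like this. It is a hedged Python sketch: `strip_tag_prefix` and `pick_typedef` are hypothetical names, the prefix rules are a simplification (leading underscores and a leading `tag` only), and `BTH_ADDR_ALIAS`, `_FOO`, `BAR`, and `BAZ` are made-up inputs.

```python
from typing import Iterable, Optional


def strip_tag_prefix(tag: str) -> str:
    """Drop leading underscores and a leading `tag`, e.g. tagPOINT -> POINT."""
    stripped = tag.lstrip("_")
    if stripped.startswith("tag") and len(stripped) > 3:
        stripped = stripped[3:]
    return stripped


def pick_typedef(
    tag: str,
    candidates: Iterable[str],
    manual_hint: Optional[str] = None,
) -> Optional[str]:
    """Choose one typedef name for a tag, or None if it stays ambiguous."""
    # De-duplicate while preserving order, and drop identity remaps.
    names = [c for c in dict.fromkeys(candidates) if c != tag]
    if manual_hint in names:
        return manual_hint  # a manual hint resolves the ambiguity
    if len(names) == 1:
        return names[0]  # only one typedef, nothing to disambiguate
    target = strip_tag_prefix(tag)
    matches = [c for c in names if c == target]
    return matches[0] if len(matches) == 1 else None  # otherwise skip


by_prefix = pick_typedef("_BLUETOOTH_ADDRESS", ["BTH_ADDR_ALIAS", "BLUETOOTH_ADDRESS"])
skipped = pick_typedef("_FOO", ["BAR", "BAZ"])
hinted = pick_typedef("_FOO", ["BAR", "BAZ"], manual_hint="BAZ")
```

Returning `None` for the ambiguous case leaves the decision to a manual `--remap` entry rather than guessing.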
jevansaks changed the title from "Add Win32MetadataScraper: single-pass auto-remap discovery" to "Shift-Left Metadata: Auto-detect many typedefs and manual remaps by walking the AST before PInvokeGenerator runs" on Apr 1, 2026
jevansaks changed the title from "Shift-Left Metadata: Auto-detect many typedefs and manual remaps by walking the AST before PInvokeGenerator runs" to "[Shift-Left] Auto-detect many typedefs and manual remaps by walking the AST before PInvokeGenerator runs" on Apr 1, 2026
jevansaks force-pushed the user/jevansa/win32metadata-scraper branch 3 times, most recently from b5692ed to 2a5e7cc, on April 2, 2026 00:15
… direction, simplified consistency check

Heuristic fixes:
- Add case-insensitive match to multi-typedef disambiguator (fixes in6_addr→IN6_ADDR
  when competing with IPv6Addr typedef)
- Fix fn ptr alreadyPointer direction: when proto has P/LP prefix and alias doesn't,
  remap alias→proto (e.g., EXCEPTION_ROUTINE→PEXCEPTION_ROUTINE, not reverse)
- Filter suffix-adding remaps (e.g., GLUnurbs→GLUnurbsObj) since tag is canonical
- Skip C++ namespace-qualified tags (Gdiplus::Status, ABI::*)

Cross-partition consistency:
- Simplified CheckCrossPartitionRemapConsistency: scans declared type names in
  generated .cs files and checks against discovered remap tags. No regex needed.
- Warning-only (PInvokeGenerator handles these internally)
- Added 7 partition #include fixes for cross-partition typedef visibility
- 5 irreducible manual remaps for types where AST walker can't discover the typedef
  in certain partition configurations

Tests: 30 scraper tests (7 new), all 60 pass. Winmd 24,328,704 bytes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
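
Three of the heuristic fixes in the commit above (case-insensitive matching, the suffix-adding filter, and the namespace skip) can be sketched in a few lines. This is an illustrative Python approximation with hypothetical function names, not the actual RemapDiscovery code; the example names come from the commit message.

```python
from typing import Optional, Sequence


def is_namespace_qualified(tag: str) -> bool:
    """C++ namespace-qualified tags such as Gdiplus::Status are skipped."""
    return "::" in tag


def adds_suffix(tag: str, typedef_name: str) -> bool:
    """True when the typedef is the tag plus extra trailing characters."""
    return len(typedef_name) > len(tag) and typedef_name.startswith(tag)


def keep_remap(tag: str, typedef_name: str) -> bool:
    """Apply the namespace and suffix filters to one candidate remap."""
    return not is_namespace_qualified(tag) and not adds_suffix(tag, typedef_name)


def pick_case_insensitive(tag: str, candidates: Sequence[str]) -> Optional[str]:
    """Pick the single candidate equal to the tag when case is ignored."""
    lowered = tag.lstrip("_").lower()
    matches = [c for c in candidates if c.lower() == lowered]
    return matches[0] if len(matches) == 1 else None


chosen = pick_case_insensitive("in6_addr", ["IPv6Addr", "IN6_ADDR"])
```

Here `IPv6Addr` does not match `in6_addr` even with case ignored, so `IN6_ADDR` is the only remaining candidate.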
jevansaks force-pushed the user/jevansa/win32metadata-scraper branch from 2a5e7cc to f5e51c6 on April 2, 2026 03:56
jevansaks and others added 8 commits April 2, 2026 09:25
- Remove unused FilterRemaps() method from Program.cs (duplicated RemapDiscovery.FilterTagRemaps)
- Tighten HasPointerPrefix to require uppercase after single P prefix
- Change CheckCrossPartitionRemapConsistency to void (always returned true)
- Fix trailing space in LSA_GET_EXTENDED_CALL_FLAGS in functionPointerFixups.json
- Add plan docs from win32metadata repo (auto-type-remappings, shift-left, annotations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
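
The tightened HasPointerPrefix rule from the commit above could be approximated as follows. This is a Python sketch with assumptions: only the single-`P` rule (uppercase letter required next) comes from the commit message; the handling of `LP` is a guess, and the real C# method may differ.

```python
def has_pointer_prefix(name: str) -> bool:
    """Rough check for pointer-style typedef prefixes (LP..., P..., PFN...).

    Assumption: `LP` counts whenever more characters follow it. A single
    `P` counts only when the next character is uppercase, so ordinary
    words such as "Point" are not mistaken for pointer typedefs. `PFN`
    names are covered by the single-`P` rule because `F` is uppercase.
    """
    if name.startswith("LP") and len(name) > 2:
        return True
    return len(name) > 1 and name[0] == "P" and name[1].isupper()


with_prefix = has_pointer_prefix("PEXCEPTION_ROUTINE")
without_prefix = has_pointer_prefix("EXCEPTION_ROUTINE")
ordinary_word = has_pointer_prefix("Point")
```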
Restore key manual remaps, tighten auto-discovery heuristics, and capture the remaining rename-drift findings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>