Qualify nested-class references when lowering pyspec class fields#1143
Closed
tautschnig wants to merge 4 commits into
Closed
Qualify nested-class references when lowering pyspec class fields#1143tautschnig wants to merge 4 commits into
tautschnig wants to merge 4 commits into
Conversation
Some PySpec producers emit references to a class's nested class using the 'typeClassNoArgs' DDM grammar operator, which carries only a bare identifier with no module path. On round-trip through 'DDM.SpecType.fromDDM', the reconstructed 'PythonIdent' has an empty 'pythonModule', so 'specTypeToLaurelType' renders the reference with its bare name rather than the qualified path under which the nested class is emitted. When multiple sibling classes each declare a nested class of the same bare name, the resulting references collide, are filtered out by 'filterPrelude' as dangling (the nested class itself lives under a qualified name), and the Laurel-to-Core translator bails out with 'Resolution failed: <name> is not defined'. Fix at the PySpec-to-Laurel lowering site: * Add 'SpecType.mapIdentNames' for rewriting the PythonIdent on each 'SpecIdent' within a SpecType. * In 'classDefToLaurel', before lowering each field, rewrite any bare ident whose name matches one of the current class's 'subclasses' to carry the parent's prefixed name as its 'pythonModule'. After rewriting, 'PythonIdent.toLaurelName' produces, for an 'Outer' class with a nested 'Inner', the name 'Outer_Inner', which matches the name 'classDefToLaurel' assigns to the nested class. * When recursing into nested classes, push the parent's prefixed name as the module prefix so the nested class's own emitted name aligns with how its parent references it. Add regression tests in 'StrataTest/Languages/Python/ToLaurelTest.lean' covering: a single outer class with a nested-class field reference, two sibling outer classes each declaring a nested class of the same bare name (no collision), and a bare reference to an unrelated top-level class (left unqualified). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses a name-resolution failure in the PySpec→Laurel lowering pipeline when class field types refer to nested classes using a bare identifier (empty pythonModule), which can collide across sibling classes after a DDM round-trip.
Changes:
- Add
SpecType.mapIdentNamesto rewritePythonIdents inside aSpecType. - In
classDefToLaurel, qualify bare field-type references that match a nested class name and recurse into nested classes with an updated module prefix. - Add regression tests covering nested-class field references and sibling nested-class name collisions.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Strata/Languages/Python/Specs/Decls.lean | Adds SpecType.mapIdentNames helper used for identifier rewriting. |
| Strata/Languages/Python/Specs/ToLaurel.lean | Qualifies nested-class references in field types and updates module prefix when lowering nested classes. |
| StrataTest/Languages/Python/ToLaurelTest.lean | Adds regression tests for nested-class field reference qualification and collision avoidance. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The `let enclosingPrefix := (← read).modulePrefix` binding and the accompanying `let _ := enclosingPrefix` no-op are left over from an earlier iteration that tried to save and restore the prefix manually. `withReader` already scopes the prefix change to the recursive call, so both lines and the misleading "restore happens automatically" comment are redundant. Pure cleanup; no behavior change. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
`typeAliases` is a user-visible map from unprefixed class names to their prefixed Laurel names; `PySpecPipeline` later uses it to let user code refer to top-level classes by their bare names. Before this change, `classDefToLaurel` unconditionally registered an alias whenever `prefixedName != cls.name`. That was fine for top-level classes, but since the previous commit switched nested-class recursion to run under the parent's `prefixedName` as module prefix, each nested class now also hit that branch. Two consequences: * Nested class names became user-addressable top-level type names, which was not the intent (they should be reachable only via the qualified `Outer_Inner` path their parent's field references go through). * When multiple sibling outers each declare a nested class of the same bare name (e.g., `A.Child` and `B.Child`), the alias map entry for `Child` raced between them, last-writer-wins, giving silently ambiguous resolutions in user code. Thread an `isNested` flag (default `false`) through `classDefToLaurel`; the recursive call in the subclass loop passes `isNested := true`, and the alias insertion is skipped in that case. Top-level behavior is unchanged (the sole external caller uses the default). Add two regression tests asserting that `result.typeAliases.keys` for a single outer-with-nested class contains only `Outer`, and for two siblings contains only `A` and `B` — neither `Inner` nor `Child` leaks. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The previous `SpecType.mapIdentNames` only rewrote the top-level `idents` array, leaving `SpecIdent.args` (e.g., the `Inner` inside `List[Inner]`, `Dict[str, Inner]`, `Union[None, Inner]`) and `SpecTypedDict.fieldTypes` untouched. It also mapped idents in place without re-sorting, so rewrites that change the `PythonIdent` ordering key (e.g., populating an empty `pythonModule`) broke the `SpecType.idents`/`typedDicts` "sorted by ordering" invariant. Replace with a mutually recursive walk over `SpecType`/`SpecIdent`/`SpecTypedDict`. Each function strips one layer of nesting so `termination_by sizeOf _` is straightforward. The new `SpecType.mapIdentNames` rebuilds its result by folding per-atom singleton `SpecType`s through `SpecType.union`, which uses `sortedMerge` to restore sort and dedup. The existing sole caller (`qualifyNestedClassRefs` in `ToLaurel.lean`) benefits transparently — no changes needed there. Add direct unit tests in `StrataTest/Languages/Python/Specs/DeclsTest.lean`: * Top-level bare ident (baseline). * `List[Inner]` — recursion into `SpecIdent.args` (single arg). * `Dict[str, Inner]` — recursion over multiple args. * `Union[None, Inner]` — rewrite across multiple ident atoms, plus the reordering case since `Outer.Inner` now sorts ahead of `_types.NoneType`. * `TypedDict(f : Inner)` — recursion into `SpecTypedDict.fieldTypes`. * `List[TypedDict(f : Inner)]` — exercises both recursion paths. * Sort-restoration: start with `[a.Alpha, b.Beta]`, rewrite `b.Beta → a.Beta`, check the result prints in sorted order `[a.Alpha, a.Beta]` rather than the in-place order. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Contributor
|
My first inclination is to say there's a bug that We could just add a |
keyboardDrummer
pushed a commit
to keyboardDrummer/Strata
that referenced
this pull request
May 21, 2026
…ata-org#1151) ## Summary `PythonIdent.pythonModule` was a raw `String`, allowing empty module names to propagate silently through the system and cause downstream failures (see strata-org#1143). This PR changes `pythonModule` to use `ModuleName` — which now guarantees at least one non-empty component — and adopts it more widely through the PySpec pipeline, eliminating this class of bugs at the type level. ## Approach Rather than adding runtime checks at each use site, `ModuleName` gains a strengthened invariant via a private constructor enforcing non-empty components. `PythonIdent.ofComponent` uses `decide` to verify non-emptiness at compile time for literal module names (used in `Decls.lean`). Runtime parsing goes through `ModuleName.ofString?` which returns `Option`. ## Key changes - **PythonIdent.lean**: `PythonIdent.pythonModule` changed from `String` to `ModuleName`. New `ofComponent` constructor with compile-time proof for single-component literals. `ModuleName` API extended: `back`, `parent (n)`, `push`, `append`/`++`, `toString (sep)` - **PythonIdent.lean**: New `ModuleName.ofRelativePath` provides a single canonical function for deriving a `ModuleName` from a `.py` file path relative to a search root, handling both regular files and `__init__.py` packages. This eliminates redundant ad-hoc implementations scattered across `discoverModules`, `RelativeImportTest`, and `findInPath` — each of which had slightly different `__init__.py` handling, creating inconsistency and latent bugs (e.g., `pySpecToLaurelCommand` silently failed for package modules) - **Specs.lean**: `findInPath` simplified to return `(FilePath × Bool)` instead of `(FilePath × Array ModuleComponent)` — the `Bool` is `isInit`, and callers use `ModuleOfPath.modulePrefix` to derive the package prefix. Deleted unused `ofFile` function - **Specs/DDM.lean**: Converted `fromDDM` functions to monadic `FromDDM` (`Except (SourceRange × String)`) for proper error propagation instead of panics - **Specs/ToLaurel.lean**: `signaturesToLaurel` takes `ModuleName` instead of `String` prefix. Overload extraction factored into dedicated `ExtractM` monad (no dependency on `ToLaurelContext`) - **PySpecPipeline.lean**: `buildPySpecLaurel` takes `Array (ModuleName × String)` instead of `Array (String × String)` - **Specs.lean**: Import resolution uses `ModuleName` throughout; `translateFile` requires a `ModuleName` (no longer optional) - **Specs/Decls.lean**: Builtin idents use `ofComponent` with compile-time proof ### Unrelated cleanup - Deleted `Strata.Python.containsSubstr` helper and replaced all call sites with `String.contains` (which accepts substring patterns in Lean 4.29) By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Some PySpec producers emit references to a class's nested class using the 'typeClassNoArgs' DDM grammar operator, which carries only a bare identifier with no module path. On round-trip through 'DDM.SpecType.fromDDM', the reconstructed 'PythonIdent' has an empty 'pythonModule', so 'specTypeToLaurelType' renders the reference with its bare name rather than the qualified path under which the nested class is emitted.
When multiple sibling classes each declare a nested class of the same bare name, the resulting references collide, are filtered out by 'filterPrelude' as dangling (the nested class itself lives under a qualified name), and the Laurel-to-Core translator bails out with 'Resolution failed: is not defined'.
Fix at the PySpec-to-Laurel lowering site:
Add regression tests in 'StrataTest/Languages/Python/ToLaurelTest.lean' covering: a single outer class with a nested-class field reference, two sibling outer classes each declaring a nested class of the same bare name (no collision), and a bare reference to an unrelated top-level class (left unqualified).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.