Skip to content

Qualify nested-class references when lowering pyspec class fields#1143

Closed
tautschnig wants to merge 4 commits into
mainfrom
tautschnig/nested-class-references
Closed

Qualify nested-class references when lowering pyspec class fields#1143
tautschnig wants to merge 4 commits into
mainfrom
tautschnig/nested-class-references

Conversation

@tautschnig
Copy link
Copy Markdown
Contributor

Some PySpec producers emit references to a class's nested class using the 'typeClassNoArgs' DDM grammar operator, which carries only a bare identifier with no module path. On round-trip through 'DDM.SpecType.fromDDM', the reconstructed 'PythonIdent' has an empty 'pythonModule', so 'specTypeToLaurelType' renders the reference with its bare name rather than the qualified path under which the nested class is emitted.

When multiple sibling classes each declare a nested class of the same bare name, the resulting references collide, are filtered out by 'filterPrelude' as dangling (the nested class itself lives under a qualified name), and the Laurel-to-Core translator bails out with 'Resolution failed: is not defined'.

Fix at the PySpec-to-Laurel lowering site:

  • Add 'SpecType.mapIdentNames' for rewriting the PythonIdent on each 'SpecIdent' within a SpecType.
  • In 'classDefToLaurel', before lowering each field, rewrite any bare ident whose name matches one of the current class's 'subclasses' to carry the parent's prefixed name as its 'pythonModule'. After rewriting, 'PythonIdent.toLaurelName' produces, for an 'Outer' class with a nested 'Inner', the name 'Outer_Inner', which matches the name 'classDefToLaurel' assigns to the nested class.
  • When recursing into nested classes, push the parent's prefixed name as the module prefix so the nested class's own emitted name aligns with how its parent references it.

Add regression tests in 'StrataTest/Languages/Python/ToLaurelTest.lean' covering: a single outer class with a nested-class field reference, two sibling outer classes each declaring a nested class of the same bare name (no collision), and a bare reference to an unrelated top-level class (left unqualified).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Some PySpec producers emit references to a class's nested class using the
'typeClassNoArgs' DDM grammar operator, which carries only a bare
identifier with no module path.  On round-trip through
'DDM.SpecType.fromDDM', the reconstructed 'PythonIdent' has an empty
'pythonModule', so 'specTypeToLaurelType' renders the reference with its
bare name rather than the qualified path under which the nested class is
emitted.

When multiple sibling classes each declare a nested class of the same
bare name, the resulting references collide, are filtered out by
'filterPrelude' as dangling (the nested class itself lives under a
qualified name), and the Laurel-to-Core translator bails out with
'Resolution failed: <name> is not defined'.

Fix at the PySpec-to-Laurel lowering site:

* Add 'SpecType.mapIdentNames' for rewriting the PythonIdent on each
  'SpecIdent' within a SpecType.
* In 'classDefToLaurel', before lowering each field, rewrite any bare
  ident whose name matches one of the current class's 'subclasses' to
  carry the parent's prefixed name as its 'pythonModule'.  After
  rewriting, 'PythonIdent.toLaurelName' produces, for an 'Outer' class
  with a nested 'Inner', the name 'Outer_Inner', which matches the name
  'classDefToLaurel' assigns to the nested class.
* When recursing into nested classes, push the parent's prefixed name as
  the module prefix so the nested class's own emitted name aligns with
  how its parent references it.

Add regression tests in 'StrataTest/Languages/Python/ToLaurelTest.lean'
covering: a single outer class with a nested-class field reference, two
sibling outer classes each declaring a nested class of the same bare
name (no collision), and a bare reference to an unrelated top-level
class (left unqualified).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a name-resolution failure in the PySpec→Laurel lowering pipeline when class field types refer to nested classes using a bare identifier (empty pythonModule), which can collide across sibling classes after a DDM round-trip.

Changes:

  • Add SpecType.mapIdentNames to rewrite PythonIdents inside a SpecType.
  • In classDefToLaurel, qualify bare field-type references that match a nested class name and recurse into nested classes with an updated module prefix.
  • Add regression tests covering nested-class field references and sibling nested-class name collisions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
Strata/Languages/Python/Specs/Decls.lean Adds SpecType.mapIdentNames helper used for identifier rewriting.
Strata/Languages/Python/Specs/ToLaurel.lean Qualifies nested-class references in field types and updates module prefix when lowering nested classes.
StrataTest/Languages/Python/ToLaurelTest.lean Adds regression tests for nested-class field reference qualification and collision avoidance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Strata/Languages/Python/Specs/Decls.lean Outdated
Comment thread Strata/Languages/Python/Specs/ToLaurel.lean
Comment thread Strata/Languages/Python/Specs/ToLaurel.lean Outdated
Comment thread Strata/Languages/Python/Specs/ToLaurel.lean Outdated
Comment thread StrataTest/Languages/Python/ToLaurelTest.lean
tautschnig and others added 3 commits May 8, 2026 09:45
The `let enclosingPrefix := (← read).modulePrefix` binding and the
accompanying `let _ := enclosingPrefix` no-op are left over from an
earlier iteration that tried to save and restore the prefix manually.
`withReader` already scopes the prefix change to the recursive call, so
both lines and the misleading "restore happens automatically" comment
are redundant.

Pure cleanup; no behavior change.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
`typeAliases` is a user-visible map from unprefixed class names to their
prefixed Laurel names; `PySpecPipeline` later uses it to let user code
refer to top-level classes by their bare names.

Before this change, `classDefToLaurel` unconditionally registered an
alias whenever `prefixedName != cls.name`. That was fine for top-level
classes, but since the previous commit switched nested-class recursion
to run under the parent's `prefixedName` as module prefix, each nested
class now also hit that branch. Two consequences:

* Nested class names became user-addressable top-level type names, which
  was not the intent (they should be reachable only via the qualified
  `Outer_Inner` path their parent's field references go through).
* When multiple sibling outers each declare a nested class of the same
  bare name (e.g., `A.Child` and `B.Child`), the alias map entry for
  `Child` raced between them, last-writer-wins, giving silently
  ambiguous resolutions in user code.

Thread an `isNested` flag (default `false`) through `classDefToLaurel`;
the recursive call in the subclass loop passes `isNested := true`, and
the alias insertion is skipped in that case. Top-level behavior is
unchanged (the sole external caller uses the default).

Add two regression tests asserting that `result.typeAliases.keys` for a
single outer-with-nested class contains only `Outer`, and for two
siblings contains only `A` and `B` — neither `Inner` nor `Child` leaks.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The previous `SpecType.mapIdentNames` only rewrote the top-level
`idents` array, leaving `SpecIdent.args` (e.g., the `Inner` inside
`List[Inner]`, `Dict[str, Inner]`, `Union[None, Inner]`) and
`SpecTypedDict.fieldTypes` untouched.  It also mapped idents in place
without re-sorting, so rewrites that change the `PythonIdent` ordering
key (e.g., populating an empty `pythonModule`) broke the
`SpecType.idents`/`typedDicts` "sorted by ordering" invariant.

Replace with a mutually recursive walk over
`SpecType`/`SpecIdent`/`SpecTypedDict`.  Each function strips one layer
of nesting so `termination_by sizeOf _` is straightforward.  The new
`SpecType.mapIdentNames` rebuilds its result by folding per-atom
singleton `SpecType`s through `SpecType.union`, which uses
`sortedMerge` to restore sort and dedup.

The existing sole caller (`qualifyNestedClassRefs` in `ToLaurel.lean`)
benefits transparently — no changes needed there.

Add direct unit tests in `StrataTest/Languages/Python/Specs/DeclsTest.lean`:

* Top-level bare ident (baseline).
* `List[Inner]` — recursion into `SpecIdent.args` (single arg).
* `Dict[str, Inner]` — recursion over multiple args.
* `Union[None, Inner]` — rewrite across multiple ident atoms, plus the
  reordering case since `Outer.Inner` now sorts ahead of
  `_types.NoneType`.
* `TypedDict(f : Inner)` — recursion into `SpecTypedDict.fieldTypes`.
* `List[TypedDict(f : Inner)]` — exercises both recursion paths.
* Sort-restoration: start with `[a.Alpha, b.Beta]`, rewrite
  `b.Beta → a.Beta`, check the result prints in sorted order
  `[a.Alpha, a.Beta]` rather than the in-place order.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@joehendrix
Copy link
Copy Markdown
Contributor

My first inclination is to say there's a bug that moduleName is empty. Do you have an example?

We could just add a moduleName \ne "" constraint to PythonIdent, and make sure we are catching all empty module names at PythonIdent construction time.

@tautschnig tautschnig closed this May 15, 2026
keyboardDrummer pushed a commit to keyboardDrummer/Strata that referenced this pull request May 21, 2026
…ata-org#1151)

## Summary

`PythonIdent.pythonModule` was a raw `String`, allowing empty module
names to propagate silently through the system and cause downstream
failures (see strata-org#1143). This PR changes `pythonModule` to use `ModuleName`
— which now guarantees at least one non-empty component — and adopts it
more widely through the PySpec pipeline, eliminating this class of bugs
at the type level.

## Approach

Rather than adding runtime checks at each use site, `ModuleName` gains a
strengthened invariant via a private constructor enforcing non-empty
components. `PythonIdent.ofComponent` uses `decide` to verify
non-emptiness at compile time for literal module names (used in
`Decls.lean`). Runtime parsing goes through `ModuleName.ofString?` which
returns `Option`.

## Key changes

- **PythonIdent.lean**: `PythonIdent.pythonModule` changed from `String`
to `ModuleName`. New `ofComponent` constructor with compile-time proof
for single-component literals. `ModuleName` API extended: `back`,
`parent (n)`, `push`, `append`/`++`, `toString (sep)`
- **PythonIdent.lean**: New `ModuleName.ofRelativePath` provides a
single canonical function for deriving a `ModuleName` from a `.py` file
path relative to a search root, handling both regular files and
`__init__.py` packages. This eliminates redundant ad-hoc implementations
scattered across `discoverModules`, `RelativeImportTest`, and
`findInPath` — each of which had slightly different `__init__.py`
handling, creating inconsistency and latent bugs (e.g.,
`pySpecToLaurelCommand` silently failed for package modules)
- **Specs.lean**: `findInPath` simplified to return `(FilePath × Bool)`
instead of `(FilePath × Array ModuleComponent)` — the `Bool` is
`isInit`, and callers use `ModuleOfPath.modulePrefix` to derive the
package prefix. Deleted unused `ofFile` function
- **Specs/DDM.lean**: Converted `fromDDM` functions to monadic `FromDDM`
(`Except (SourceRange × String)`) for proper error propagation instead
of panics
- **Specs/ToLaurel.lean**: `signaturesToLaurel` takes `ModuleName`
instead of `String` prefix. Overload extraction factored into dedicated
`ExtractM` monad (no dependency on `ToLaurelContext`)
- **PySpecPipeline.lean**: `buildPySpecLaurel` takes `Array (ModuleName
× String)` instead of `Array (String × String)`
- **Specs.lean**: Import resolution uses `ModuleName` throughout;
`translateFile` requires a `ModuleName` (no longer optional)
- **Specs/Decls.lean**: Builtin idents use `ofComponent` with
compile-time proof

### Unrelated cleanup

- Deleted `Strata.Python.containsSubstr` helper and replaced all call
sites with `String.contains` (which accepts substring patterns in Lean
4.29)

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants