fix: address review issues in static schema generation by markphelps · Pull Request #2805 · replicate/cog

markphelps · 2026-03-04T21:14:03Z

Summary

Addresses remaining issues from thefix/dict-list-output-schema branch.

Bug fixes

Cycle detection for recursive BaseModels — ResolveSchemaType now tracks a seen set to prevent infinite recursion on self-referential models like class TreeNode(BaseModel): child: Optional["TreeNode"]. Cycles emit an opaque {"type": "object"} instead of stack overflowing.
Forward reference handling in model fields — parseTypeFromString now strips quotes from forward references like "MyType" or 'MyType' inside generic annotations (e.g. Optional["TreeNode"]). Previously these fields were silently dropped.
Deterministic JSON property order — SchemaObject.coreSchema() now uses orderedMapAny instead of map[string]any for the properties field, ensuring consistent JSON output.

Code cleanup

Explicit panic for unknown SchemaTypeKind — replaced silent fallthrough returning {"type": "object"} with a panic that surfaces missing switch cases immediately.
Removed redundant nullable — SchemaObject field loop no longer re-sets nullable since jsonSchema() already handles it.
Removed unused KeyType field — dict key types are always strings in JSON Schema; the field was set but never read.

Test coverage (8 new tests)

TestForwardReferenceOutput / TestForwardReferenceGeneric — quoted string annotations
TestNestedBaseModelField — class Outer(BaseModel): inner: Inner composition
TestListOfBaseModelOutput / TestIteratorOfBaseModelOutput — BaseModel in generic outputs
TestRecursiveBaseModelDoesNotStackOverflow — self-referential model cycle detection
TestOutputObjectPropertyOrderDeterministic — JSON property ordering

Not addressed (documented, behind opt-in gate)

Cross-file depth-1 resolution limit
No __init__.py package import support
Fuzz tests don't exercise BaseModel/cross-file paths

The tree-sitter schema parser rejected unparameterized dict and list return types with 'unsupported type'. This broke the resnet example and any predictor returning -> dict or -> list. - dict/Dict map to TypeAny (JSON Schema: {"type": "object"}) - list/List map to OutputList with TypeAny (JSON Schema: {"type": "array", "items": {"type": "object"}}) This matches the old Python schema gen behavior exactly: dict -> Dict[str, Any] -> {"type": "object"} list -> List[Any] -> {"type": "array", "items": {"type": "object"}}

…nsive tests Replace the flat OutputType system with a recursive SchemaType algebraic data type that supports arbitrary nesting (dict[str, list[dict[str, int]]], etc.). Key changes: - SchemaType ADT with 7 kinds: Primitive, Any, Array, Dict, Object, Iterator, ConcatIterator - ResolveSchemaType: recursive resolver replacing ResolveOutputType - Cross-file model resolution: imports from local .py files are found on disk, parsed with tree-sitter, and BaseModel subclasses extracted automatically - Handles all local import permutations: relative, dotted, subpackage, aliased - Clear error messages for unresolvable types (includes import source and guidance) - Remove legacy OutputType, OutputKind, ObjectField, ResolveOutputType - Thread sourceDir through Parser -> ParsePredictor for filesystem access - Rewrite architecture/02-schema.md for the static Go parser Tests: 93 unit tests (12 recursive nesting, 5 unresolvable errors, 3 pydantic compat, 11 cross-file resolution, 1 end-to-end schema gen) + 1 integration test

…ursive model fields - Propagate errors from dict value type resolution instead of silently falling back to opaque SchemaAny (dict[str, Tensor] now errors) - Extract UnwrapOptional helper used by ResolveFieldType, resolveUnionSchemaType, and resolveFieldSchemaType (3 callsites) - resolveModelToSchemaType now uses ResolveSchemaType via resolveFieldSchemaType, supporting dict/nested types inside BaseModel fields (previously limited to primitives, Optional[T], List[T]) - Fix stale comments: Optional rejected not nullable, Iterator allows nested types - Fix garbled unicode in architecture docs, fix SchemaAny table entry

- Fix nullable incorrectly set on non-required fields instead of field.Type.Nullable (debug: bool = False was appearing nullable) - Remove dead CogArrayType/CogArrayDisplay struct fields (hardcoded in coreSchema, never read from struct) - Distinguish os.ErrNotExist from permission errors in cross-file resolution; warn on parse failures instead of silently ignoring - Fix bare dict/list JSON Schema examples in architecture docs - Add regression tests: defaulted non-Optional field not nullable, Optional field nullable in JSON Schema, dict[str, Tensor] errors instead of silently producing SchemaAny (both top-level and inside BaseModel fields)

Four fuzz targets exercising the schema pipeline: - FuzzResolveSchemaType: arbitrary TypeAnnotation trees through the recursive resolver - FuzzJSONSchema: random SchemaType trees through JSON Schema rendering - FuzzParsePredictor: arbitrary bytes as Python source through the tree-sitter parser - FuzzParseTypeAnnotation: arbitrary return type strings in predict signatures Includes: - mise task 'test:fuzz' (FUZZTIME=30s per target by default) - CI job 'fuzz-go' running 30s per target on Go changes - Byte encoder/decoder for deterministic TypeAnnotation and SchemaType tree construction from fuzz corpus

* fix: make static schema generation opt-in via COG_STATIC_SCHEMA env var The static Go tree-sitter schema generator was the default for SDK >= 0.17.0, which risks breaking builds when the parser encounters types it cannot resolve. - Gate static schema gen behind COG_STATIC_SCHEMA=1 (or "true") env var - Legacy runtime schema generation (boot container + python introspection) remains the default - When opted in, gracefully fall back to legacy on ErrUnresolvableType instead of hard-failing the build - Add unit tests for canUseStaticSchemaGen (12 table-driven cases) - Add integration test for the static->legacy fallback path - Update existing static/multi-file integration tests to set the env var * docs: update schema architecture doc for opt-in static generation model Reflect that static schema generation is opt-in via COG_STATIC_SCHEMA=1, with legacy runtime path as the default and automatic fallback. * fix: restore legacy runtime schema generation modules for fallback path Restore the Python modules needed by the legacy runtime schema generation path (python -m cog.command.openapi_schema). These were deleted in 61eedf3 but are needed as the fallback when the static Go parser encounters types it cannot resolve. Restored modules: _adt, _inspector, _schemas, coder, config, errors, mode, suppress_output, command/__init__, command/openapi_schema. config.py is trimmed to only what openapi_schema.py needs (removed get_predictor_types which depended on deleted get_predict/get_train). * fix: simplify schema gen gating — let coglet handle missing schema for train/serve Remove the skipLabels override that forced static schema gen for cog train/serve paths. Now useStatic is purely opt-in via COG_STATIC_SCHEMA=1 for all commands. For train/serve without the env var, no schema is generated at build time. Coglet gracefully handles this (warns and accepts all input). These are local-only images that don't need strict schema validation. Also improves the static_schema_fallback integration test to use a realistic mypackage/__init__.py scenario instead of a plain class. * fix: restore static gen for train/serve paths, handle pydantic v2 in legacy inspector Two IT failures: 1. training_setup: cog train needs schema for -i flag parsing. The CLI fetches it from coglet's /openapi.json, which returns 503 when no schema file exists. Re-enable static gen for skipLabels paths (same as main) since there's no post-build legacy fallback for these. 2. pydantic2_output: the legacy runtime inspector (_inspector.py) didn't handle pydantic v2 BaseModel as output types — only cog.BaseModel. Add conditional pydantic.BaseModel check with model_fields iteration. * fix: suppress pyright reportMissingImports for optional pydantic import * fix: flatten nested error handling to early returns, case-insensitive env var check

Add cycle detection for recursive BaseModel types to prevent stack overflow, use ordered maps for deterministic JSON property output, fix forward reference handling in model field type annotations, and close test coverage gaps for nested BaseModels, list/Iterator of BaseModel, and property ordering.

markphelps · 2026-03-04T21:15:42Z

pkg/schema/schema_type.go

+		// it indicates a missing case (e.g. a new kind was added without updating
+		// this switch). Panic to surface the bug immediately rather than silently
+		// returning a wrong schema.
+		panic(fmt.Sprintf("unhandled SchemaTypeKind: %d", s.Kind))


controversial

…TypeFromString Fuzz testing found that a bare quote character (e.g. `"`) matched the forward reference check but caused s[1:0] slice panic. Add len >= 2 guard before slicing.

Copilot

Pull request overview

This PR addresses remaining review items in static JSON schema generation for Python predictors, focusing on BaseModel recursion safety, forward-reference parsing, and deterministic JSON output.

Changes:

Add cycle detection when resolving recursive Pydantic BaseModel output schemas to prevent infinite recursion.
Improve parsing of quoted forward references inside string-based annotations.
Make JSON Schema object properties ordering deterministic by using an ordered map type, and add/adjust tests accordingly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`pkg/schema/schema_type.go`	Removes unused dict key typing, adds ordered `properties` output, adds cycle detection in model resolution, and panics on unhandled `SchemaTypeKind`.
`pkg/schema/python/parser.go`	Adds quote-stripping forward-reference handling in `parseTypeFromString`.
`pkg/schema/python/parser_test.go`	Adds new tests for forward refs, nested BaseModels, BaseModel generics, recursion safety, and deterministic property order; adjusts existing nullable assertions via JSON round-trip.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-04T21:35:22Z

pkg/schema/python/parser.go

+	// Forward reference: quoted string like "MyType" or 'MyType'
+	if len(s) >= 2 &&
+		((strings.HasPrefix(s, "\"") && strings.HasSuffix(s, "\"")) ||
+			(strings.HasPrefix(s, "'") && strings.HasSuffix(s, "'"))) {
+		inner := s[1 : len(s)-1]


The forward-reference quote stripping happens after the | union parsing. A quoted union like "TreeNode | None" will be split on | before quotes are removed, causing parsing to fail and the annotation to be dropped. Consider stripping surrounding quotes immediately after TrimSpace (before union/generic handling) so quoted unions/generics are handled correctly.

Copilot · 2026-03-04T21:35:23Z

pkg/schema/python/parser_test.go

+	betaIdx := indexOf(jsonStr, `"beta"`)
+	gammaIdx := indexOf(jsonStr, `"gamma"`)
+	deltaIdx := indexOf(jsonStr, `"delta"`)
+


This test compares ordering via substring indices but doesn't assert that each key was actually found (index != -1). If a key is missing, the failure can be a confusing ordering assertion (e.g. -1 > -1). Add explicit assertions that each index is >= 0 before comparing their relative order.

Suggested change

require.GreaterOrEqual(t, alphaIdx, 0, `"alpha" property should be present in JSON schema`)

require.GreaterOrEqual(t, betaIdx, 0, `"beta" property should be present in JSON schema`)

require.GreaterOrEqual(t, gammaIdx, 0, `"gamma" property should be present in JSON schema`)

require.GreaterOrEqual(t, deltaIdx, 0, `"delta" property should be present in JSON schema`)

tempusfrangit and others added 14 commits February 27, 2026 16:40

Merge branch 'main' into fix/dict-list-output-schema

3d7c4db

Merge branch 'main' into fix/dict-list-output-schema

51b7744

fix: combine appends to satisfy gocritic lint

6efe6b7

Merge branch 'main' into fix/dict-list-output-schema

a14eb5c

Merge branch 'main' into fix/dict-list-output-schema

0045ae9

chore: remove debug logging from _schemas.py

f4f03d6

Merge branch 'main' into fix/dict-list-output-schema

b7a79e9

markphelps requested a review from a team as a code owner March 4, 2026 21:14

markphelps commented Mar 4, 2026

View reviewed changes

fix: guard against slice panic on single-char quoted strings in parse…

d67210d

…TypeFromString Fuzz testing found that a bare quote character (e.g. `"`) matched the forward reference check but caused s[1:0] slice panic. Add len >= 2 guard before slicing.

markphelps requested a review from Copilot March 4, 2026 21:30

Copilot started reviewing on behalf of markphelps March 4, 2026 21:30 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

michaeldwan force-pushed the fix/dict-list-output-schema branch from 1e78590 to 575454e Compare March 5, 2026 17:53

markphelps added this to the 0.18.0 milestone Mar 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: address review issues in static schema generation#2805

fix: address review issues in static schema generation#2805
markphelps wants to merge 15 commits intofix/dict-list-output-schemafrom
mp/fix/dict-list-output-schema-fixes

markphelps commented Mar 4, 2026 •

edited

Loading

Uh oh!

markphelps Mar 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 4, 2026

Uh oh!

Copilot AI Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

+	require.GreaterOrEqual(t, alphaIdx, 0, `"alpha" property should be present in JSON schema`)
+	require.GreaterOrEqual(t, betaIdx, 0, `"beta" property should be present in JSON schema`)
+	require.GreaterOrEqual(t, gammaIdx, 0, `"gamma" property should be present in JSON schema`)
+	require.GreaterOrEqual(t, deltaIdx, 0, `"delta" property should be present in JSON schema`)

Conversation

markphelps commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug fixes

Code cleanup

Test coverage (8 new tests)

Not addressed (documented, behind opt-in gate)

Uh oh!

markphelps Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

markphelps commented Mar 4, 2026 •

edited

Loading