fix: address review issues in static schema generation#2805
fix: address review issues in static schema generation#2805markphelps wants to merge 15 commits intofix/dict-list-output-schemafrom
Conversation
The tree-sitter schema parser rejected unparameterized dict and list
return types with 'unsupported type'. This broke the resnet example
and any predictor returning -> dict or -> list.
- dict/Dict map to TypeAny (JSON Schema: {"type": "object"})
- list/List map to OutputList with TypeAny (JSON Schema: {"type": "array", "items": {"type": "object"}})
This matches the old Python schema gen behavior exactly:
dict -> Dict[str, Any] -> {"type": "object"}
list -> List[Any] -> {"type": "array", "items": {"type": "object"}}
…nsive tests Replace the flat OutputType system with a recursive SchemaType algebraic data type that supports arbitrary nesting (dict[str, list[dict[str, int]]], etc.). Key changes: - SchemaType ADT with 7 kinds: Primitive, Any, Array, Dict, Object, Iterator, ConcatIterator - ResolveSchemaType: recursive resolver replacing ResolveOutputType - Cross-file model resolution: imports from local .py files are found on disk, parsed with tree-sitter, and BaseModel subclasses extracted automatically - Handles all local import permutations: relative, dotted, subpackage, aliased - Clear error messages for unresolvable types (includes import source and guidance) - Remove legacy OutputType, OutputKind, ObjectField, ResolveOutputType - Thread sourceDir through Parser -> ParsePredictor for filesystem access - Rewrite architecture/02-schema.md for the static Go parser Tests: 93 unit tests (12 recursive nesting, 5 unresolvable errors, 3 pydantic compat, 11 cross-file resolution, 1 end-to-end schema gen) + 1 integration test
…ursive model fields - Propagate errors from dict value type resolution instead of silently falling back to opaque SchemaAny (dict[str, Tensor] now errors) - Extract UnwrapOptional helper used by ResolveFieldType, resolveUnionSchemaType, and resolveFieldSchemaType (3 callsites) - resolveModelToSchemaType now uses ResolveSchemaType via resolveFieldSchemaType, supporting dict/nested types inside BaseModel fields (previously limited to primitives, Optional[T], List[T]) - Fix stale comments: Optional rejected not nullable, Iterator allows nested types - Fix garbled unicode in architecture docs, fix SchemaAny table entry
- Fix nullable incorrectly set on non-required fields instead of field.Type.Nullable (debug: bool = False was appearing nullable) - Remove dead CogArrayType/CogArrayDisplay struct fields (hardcoded in coreSchema, never read from struct) - Distinguish os.ErrNotExist from permission errors in cross-file resolution; warn on parse failures instead of silently ignoring - Fix bare dict/list JSON Schema examples in architecture docs - Add regression tests: defaulted non-Optional field not nullable, Optional field nullable in JSON Schema, dict[str, Tensor] errors instead of silently producing SchemaAny (both top-level and inside BaseModel fields)
Four fuzz targets exercising the schema pipeline: - FuzzResolveSchemaType: arbitrary TypeAnnotation trees through the recursive resolver - FuzzJSONSchema: random SchemaType trees through JSON Schema rendering - FuzzParsePredictor: arbitrary bytes as Python source through the tree-sitter parser - FuzzParseTypeAnnotation: arbitrary return type strings in predict signatures Includes: - mise task 'test:fuzz' (FUZZTIME=30s per target by default) - CI job 'fuzz-go' running 30s per target on Go changes - Byte encoder/decoder for deterministic TypeAnnotation and SchemaType tree construction from fuzz corpus
* fix: make static schema generation opt-in via COG_STATIC_SCHEMA env var The static Go tree-sitter schema generator was the default for SDK >= 0.17.0, which risks breaking builds when the parser encounters types it cannot resolve. - Gate static schema gen behind COG_STATIC_SCHEMA=1 (or "true") env var - Legacy runtime schema generation (boot container + python introspection) remains the default - When opted in, gracefully fall back to legacy on ErrUnresolvableType instead of hard-failing the build - Add unit tests for canUseStaticSchemaGen (12 table-driven cases) - Add integration test for the static->legacy fallback path - Update existing static/multi-file integration tests to set the env var * docs: update schema architecture doc for opt-in static generation model Reflect that static schema generation is opt-in via COG_STATIC_SCHEMA=1, with legacy runtime path as the default and automatic fallback. * fix: restore legacy runtime schema generation modules for fallback path Restore the Python modules needed by the legacy runtime schema generation path (python -m cog.command.openapi_schema). These were deleted in 61eedf3 but are needed as the fallback when the static Go parser encounters types it cannot resolve. Restored modules: _adt, _inspector, _schemas, coder, config, errors, mode, suppress_output, command/__init__, command/openapi_schema. config.py is trimmed to only what openapi_schema.py needs (removed get_predictor_types which depended on deleted get_predict/get_train). * fix: simplify schema gen gating — let coglet handle missing schema for train/serve Remove the skipLabels override that forced static schema gen for cog train/serve paths. Now useStatic is purely opt-in via COG_STATIC_SCHEMA=1 for all commands. For train/serve without the env var, no schema is generated at build time. Coglet gracefully handles this (warns and accepts all input). These are local-only images that don't need strict schema validation. Also improves the static_schema_fallback integration test to use a realistic mypackage/__init__.py scenario instead of a plain class. * fix: restore static gen for train/serve paths, handle pydantic v2 in legacy inspector Two IT failures: 1. training_setup: cog train needs schema for -i flag parsing. The CLI fetches it from coglet's /openapi.json, which returns 503 when no schema file exists. Re-enable static gen for skipLabels paths (same as main) since there's no post-build legacy fallback for these. 2. pydantic2_output: the legacy runtime inspector (_inspector.py) didn't handle pydantic v2 BaseModel as output types — only cog.BaseModel. Add conditional pydantic.BaseModel check with model_fields iteration. * fix: suppress pyright reportMissingImports for optional pydantic import * fix: flatten nested error handling to early returns, case-insensitive env var check
Add cycle detection for recursive BaseModel types to prevent stack overflow, use ordered maps for deterministic JSON property output, fix forward reference handling in model field type annotations, and close test coverage gaps for nested BaseModels, list/Iterator of BaseModel, and property ordering.
| // it indicates a missing case (e.g. a new kind was added without updating | ||
| // this switch). Panic to surface the bug immediately rather than silently | ||
| // returning a wrong schema. | ||
| panic(fmt.Sprintf("unhandled SchemaTypeKind: %d", s.Kind)) |
…TypeFromString Fuzz testing found that a bare quote character (e.g. `"`) matched the forward reference check but caused s[1:0] slice panic. Add len >= 2 guard before slicing.
There was a problem hiding this comment.
Pull request overview
This PR addresses remaining review items in static JSON schema generation for Python predictors, focusing on BaseModel recursion safety, forward-reference parsing, and deterministic JSON output.
Changes:
- Add cycle detection when resolving recursive Pydantic
BaseModeloutput schemas to prevent infinite recursion. - Improve parsing of quoted forward references inside string-based annotations.
- Make JSON Schema object
propertiesordering deterministic by using an ordered map type, and add/adjust tests accordingly.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pkg/schema/schema_type.go |
Removes unused dict key typing, adds ordered properties output, adds cycle detection in model resolution, and panics on unhandled SchemaTypeKind. |
pkg/schema/python/parser.go |
Adds quote-stripping forward-reference handling in parseTypeFromString. |
pkg/schema/python/parser_test.go |
Adds new tests for forward refs, nested BaseModels, BaseModel generics, recursion safety, and deterministic property order; adjusts existing nullable assertions via JSON round-trip. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| // Forward reference: quoted string like "MyType" or 'MyType' | ||
| if len(s) >= 2 && | ||
| ((strings.HasPrefix(s, "\"") && strings.HasSuffix(s, "\"")) || | ||
| (strings.HasPrefix(s, "'") && strings.HasSuffix(s, "'"))) { | ||
| inner := s[1 : len(s)-1] |
There was a problem hiding this comment.
The forward-reference quote stripping happens after the | union parsing. A quoted union like "TreeNode | None" will be split on | before quotes are removed, causing parsing to fail and the annotation to be dropped. Consider stripping surrounding quotes immediately after TrimSpace (before union/generic handling) so quoted unions/generics are handled correctly.
| betaIdx := indexOf(jsonStr, `"beta"`) | ||
| gammaIdx := indexOf(jsonStr, `"gamma"`) | ||
| deltaIdx := indexOf(jsonStr, `"delta"`) | ||
|
|
There was a problem hiding this comment.
This test compares ordering via substring indices but doesn't assert that each key was actually found (index != -1). If a key is missing, the failure can be a confusing ordering assertion (e.g. -1 > -1). Add explicit assertions that each index is >= 0 before comparing their relative order.
| require.GreaterOrEqual(t, alphaIdx, 0, `"alpha" property should be present in JSON schema`) | |
| require.GreaterOrEqual(t, betaIdx, 0, `"beta" property should be present in JSON schema`) | |
| require.GreaterOrEqual(t, gammaIdx, 0, `"gamma" property should be present in JSON schema`) | |
| require.GreaterOrEqual(t, deltaIdx, 0, `"delta" property should be present in JSON schema`) |
1e78590 to
575454e
Compare
Summary
Addresses remaining issues from the
fix/dict-list-output-schemabranch.Bug fixes
ResolveSchemaTypenow tracks aseenset to prevent infinite recursion on self-referential models likeclass TreeNode(BaseModel): child: Optional["TreeNode"]. Cycles emit an opaque{"type": "object"}instead of stack overflowing.parseTypeFromStringnow strips quotes from forward references like"MyType"or'MyType'inside generic annotations (e.g.Optional["TreeNode"]). Previously these fields were silently dropped.SchemaObject.coreSchema()now usesorderedMapAnyinstead ofmap[string]anyfor thepropertiesfield, ensuring consistent JSON output.Code cleanup
SchemaTypeKind— replaced silent fallthrough returning{"type": "object"}with a panic that surfaces missing switch cases immediately.SchemaObjectfield loop no longer re-setsnullablesincejsonSchema()already handles it.KeyTypefield — dict key types are always strings in JSON Schema; the field was set but never read.Test coverage (8 new tests)
TestForwardReferenceOutput/TestForwardReferenceGeneric— quoted string annotationsTestNestedBaseModelField—class Outer(BaseModel): inner: InnercompositionTestListOfBaseModelOutput/TestIteratorOfBaseModelOutput— BaseModel in generic outputsTestRecursiveBaseModelDoesNotStackOverflow— self-referential model cycle detectionTestOutputObjectPropertyOrderDeterministic— JSON property orderingNot addressed (documented, behind opt-in gate)
__init__.pypackage import support