feat: add auto-generated Python data structures from ModelPack schema#184
rishi-jat wants to merge 3 commits into modelpack:main
Signed-off-by: Rishi Jat <rishijat098@gmail.com>
Summary of Changes
Hello, I'm Gemini Code Assist! Here's a summary to help you and other reviewers quickly get up to speed.
This pull request provides auto-generated Python data structures for the ModelPack specification. The change streamlines development for Python projects that interact with ModelPack, ensuring consistency and reducing the need for manual type definitions. It establishes a mechanism for keeping the Python models synchronized with the core schema, which improves the developer experience and reduces potential errors.
/cc @bergwolf
Code Review
This PR introduces auto-generated Python data structures for the ModelPack specification, which is a great addition for Python users. My review focuses on improving the developer experience and correctness of the new Python package. I've suggested making the Python models more idiomatic by using snake_case for field names, improving the documentation to include dependency installation, and ensuring all necessary types are exported from the package.
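The snake_case suggestion amounts to a mechanical rename of the schema's camelCase property names. A minimal sketch of that mapping, using only the standard library; the helper name `to_snake_case` and the sample field names are illustrative assumptions, not taken from the PR or from the generator:

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase JSON property name to a snake_case identifier.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Split before a capitalized word: "paramSize" -> "param_Size".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and an uppercase letter:
    # "modelFS" -> "model_FS".
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


# Illustrative field names only (assumed, not confirmed against the schema).
for field in ("paramSize", "createdAt", "modelFS"):
    print(field, "->", to_snake_case(field))
```

If the Python attributes are renamed this way, the original JSON key would still need to be kept as a field alias so that serialized documents continue to use the schema's names.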
Pull request overview
Adds an auto-generated Python API (Pydantic models) for the canonical ModelPack JSON Schema so downstream Python consumers can import spec-aligned types directly.
Changes:
- Add a generator script (`datamodel-code-generator`) to produce Pydantic v2 models from `schema/config-schema.json`.
- Commit generated Python models and a small `model_spec.v1` import surface.
- Add Python usage/regeneration docs and a Makefile target to regenerate the models.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/generate_python_models.py | Adds a CLI script to generate Pydantic v2 models from the canonical JSON schema. |
| py/model_spec/v1/models.py | Introduces generated Pydantic models representing the v1 schema types. |
| py/model_spec/v1/\_\_init\_\_.py | Exposes a small public API surface for v1 model imports. |
| py/README.md | Documents intended usage and regeneration workflow for the Python types. |
| Makefile | Adds generate-python-api target to regenerate the Python models. |
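For readers who have not opened the diff, a generator script of the kind listed above could look roughly like the sketch below. It is not the contents of `tools/generate_python_models.py`; the function names are hypothetical, and the only facts taken from this page are the schema path, the output path, the use of datamodel-code-generator, and the Pydantic v2 target.

```python
import subprocess
from pathlib import Path

# Paths as named in the file table above.
SCHEMA = Path("schema/config-schema.json")
OUTPUT = Path("py/model_spec/v1/models.py")


def build_command(schema: Path = SCHEMA, output: Path = OUTPUT) -> list[str]:
    """Build the datamodel-codegen invocation (hypothetical helper)."""
    return [
        "datamodel-codegen",
        "--input", str(schema),
        "--input-file-type", "jsonschema",
        "--output", str(output),
        # Emit Pydantic v2 models rather than the generator's default.
        "--output-model-type", "pydantic_v2.BaseModel",
    ]


def run_generator() -> int:
    """Run the generator from the repository root and return its exit code."""
    if not SCHEMA.is_file():
        raise FileNotFoundError(f"schema not found: {SCHEMA}")
    OUTPUT.parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_command(), check=False).returncode
```

Calling `run_generator()` from the repository root would rewrite the models file in place, which is presumably what the `generate-python-api` Makefile target wraps.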
Signed-off-by: Rishi Jat <rishijat098@gmail.com>
After reviewing #175 and the current repo structure, here is my assessment. The Python SDK in this PR is functional and well put together, but it introduces a second representation of the ModelPack specification by manually defining dataclasses that mirror the schema and Go types. In this repository,
This creates a maintenance issue:
I also noticed:
From what I see in the code and repo design, the direction looks schema-first rather than handwritten models. An alternative that seems more aligned with the current design would be to keep the schema as the single source of truth and derive Python types from it, instead of maintaining handwritten spec types in parallel. Supporting pieces like validation, tests, and packaging can still be layered on top of that. @aftersnow I'd like to check your view on this: would it make more sense to avoid maintaining handwritten spec types here and instead keep the schema as the single source of truth, with Python types derived from it?
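One way to enforce "schema as the single source of truth" is a drift check that regenerates the models and compares them with the committed file. The sketch below is an assumption about how such a check might work, not something this PR contains; in particular it assumes the generated file starts with a comment header containing a `#   timestamp:` line that changes on every run, and both function names are hypothetical.

```python
def strip_volatile_header(source: str) -> str:
    """Drop comment lines that change on every run.

    Assumption: the generator writes a '#   timestamp: ...' comment line.
    Only comment lines are considered, so a model field named 'timestamp'
    is left untouched.
    """
    kept = [
        line
        for line in source.splitlines()
        if not (line.startswith("#") and line.lstrip("# ").startswith("timestamp:"))
    ]
    return "\n".join(kept).strip()


def models_are_stale(committed: str, regenerated: str) -> bool:
    """Return True when the committed models differ from a fresh generation."""
    return strip_volatile_header(committed) != strip_volatile_header(regenerated)
```

A CI job could generate the models into a temporary file, read both files, and fail when `models_are_stale` returns True.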
@rishi-jat these are valid points and I agree with most of them. You're right that handwritten dataclasses create a maintenance burden. I already acknowledged this in my earlier reply, and I'm happy to adopt schema-driven generation as the base and layer my contributions on top. Here's what I think a combined approach looks like. From #184 (your PR): schema-driven auto-generation via datamodel-code-generator, which becomes the foundation for the Python types and keeps them in sync with config-schema.json automatically. From #175 (my PR): a validator with correct dialect alignment (Draft4Validator, matching the schema's draft-04 declaration).
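To make "dialect alignment" concrete: the validator class should match the draft named in the schema's `$schema` keyword. The following is a minimal sketch, not the validator from #175; `declared_dialect` and `validate_config` are hypothetical names, and the second function assumes the third-party `jsonschema` package is installed.

```python
# Well-known JSON Schema meta-schema URIs for the older drafts.
DRAFT_URIS = {
    "http://json-schema.org/draft-04/schema#": "draft-04",
    "http://json-schema.org/draft-06/schema#": "draft-06",
    "http://json-schema.org/draft-07/schema#": "draft-07",
}


def declared_dialect(schema: dict) -> str:
    """Return the draft named by the schema's "$schema" keyword, or "unknown"."""
    uri = schema.get("$schema", "")
    # Tolerate a missing trailing '#', which some schemas omit.
    return DRAFT_URIS.get(uri.rstrip("#") + "#", "unknown")


def validate_config(instance: dict, schema: dict) -> None:
    """Validate an instance with Draft4Validator, refusing other dialects."""
    dialect = declared_dialect(schema)
    if dialect != "draft-04":
        raise ValueError(f"expected a draft-04 schema, got {dialect}")
    # Imported lazily so the dialect check works without the dependency.
    from jsonschema import Draft4Validator

    Draft4Validator.check_schema(schema)  # the schema itself must be valid
    Draft4Validator(schema).validate(instance)  # raises ValidationError on failure
```

Checking the dialect first avoids silently validating a draft-04 schema with the semantics of a newer draft.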
@aftersnow what would you prefer: two layered PRs or one combined effort?
This direction makes sense to me. This PR keeps the schema as the source of truth and generates Python models from it. The other PR’s validator, tests, and packaging can be layered on top instead of maintaining separate handwritten types. @aftersnow does this approach look right?
Summary
This PR adds auto-generated Python data structures for the ModelPack specification so downstream Python projects can import the spec types directly instead of copying them manually.
Example usage:
Implementation
Python models are generated from the canonical schema:
schema/config-schema.json
Generation uses datamodel-code-generator and produces Pydantic models.
Added files:
A Makefile target was added to regenerate the models:
make generate-python-api
Validation
go test ./...
Python import works: from model_spec.v1 import Model
Model generation works: make generate-python-api
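The import check above could be automated with a small smoke test. This is a sketch under the assumption that the `py/` directory is on the import path; `exports_name` is a hypothetical helper, and only the module name `model_spec.v1` and the `Model` symbol come from this description.

```python
import importlib


def exports_name(module_name: str, attribute: str) -> bool:
    """Return True if the module imports cleanly and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a failing import inside it.
        return False
    return hasattr(module, attribute)


# Reports False when the package is not installed or not on sys.path.
print("model_spec.v1 exports Model:", exports_name("model_spec.v1", "Model"))
```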
fixes #138