
feat: add auto-generated Python data structures from ModelPack schema#184

Open
rishi-jat wants to merge 3 commits into modelpack:main from rishi-jat:feat/python-api-from-schema

Conversation

@rishi-jat

@rishi-jat rishi-jat commented Mar 13, 2026

Summary

This PR adds auto-generated Python data structures for the ModelPack specification so downstream Python projects can import the spec types directly instead of copying them manually.

Example usage:

```python
from model_spec.v1 import Model

model = Model.model_validate_json(json_payload)
print(model.descriptor.name)
```

Implementation

Python models are generated from the canonical schema: schema/config-schema.json

Generation uses datamodel-code-generator and produces Pydantic models.

Added files:

  • py/model_spec/v1/models.py – generated Python models
  • py/model_spec/v1/__init__.py – exposes the public API
  • tools/generate_python_models.py – script to regenerate models
  • py/README.md – documentation for usage and regeneration

A Makefile target was added to regenerate the models: make generate-python-api
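For reference, a minimal sketch of what such a generation script might look like — the exact flags, paths, and entry point here are assumptions for illustration, not the committed tools/generate_python_models.py:

```python
# Hypothetical sketch of a generation script (assumption: the real script's
# flags and paths may differ). Invokes datamodel-code-generator to produce
# Pydantic v2 models from the canonical JSON schema.
import subprocess
import sys
from pathlib import Path

SCHEMA = Path("schema/config-schema.json")
OUTPUT = Path("py/model_spec/v1/models.py")

def build_command(schema: Path = SCHEMA, output: Path = OUTPUT) -> list:
    """Assemble the datamodel-code-generator command line."""
    return [
        sys.executable, "-m", "datamodel_code_generator",
        "--input", str(schema),
        "--input-file-type", "jsonschema",
        "--output", str(output),
        "--output-model-type", "pydantic_v2.BaseModel",
    ]

if __name__ == "__main__":
    subprocess.run(build_command(), check=True)
```

Keeping the invocation in a script (rather than only in the Makefile) lets Windows users and CI jobs regenerate the models without make.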

Validation

  • go test ./... passes
  • Python import works: from model_spec.v1 import Model
  • Model generation works: make generate-python-api

fixes #138

Signed-off-by: Rishi Jat <rishijat098@gmail.com>
Copilot AI review requested due to automatic review settings March 13, 2026 08:51
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement by providing auto-generated Python data structures for the ModelPack specification. This change streamlines the development process for Python projects that interact with ModelPack, ensuring consistency and reducing the need for manual type definition. It establishes a robust mechanism for keeping Python models synchronized with the core schema, thereby improving developer experience and reducing potential errors.

Highlights

  • Python Data Structures: Auto-generated Python data structures for the ModelPack specification have been added, allowing downstream Python projects to import spec types directly.
  • Pydantic Model Generation: Pydantic models are generated from the canonical schema/config-schema.json using datamodel-code-generator.
  • New Files Introduced: Key new files include py/model_spec/v1/models.py (generated models), py/model_spec/v1/__init__.py (public API exposure), tools/generate_python_models.py (generation script), and py/README.md (documentation).
  • Makefile Integration: A new Makefile target, generate-python-api, has been added to simplify the regeneration of these Python models.
Changelog
  • Makefile
    • Added a new generate-python-api target to automate the generation of Python API models from the JSON schema.
  • py/README.md
    • Added a new README file providing documentation for the Python ModelPack types, including usage examples and instructions for regeneration.
  • py/model_spec/v1/__init__.py
    • Added a new __init__.py file to expose the generated Pydantic models as part of the model_spec.v1 package.
  • py/model_spec/v1/models.py
    • Added a new models.py file containing the auto-generated Pydantic data structures for the ModelPack specification, derived from config-schema.json.
  • tools/generate_python_models.py
    • Added a new Python script responsible for executing datamodel-code-generator to generate the Pydantic models from the JSON schema.
Activity
  • No human activity has been recorded on this pull request yet.

@rishi-jat
Author

/cc @bergwolf
/cc @gorkem
/cc @aftersnow
/cc @chlins

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces auto-generated Python data structures for the ModelPack specification, which is a great addition for Python users. My review focuses on improving the developer experience and correctness of the new Python package. I've suggested making the Python models more idiomatic by using snake_case for field names, improving the documentation to include dependency installation, and ensuring all necessary types are exported from the package.


Copilot AI left a comment


Pull request overview

Adds an auto-generated Python API (Pydantic models) for the canonical ModelPack JSON Schema so downstream Python consumers can import spec-aligned types directly.

Changes:

  • Add a generator script (datamodel-code-generator) to produce Pydantic v2 models from schema/config-schema.json.
  • Commit generated Python models and a small model_spec.v1 import surface.
  • Add Python usage/regeneration docs and a Makefile target to regenerate the models.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

  • tools/generate_python_models.py – Adds a CLI script to generate Pydantic v2 models from the canonical JSON schema.
  • py/model_spec/v1/models.py – Introduces generated Pydantic models representing the v1 schema types.
  • py/model_spec/v1/__init__.py – Exposes a small public API surface for v1 model imports.
  • py/README.md – Documents intended usage and regeneration workflow for the Python types.
  • Makefile – Adds a generate-python-api target to regenerate the Python models.


@aftersnow
Contributor

Thanks for working on this. The schema-driven generation approach here looks useful, but it overlaps quite a bit with #175. I would suggest collaborating with #175 so we can converge on one Python SDK path and avoid maintaining two parallel Python APIs.

@rishi-jat
Author

After reviewing #175 and the current repo structure, here is my assessment.

The Python SDK in #175 is functional and well put together, but it introduces a second representation of the ModelPack specification by manually defining dataclasses that mirror the schema and Go types.

In this repository, schema/config-schema.json is already the canonical source of truth. The Go implementation aligns with that schema. Recreating the same structure manually in Python means the spec now exists in multiple places:

  • JSON schema (canonical)
  • Go structs
  • Python dataclasses (this PR)

This creates a maintenance issue:

  • Any schema change requires manual updates in Python
  • There is no guarantee the Python layer stays aligned with the schema
  • Drift between schema and SDK becomes likely over time

I also noticed:

  • Schema loading is path-based, which is fragile for packaging/install use
  • Validator correctness depends on strict schema dialect alignment
  • Serialization logic is manually trying to mirror Go behavior, which is hard to maintain as the spec evolves
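On the path-based schema loading point: a packaging-safe alternative is to ship the schema as package data and read it via importlib.resources. A minimal sketch, assuming a hypothetical layout in which config-schema.json is bundled inside a model_spec package:

```python
# Sketch: packaging-safe schema loading via importlib.resources rather than a
# filesystem path. Assumes a hypothetical layout where config-schema.json is
# shipped as package data inside the model_spec package.
import json
from importlib import resources

def load_schema(package: str = "model_spec",
                name: str = "config-schema.json") -> dict:
    """Read the bundled JSON schema from package data."""
    with resources.files(package).joinpath(name).open("r", encoding="utf-8") as f:
        return json.load(f)
```

This keeps the loader working from wheels, editable installs, and zipapps alike, since it never assumes a source checkout on disk.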

From what I see in the code and repo design, the intended direction is schema-first rather than handwritten models.

An alternative that seems more aligned with the current design would be to keep the schema as the single source of truth and derive Python types from it, instead of maintaining handwritten spec types in parallel. Supporting pieces like validation, tests, and packaging can still be layered on top of that.

@aftersnow I'd like to check your view on this:

Would it make more sense to avoid maintaining handwritten spec types here and instead keep the schema as the single source of truth, with Python types derived from it?

@pradhyum6144
Contributor

@rishi-jat these are valid points and I agree with most of them.

You're right that handwritten dataclasses create a maintenance burden. I already acknowledged this in my earlier reply; I'm happy to adopt schema-driven generation as the base and layer my contributions on top.

Here's what I think a combined approach looks like:

From #184 (your PR): schema-driven auto-generation via datamodel-code-generator. This becomes the foundation for Python types, keeping them in sync with config-schema.json automatically.

From #175 (my PR):

  • Validator with correct dialect alignment (Draft4Validator matching the schema's draft-04 declaration)
  • Test suite (64 tests covering serialization, validation, and edge cases)
  • pyproject.toml packaging
  • CI workflow (PR #180: pytest across Python 3.10–3.13 plus ruff linting)

I can rebase my PR to use your generated types as the base and keep the validator/tests/packaging on top. Or we can merge into one joint PR, whichever works best for you and the maintainers.
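To illustrate the dialect-alignment point above, a validator pinned to draft-04 could look roughly like this — a hedged sketch using the jsonschema package, not the actual #175 code:

```python
# Sketch of dialect-aligned validation (assumption: the real #175 validator
# may differ). Uses jsonschema's Draft4Validator so the validator dialect
# matches the schema's declared draft-04 $schema, instead of letting the
# library guess a newer draft.
import jsonschema

def validate_instance(instance: dict, schema: dict) -> list:
    """Return a list of validation error messages (empty when valid)."""
    validator = jsonschema.Draft4Validator(schema)
    return [error.message for error in validator.iter_errors(instance)]
```

Pinning the draft matters because keywords like exclusiveMinimum changed meaning between draft-04 and later drafts, so validating with the wrong dialect can silently accept invalid documents.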

@pradhyum6144
Contributor

@aftersnow which would you prefer: two layered PRs or one combined effort?

@rishi-jat
Author

rishi-jat commented Mar 22, 2026

> I can rebase my PR to use your generated types as the base and keep the validator/tests/packaging on top. Or we can merge into one joint PR, whatever works best for you and the maintainers.

This direction makes sense to me.

This PR keeps the schema as the source of truth and generates Python models from it. The other PR’s validator, tests, and packaging can be layered on top instead of maintaining separate handwritten types.

@aftersnow does this approach look right?



Development

Successfully merging this pull request may close these issues.

Is it possible to auto generate Python APIs?

4 participants