Skip to content

fix(schema): validate OCI digest format for modelfs.diffIds#192

Open
rishi-jat wants to merge 2 commits intomodelpack:mainfrom
rishi-jat:fix/validate-digest-format
Open

fix(schema): validate OCI digest format for modelfs.diffIds#192
rishi-jat wants to merge 2 commits intomodelpack:mainfrom
rishi-jat:fix/validate-digest-format

Conversation

@rishi-jat
Copy link

Problem

modelfs.diffIds currently accepts arbitrary strings.

There is no validation enforcing OCI digest format (algorithm:hex), which means invalid digests can pass validation and lead to downstream issues.

Fix

  • Added validation using digest.Digest.Validate()
  • Each entry in modelfs.diffIds is now checked for correct format

Tests

  • Added negative tests for invalid digest formats
  • Added positive tests for valid digests

Signed-off-by: Rishi Jat <rishijat098@gmail.com>
Signed-off-by: Rishi Jat <rishijat098@gmail.com>
Copilot AI review requested due to automatic review settings March 22, 2026 10:33
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the schema validation process by introducing strict checks for OCI digest formats within the modelfs.diffIds field, which was previously accepting arbitrary strings. It also corrects the parsing logic in validateConfig to ensure the entire model structure is properly validated, thereby improving data integrity and preventing potential downstream issues caused by malformed or incomplete model configurations.

Highlights

  • OCI Digest Validation: Implemented validation for the OCI digest format within modelfs.diffIds to ensure that all entries conform to the algorithm:hex standard, preventing invalid digests from being processed.
  • Model Structure Parsing: Refactored the validateConfig function to correctly unmarshal and validate the entire v1.Model structure, rather than just v1.ModelConfig, addressing a previous limitation where partial models could pass validation.
  • Comprehensive Testing: Added new test cases to cover both valid and invalid scenarios for OCI digest formats and to verify the correct parsing and validation of the full model structure.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@rishi-jat
Copy link
Author

/cc @chlins
/cc @aftersnow
/cc @bergwolf

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens validation for the application/vnd.cncf.model.config.v1+json schema by adding explicit OCI digest-format checks for modelfs.diffIds, preventing invalid digest strings from passing schema validation.

Changes:

  • Updated validateConfig to unmarshal the full v1.Model and validate each modelfs.diffIds entry via digest.Digest.Validate().
  • Added tests covering valid and invalid OCI digest formats, plus coverage around validating full-model structure inputs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
schema/validator.go Adds pre-schema validation for modelfs.diffIds digest format (and additional structural checks).
schema/config_test.go Adds new positive/negative tests for digest validation and model-structure validation behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +125 to +127
if model.Descriptor.Name == "" {
return fmt.Errorf("missing descriptor.name")
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new model.Descriptor.Name == "" check makes descriptor.name effectively required, but the JSON schema allows an empty descriptor object (no required list for name, only minLength if present). This introduces a stricter validation rule than the schema/spec and can reject inputs that would otherwise pass schema validation. Either drop this check, or update the schema/spec to require descriptor.name and adjust tests accordingly.

Suggested change
if model.Descriptor.Name == "" {
return fmt.Errorf("missing descriptor.name")
}

Copilot uses AI. Check for mistakes.
Comment on lines +593 to +606
func TestValidateConfigParsesModelNotModelConfig(t *testing.T) {
// This test verifies that validateConfig correctly parses the full Model structure,
// not just ModelConfig. Previously, validateConfig unmarshaled into ModelConfig,
// which always succeeded because all fields are optional.

// Test 1: Incomplete model with only config (should fail)
invalidJSON := `{
"config": {"paramSize": "8b"}
}`

err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidJSON))
if err == nil {
t.Fatalf("expected validation to fail for incomplete model")
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestValidateConfigParsesModelNotModelConfig is described as verifying the switch from unmarshaling ModelConfig to Model, but the assertions only check err != nil for invalid inputs. This test would still pass even if validateConfig unmarshaled into ModelConfig, because schema validation already rejects those payloads. Consider asserting on the specific failure reason (e.g., the pre-validation error) so the test actually protects the intended behavior.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly implements validation for OCI digest formats in modelfs.diffIds, which was previously missing. The accompanying tests are thorough and cover both valid and invalid cases. My feedback focuses on improving the maintainability of the new code by refactoring the tests to be table-driven and removing some redundant validation logic that is already handled by the JSON schema.

Comment on lines +593 to +630
func TestValidateConfigParsesModelNotModelConfig(t *testing.T) {
// This test verifies that validateConfig correctly parses the full Model structure,
// not just ModelConfig. Previously, validateConfig unmarshaled into ModelConfig,
// which always succeeded because all fields are optional.

// Test 1: Incomplete model with only config (should fail)
invalidJSON := `{
"config": {"paramSize": "8b"}
}`

err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidJSON))
if err == nil {
t.Fatalf("expected validation to fail for incomplete model")
}

// Test 2: Config-only JSON (should fail)
configOnlyJSON := `{
"paramSize": "8b",
"architecture": "transformer"
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(configOnlyJSON))
if err == nil {
t.Fatalf("expected failure for config-only JSON without descriptor/modelfs, but got nil")
}

// Test 3: Valid full Model (should pass)
validJSON := `{
"descriptor": {"name": "test-model"},
"config": {"paramSize": "8b"},
"modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]}
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(validJSON))
if err != nil {
t.Fatalf("expected valid Model to pass, but got error: %v", err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test function contains several related test cases. To improve readability and maintainability, consider refactoring it into a single table-driven test. This is a common Go idiom that makes it easier to add, remove, or modify test cases in the future.

func TestValidateConfigParsesModelNotModelConfig(t *testing.T) {
	// This test verifies that validateConfig correctly parses the full Model structure,
	// not just ModelConfig. Previously, validateConfig unmarshaled into ModelConfig,
	// which always succeeded because all fields are optional.
	testCases := []struct {
		name       string
		jsonInput  string
		shouldFail bool
	}{
		{
			name:       "Incomplete model with only config",
			jsonInput:  `{"config": {"paramSize": "8b"}}`,
			shouldFail: true,
		},
		{
			name:       "Config-only JSON",
			jsonInput:  `{"paramSize": "8b", "architecture": "transformer"}`,
			shouldFail: true,
		},
		{
			name:       "Valid full Model",
			jsonInput:  `{"descriptor": {"name": "test-model"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]}}`,
			shouldFail: false,
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(tc.jsonInput))
			if tc.shouldFail {
				if err == nil {
					t.Fatalf("expected validation to fail for '%s', but it passed", tc.name)
				}
			} else {
				if err != nil {
					t.Fatalf("expected validation to pass for '%s', but it failed with: %v", tc.name, err)
				}
			}
		})
	}
}

Comment on lines +632 to +727
func TestValidateDigestFormat(t *testing.T) {
// Test that validateConfig rejects invalid OCI digest formats

// Test 1: Invalid digest format (no algorithm:hex format)
invalidDigestJSON := `{
"descriptor": {"name": "test"},
"config": {"paramSize": "8b"},
"modelfs": {
"type": "layers",
"diffIds": ["invalid-digest"]
}
}`

err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidDigestJSON))
if err == nil {
t.Fatalf("expected failure for invalid digest format")
}

// Test 2: Invalid hex in digest
invalidHexJSON := `{
"descriptor": {"name": "test"},
"config": {"paramSize": "8b"},
"modelfs": {
"type": "layers",
"diffIds": ["sha256:xyz"]
}
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidHexJSON))
if err == nil {
t.Fatalf("expected failure for invalid hex in digest")
}

// Test 3: Empty hash
emptyHashJSON := `{
"descriptor": {"name": "test"},
"config": {"paramSize": "8b"},
"modelfs": {
"type": "layers",
"diffIds": ["sha256:"]
}
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(emptyHashJSON))
if err == nil {
t.Fatalf("expected failure for empty hash in digest")
}

// Test 4: Multiple invalid digests
multipleInvalidJSON := `{
"descriptor": {"name": "test"},
"config": {"paramSize": "8b"},
"modelfs": {
"type": "layers",
"diffIds": ["invalid", "also-invalid"]
}
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(multipleInvalidJSON))
if err == nil {
t.Fatalf("expected failure for multiple invalid digests")
}

// Test 5: Valid digest (should pass)
validJSON := `{
"descriptor": {"name": "test"},
"config": {"paramSize": "8b"},
"modelfs": {
"type": "layers",
"diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]
}
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(validJSON))
if err != nil {
t.Fatalf("expected valid digest to pass, got: %v", err)
}

// Test 6: Multiple valid digests (should pass)
multipleValidJSON := `{
"descriptor": {"name": "test"},
"config": {"paramSize": "8b"},
"modelfs": {
"type": "layers",
"diffIds": [
"sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef",
"sha256:abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890"
]
}
}`

err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(multipleValidJSON))
if err != nil {
t.Fatalf("expected multiple valid digests to pass, got: %v", err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test function contains several similar test cases. To improve readability and maintainability, consider refactoring it into a single table-driven test. This is a common Go idiom that makes it easier to add, remove, or modify test cases in the future.

func TestValidateDigestFormat(t *testing.T) {
	// Test that validateConfig rejects invalid OCI digest formats
	testCases := []struct {
		name       string
		jsonInput  string
		shouldFail bool
	}{
		{
			name:       "Invalid digest format (no algorithm:hex format)",
			jsonInput:  `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["invalid-digest"]}}`,
			shouldFail: true,
		},
		{
			name:       "Invalid hex in digest",
			jsonInput:  `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:xyz"]}}`,
			shouldFail: true,
		},
		{
			name:       "Empty hash",
			jsonInput:  `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:"]}}`,
			shouldFail: true,
		},
		{
			name:       "Multiple invalid digests",
			jsonInput:  `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["invalid", "also-invalid"]}}`,
			shouldFail: true,
		},
		{
			name:       "Valid digest",
			jsonInput:  `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]}}`,
			shouldFail: false,
		},
		{
			name:       "Multiple valid digests",
			jsonInput:  `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef", "sha256:abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890"]}}`,
			shouldFail: false,
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(tc.jsonInput))
			if tc.shouldFail {
				if err == nil {
					t.Fatalf("expected failure for '%s'", tc.name)
				}
			} else {
				if err != nil {
					t.Fatalf("expected valid case '%s' to pass, got: %v", tc.name, err)
				}
			}
		})
	}
}

Comment on lines +124 to +131
// Minimal structural validation for required fields
if model.Descriptor.Name == "" {
return fmt.Errorf("missing descriptor.name")
}
if len(model.ModelFS.DiffIDs) == 0 {
return fmt.Errorf("missing modelfs.diffIds")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These manual checks for descriptor.name and modelfs.diffIds are redundant. The JSON schema (config-schema.json) already enforces that descriptor.name is non-empty (minLength: 1) and modelfs.diffIds is not empty (minItems: 1). Since validateSchema is called after this function, these checks are duplicated.

To improve maintainability and avoid logic duplication, it's best to remove these manual checks and rely on the schema validation. This function should only contain validation logic that cannot be expressed in the JSON schema, like the digest format check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants