fix(schema): validate OCI digest format for modelfs.diffIds#192
fix(schema): validate OCI digest format for modelfs.diffIds#192rishi-jat wants to merge 2 commits intomodelpack:mainfrom
Conversation
Signed-off-by: Rishi Jat <rishijat098@gmail.com>
Signed-off-by: Rishi Jat <rishijat098@gmail.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the schema validation process by introducing strict checks for OCI digest formats within the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
/cc @chlins |
There was a problem hiding this comment.
Pull request overview
This PR strengthens validation for the application/vnd.cncf.model.config.v1+json schema by adding explicit OCI digest-format checks for modelfs.diffIds, preventing invalid digest strings from passing schema validation.
Changes:
- Updated
validateConfigto unmarshal the fullv1.Modeland validate eachmodelfs.diffIdsentry viadigest.Digest.Validate(). - Added tests covering valid and invalid OCI digest formats, plus coverage around validating full-model structure inputs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| schema/validator.go | Adds pre-schema validation for modelfs.diffIds digest format (and additional structural checks). |
| schema/config_test.go | Adds new positive/negative tests for digest validation and model-structure validation behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if model.Descriptor.Name == "" { | ||
| return fmt.Errorf("missing descriptor.name") | ||
| } |
There was a problem hiding this comment.
The new model.Descriptor.Name == "" check makes descriptor.name effectively required, but the JSON schema allows an empty descriptor object (no required list for name, only minLength if present). This introduces a stricter validation rule than the schema/spec and can reject inputs that would otherwise pass schema validation. Either drop this check, or update the schema/spec to require descriptor.name and adjust tests accordingly.
| if model.Descriptor.Name == "" { | |
| return fmt.Errorf("missing descriptor.name") | |
| } |
| func TestValidateConfigParsesModelNotModelConfig(t *testing.T) { | ||
| // This test verifies that validateConfig correctly parses the full Model structure, | ||
| // not just ModelConfig. Previously, validateConfig unmarshaled into ModelConfig, | ||
| // which always succeeded because all fields are optional. | ||
|
|
||
| // Test 1: Incomplete model with only config (should fail) | ||
| invalidJSON := `{ | ||
| "config": {"paramSize": "8b"} | ||
| }` | ||
|
|
||
| err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected validation to fail for incomplete model") | ||
| } |
There was a problem hiding this comment.
TestValidateConfigParsesModelNotModelConfig is described as verifying the switch from unmarshaling ModelConfig to Model, but the assertions only check err != nil for invalid inputs. This test would still pass even if validateConfig unmarshaled into ModelConfig, because schema validation already rejects those payloads. Consider asserting on the specific failure reason (e.g., the pre-validation error) so the test actually protects the intended behavior.
There was a problem hiding this comment.
Code Review
This pull request correctly implements validation for OCI digest formats in modelfs.diffIds, which was previously missing. The accompanying tests are thorough and cover both valid and invalid cases. My feedback focuses on improving the maintainability of the new code by refactoring the tests to be table-driven and removing some redundant validation logic that is already handled by the JSON schema.
| func TestValidateConfigParsesModelNotModelConfig(t *testing.T) { | ||
| // This test verifies that validateConfig correctly parses the full Model structure, | ||
| // not just ModelConfig. Previously, validateConfig unmarshaled into ModelConfig, | ||
| // which always succeeded because all fields are optional. | ||
|
|
||
| // Test 1: Incomplete model with only config (should fail) | ||
| invalidJSON := `{ | ||
| "config": {"paramSize": "8b"} | ||
| }` | ||
|
|
||
| err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected validation to fail for incomplete model") | ||
| } | ||
|
|
||
| // Test 2: Config-only JSON (should fail) | ||
| configOnlyJSON := `{ | ||
| "paramSize": "8b", | ||
| "architecture": "transformer" | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(configOnlyJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected failure for config-only JSON without descriptor/modelfs, but got nil") | ||
| } | ||
|
|
||
| // Test 3: Valid full Model (should pass) | ||
| validJSON := `{ | ||
| "descriptor": {"name": "test-model"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]} | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(validJSON)) | ||
| if err != nil { | ||
| t.Fatalf("expected valid Model to pass, but got error: %v", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
This test function contains several related test cases. To improve readability and maintainability, consider refactoring it into a single table-driven test. This is a common Go idiom that makes it easier to add, remove, or modify test cases in the future.
func TestValidateConfigParsesModelNotModelConfig(t *testing.T) {
// This test verifies that validateConfig correctly parses the full Model structure,
// not just ModelConfig. Previously, validateConfig unmarshaled into ModelConfig,
// which always succeeded because all fields are optional.
testCases := []struct {
name string
jsonInput string
shouldFail bool
}{
{
name: "Incomplete model with only config",
jsonInput: `{"config": {"paramSize": "8b"}}`,
shouldFail: true,
},
{
name: "Config-only JSON",
jsonInput: `{"paramSize": "8b", "architecture": "transformer"}`,
shouldFail: true,
},
{
name: "Valid full Model",
jsonInput: `{"descriptor": {"name": "test-model"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]}}`,
shouldFail: false,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(tc.jsonInput))
if tc.shouldFail {
if err == nil {
t.Fatalf("expected validation to fail for '%s', but it passed", tc.name)
}
} else {
if err != nil {
t.Fatalf("expected validation to pass for '%s', but it failed with: %v", tc.name, err)
}
}
})
}
}| func TestValidateDigestFormat(t *testing.T) { | ||
| // Test that validateConfig rejects invalid OCI digest formats | ||
|
|
||
| // Test 1: Invalid digest format (no algorithm:hex format) | ||
| invalidDigestJSON := `{ | ||
| "descriptor": {"name": "test"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": { | ||
| "type": "layers", | ||
| "diffIds": ["invalid-digest"] | ||
| } | ||
| }` | ||
|
|
||
| err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidDigestJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected failure for invalid digest format") | ||
| } | ||
|
|
||
| // Test 2: Invalid hex in digest | ||
| invalidHexJSON := `{ | ||
| "descriptor": {"name": "test"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": { | ||
| "type": "layers", | ||
| "diffIds": ["sha256:xyz"] | ||
| } | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(invalidHexJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected failure for invalid hex in digest") | ||
| } | ||
|
|
||
| // Test 3: Empty hash | ||
| emptyHashJSON := `{ | ||
| "descriptor": {"name": "test"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": { | ||
| "type": "layers", | ||
| "diffIds": ["sha256:"] | ||
| } | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(emptyHashJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected failure for empty hash in digest") | ||
| } | ||
|
|
||
| // Test 4: Multiple invalid digests | ||
| multipleInvalidJSON := `{ | ||
| "descriptor": {"name": "test"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": { | ||
| "type": "layers", | ||
| "diffIds": ["invalid", "also-invalid"] | ||
| } | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(multipleInvalidJSON)) | ||
| if err == nil { | ||
| t.Fatalf("expected failure for multiple invalid digests") | ||
| } | ||
|
|
||
| // Test 5: Valid digest (should pass) | ||
| validJSON := `{ | ||
| "descriptor": {"name": "test"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": { | ||
| "type": "layers", | ||
| "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"] | ||
| } | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(validJSON)) | ||
| if err != nil { | ||
| t.Fatalf("expected valid digest to pass, got: %v", err) | ||
| } | ||
|
|
||
| // Test 6: Multiple valid digests (should pass) | ||
| multipleValidJSON := `{ | ||
| "descriptor": {"name": "test"}, | ||
| "config": {"paramSize": "8b"}, | ||
| "modelfs": { | ||
| "type": "layers", | ||
| "diffIds": [ | ||
| "sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef", | ||
| "sha256:abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890" | ||
| ] | ||
| } | ||
| }` | ||
|
|
||
| err = schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(multipleValidJSON)) | ||
| if err != nil { | ||
| t.Fatalf("expected multiple valid digests to pass, got: %v", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
This test function contains several similar test cases. To improve readability and maintainability, consider refactoring it into a single table-driven test. This is a common Go idiom that makes it easier to add, remove, or modify test cases in the future.
func TestValidateDigestFormat(t *testing.T) {
// Test that validateConfig rejects invalid OCI digest formats
testCases := []struct {
name string
jsonInput string
shouldFail bool
}{
{
name: "Invalid digest format (no algorithm:hex format)",
jsonInput: `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["invalid-digest"]}}`,
shouldFail: true,
},
{
name: "Invalid hex in digest",
jsonInput: `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:xyz"]}}`,
shouldFail: true,
},
{
name: "Empty hash",
jsonInput: `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:"]}}`,
shouldFail: true,
},
{
name: "Multiple invalid digests",
jsonInput: `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["invalid", "also-invalid"]}}`,
shouldFail: true,
},
{
name: "Valid digest",
jsonInput: `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef"]}}`,
shouldFail: false,
},
{
name: "Multiple valid digests",
jsonInput: `{"descriptor": {"name": "test"}, "config": {"paramSize": "8b"}, "modelfs": {"type": "layers", "diffIds": ["sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef", "sha256:abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890"]}}`,
shouldFail: false,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
err := schema.ValidatorMediaTypeModelConfig.Validate(strings.NewReader(tc.jsonInput))
if tc.shouldFail {
if err == nil {
t.Fatalf("expected failure for '%s'", tc.name)
}
} else {
if err != nil {
t.Fatalf("expected valid case '%s' to pass, got: %v", tc.name, err)
}
}
})
}
}| // Minimal structural validation for required fields | ||
| if model.Descriptor.Name == "" { | ||
| return fmt.Errorf("missing descriptor.name") | ||
| } | ||
| if len(model.ModelFS.DiffIDs) == 0 { | ||
| return fmt.Errorf("missing modelfs.diffIds") | ||
| } | ||
|
|
There was a problem hiding this comment.
These manual checks for descriptor.name and modelfs.diffIds are redundant. The JSON schema (config-schema.json) already enforces that descriptor.name is non-empty (minLength: 1) and modelfs.diffIds is not empty (minItems: 1). Since validateSchema is called after this function, these checks are duplicated.
To improve maintainability and avoid logic duplication, it's best to remove these manual checks and rely on the schema validation. This function should only contain validation logic that cannot be expressed in the JSON schema, like the digest format check.
Problem
modelfs.diffIdscurrently accepts arbitrary strings.There is no validation enforcing OCI digest format (
algorithm:hex), which means invalid digests can pass validation and lead to downstream issues.Fix
digest.Digest.Validate()modelfs.diffIdsis now checked for correct formatTests