Skip to content

fix(api): cap media binary metadata reads#843

Merged
wizzomafizzo merged 2 commits into
mainfrom
fix/media-binary-read-caps
May 25, 2026
Merged

fix(api): cap media binary metadata reads#843
wizzomafizzo merged 2 commits into
mainfrom
fix/media-binary-read-caps

Conversation

@wizzomafizzo
Copy link
Copy Markdown
Member

@wizzomafizzo wizzomafizzo commented May 24, 2026

Summary

  • cap media.image filesystem reads and DB blob responses at 16MB decoded bytes
  • preserve media.meta inline data for normal binary blobs while omitting only oversized blobs
  • record blob sizes so oversized metadata can be handled safely without loading full payloads

Validation

  • go test ./pkg/api/methods/ -run 'TestHandleMedia(Image|Meta)'\n- go test ./pkg/api/ -run 'TestLogSafeResponse|TestHandleResponse'\n- go test ./pkg/database/mediadb/ -run 'Test.*Properties|Test.*Blob|TestApplyScrapeResult'\n- golangci-lint run ./pkg/api/... ./pkg/database/...

Summary by CodeRabbit

  • New Features
    • Enforced 16MB limit for media image files and stored binaries; oversized items report size but omit payloads.
  • Bug Fixes
    • Improved defensive validation for file reads; oversized or malformed disk-backed images now return clear client-facing errors.
  • Tests
    • Added tests covering oversized file and oversized stored-blob handling; metadata behavior validated when binaries are omitted.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

This PR enforces a 16 MB in-memory cap for media binaries: the database reports BlobSize and omits Binary when oversized; API image selection validates disk files and stored blobs and returns client errors for oversized content.

Changes

Media Binary Size Capping

Layer / File(s) Summary
Database schema and size constant
pkg/database/database.go
Adds MaxMediaPropertyBinaryBytes constant (16 MB) and BlobSize field to MediaProperty struct, establishing the contract for blob size reporting and read-only field semantics across database and API layers.
SQL infrastructure: query builders
pkg/database/mediadb/sql_scraper.go
Introduces propertyGroupMode enum and shared query builders (propertySelectColumns, mediaTitlePropertyQuery, mediaPropertyQuery) that enforce blob byte capping using conditional CASE logic and compute blob size for result scanning.
SQL infrastructure: result scanning
pkg/database/mediadb/sql_scraper.go
Updates scanProperties and scanPropertyWithDBID to scan blob size and use setPropertyBlobFields helper, centralizing logic to populate BlobDBID, ContentType, BlobSize, and conditionally hydrate Binary based on oversized check.
Database accessor methods
pkg/database/mediadb/sql_scraper.go
Refactors GetMediaTitleProperties, GetMediaTitlePropertiesByMediaTitleDBIDs, GetMediaProperties, and GetMediaPropertiesByMediaDBIDs to delegate to load* helpers using shared query builders and result scanning, enabling consistent blob capping across grouped and ungrouped queries.
Database tests: blob capping validation
pkg/database/mediadb/sql_scraper_test.go
Validates BlobSize population and oversized blob detection: TestUpsertMediaProperties_WithBlob confirms BlobSize is set and Binary contains data; TestGetMediaProperties_OversizedBlobOmitsBinary confirms oversized blobs report size but omit Binary.
Image API: error types and validation helpers
pkg/api/methods/media_image.go
Adds mediaBinaryTooLargeError type and helpers (readMediaBinaryFile, mediaImageReadError, propertyBlobTooLarge, mediaImageBlobTooLargeError) to validate disk files and stored blobs, enforcing size limits and translating violations to client-facing errors.
Image API: selectMediaImage implementation
pkg/api/methods/media_image.go
Integrates size validation into selectMediaImage: rejects oversized blob metadata before disk reads, replaces os.ReadFile with bounded readMediaBinaryFile, and uses errors.Is(..., os.ErrNotExist) for file detection in both media-level and title-level property fallbacks.
Image API tests: oversized content rejection
pkg/api/methods/media_image_test.go
Tests oversized file and blob handling: TestHandleMediaImage_OversizedFileReturnsClientError verifies on-disk oversized files are rejected with "image file too large", and TestHandleMediaImage_OversizedBlobReturnsClientError verifies oversized stored blobs are rejected with "image blob too large".
Media metadata API tests: oversized content handling
pkg/api/methods/media_meta_test.go
Updates metadata response handling: TestHandleMediaMeta_FullResult fixture now includes BlobSize, and TestHandleMediaMeta_OversizedBinaryPropertyDataOmitted confirms HandleMediaMeta omits binary data for oversized properties while preserving ContentType and Extension.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HandleMediaImage
  participant Database
  participant Disk
  participant ClientError
  Client->>HandleMediaImage: Request image
  HandleMediaImage->>Database: GetMediaProperties / GetMediaTitleProperties
  Database-->>HandleMediaImage: Property (BlobSize, Binary or BlobDBID/Text)
  alt BlobSize exceeds MaxMediaPropertyBinaryBytes
    HandleMediaImage->>ClientError: return "image blob too large"
    ClientError-->>Client: error
  else Binary present and within limit
    HandleMediaImage-->>Client: return Binary
  else Binary empty, Text path present
    HandleMediaImage->>Disk: read file (readMediaBinaryFile)
    alt file size exceeds MaxMediaPropertyBinaryBytes
      HandleMediaImage->>ClientError: return "image file too large"
      ClientError-->>Client: error
    else file undersized
      HandleMediaImage-->>Client: return file content
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ZaparooProject/zaparoo-core#770: Modifies HandleMediaImage in pkg/api/methods/media_image.go around batch request and image selection logic; related to changes in image selection and property handling.

Poem

🐰 A rabbit nibbles bytes with care,

Sixteen megabytes — a prudent share.
Blobs and files now measured true,
Small hops return the bytes to you. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'cap media binary metadata reads' directly describes the main objective of the PR: implementing a 16 MB size limit on media binary reads.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/media-binary-read-caps

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/database/mediadb/sql_scraper_test.go (1)

930-956: ⚡ Quick win

Add max-size boundary coverage and assert metadata retention.

Great oversized-path coverage. Please also add == database.MaxMediaPropertyBinaryBytes coverage (inclusive boundary) and assert ContentType is still returned when Binary is omitted.

🧪 Suggested patch
 func TestGetMediaProperties_OversizedBlobOmitsBinary(t *testing.T) {
@@
 	require.Len(t, got, 1)
 	require.NotNil(t, got[0].BlobDBID)
 	assert.Equal(t, blobDBID, *got[0].BlobDBID)
 	assert.Equal(t, largeSize, got[0].BlobSize)
+	assert.Equal(t, "image/png", got[0].ContentType)
 	assert.Nil(t, got[0].Binary)
 }
+
+func TestGetMediaProperties_MaxSizeBlobKeepsBinary(t *testing.T) {
+	t.Parallel()
+	mediaDB, cleanup := setupScraperTestDB(t)
+	defer cleanup()
+	ctx := context.Background()
+
+	blobDBID, err := mediaDB.UpsertMediaBlob(ctx, "image/png", []byte("small"))
+	require.NoError(t, err)
+
+	exactSize := database.MaxMediaPropertyBinaryBytes
+	res, err := mediaDB.sql.ExecContext(ctx,
+		`UPDATE MediaBlobs SET Data = zeroblob(?) WHERE DBID = ?`, exactSize, blobDBID)
+	require.NoError(t, err)
+	rowsAffected, err := res.RowsAffected()
+	require.NoError(t, err)
+	require.Equal(t, int64(1), rowsAffected)
+
+	require.NoError(t, mediaDB.UpsertMediaProperties(ctx, 1, []database.MediaProperty{
+		{TypeTag: "property:image-boxart", BlobDBID: &blobDBID},
+	}))
+
+	got, err := mediaDB.GetMediaProperties(ctx, 1)
+	require.NoError(t, err)
+	require.Len(t, got, 1)
+	require.NotNil(t, got[0].BlobDBID)
+	assert.Equal(t, blobDBID, *got[0].BlobDBID)
+	assert.Equal(t, exactSize, got[0].BlobSize)
+	assert.Equal(t, "image/png", got[0].ContentType)
+	require.NotNil(t, got[0].Binary)
+	assert.Len(t, got[0].Binary, int(exactSize))
+}

As per coding guidelines, **/*.go: Write tests for all new code — see TESTING.md and pkg/testing/README.md.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediadb/sql_scraper_test.go` around lines 930 - 956, Add a
companion test to cover the inclusive boundary at
database.MaxMediaPropertyBinaryBytes: create a blob via mediaDB.UpsertMediaBlob,
set its Data to zeroblob(?) with size == database.MaxMediaPropertyBinaryBytes,
UpsertMediaProperties referencing that BlobDBID (and a non-empty ContentType if
applicable), then call mediaDB.GetMediaProperties and assert the property still
omits Binary, has BlobSize == database.MaxMediaPropertyBinaryBytes, keeps
BlobDBID, and also asserts ContentType (or the corresponding metadata field
returned by GetMediaProperties) is present and equals the expected value; mirror
the structure and helpers used in
TestGetMediaProperties_OversizedBlobOmitsBinary to locate where to add this new
test.
pkg/database/mediadb/sql_scraper.go (1)

1247-1266: 💤 Low value

Consider qualifying column references for clarity.

The unqualified column names (TypeTagDBID, Text, BlobDBID, ContentType) work because they're unique across the joined tables. However, explicitly qualifying them (e.g., mtp.TypeTagDBID, mb.ContentType) would improve readability and prevent subtle bugs if schema changes introduce column name collisions.

♻️ Suggested diff
 func propertySelectColumns(entityIDColumn string, groupMode propertyGroupMode) string {
 	parts := []string{}
 	if groupMode == propertyGroupInclude {
 		parts = append(parts, entityIDColumn)
 	}
 	dataColumn := fmt.Sprintf(
 		"CASE WHEN mb.Data IS NOT NULL AND length(mb.Data) <= %d THEN mb.Data ELSE NULL END",
 		database.MaxMediaPropertyBinaryBytes,
 	)
+	// Note: These columns must be qualified in the caller's context:
+	// - For mediaTitlePropertyQuery: mtp.TypeTagDBID, mtp.Text, mtp.BlobDBID
+	// - For mediaPropertyQuery: mp.TypeTagDBID, mp.Text, mp.BlobDBID
+	// ContentType always comes from mb (MediaBlobs).
 	parts = append(parts,
 		"tt.Type || ':' || t.Tag",
-		"TypeTagDBID",
-		"Text",
-		"BlobDBID",
-		"ContentType",
+		"%s.TypeTagDBID",
+		"%s.Text",
+		"%s.BlobDBID",
+		"mb.ContentType",
 		"CASE WHEN mb.Data IS NOT NULL THEN length(mb.Data) ELSE NULL END",
 		dataColumn,
 	)
 	return strings.Join(parts, ", ")
 }

This would require updating the function signature to accept a table alias parameter, or splitting into table-specific helpers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediadb/sql_scraper.go` around lines 1247 - 1266, The
propertySelectColumns function uses unqualified column names (TypeTagDBID, Text,
BlobDBID, ContentType) which should be qualified to avoid future collisions;
update propertySelectColumns to accept a table-alias parameter (or separate
helpers) and change those entries to use the correct aliases (e.g.,
mtp.TypeTagDBID, mtp.Text or mb.BlobDBID, mb.ContentType as appropriate to how
joins are built alongside existing mb.Data and tt/t.Tag usage) so all columns
are explicitly prefixed with their table alias.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/database/mediadb/sql_scraper_test.go`:
- Around line 930-956: Add a companion test to cover the inclusive boundary at
database.MaxMediaPropertyBinaryBytes: create a blob via mediaDB.UpsertMediaBlob,
set its Data to zeroblob(?) with size == database.MaxMediaPropertyBinaryBytes,
UpsertMediaProperties referencing that BlobDBID (and a non-empty ContentType if
applicable), then call mediaDB.GetMediaProperties and assert the property still
omits Binary, has BlobSize == database.MaxMediaPropertyBinaryBytes, keeps
BlobDBID, and also asserts ContentType (or the corresponding metadata field
returned by GetMediaProperties) is present and equals the expected value; mirror
the structure and helpers used in
TestGetMediaProperties_OversizedBlobOmitsBinary to locate where to add this new
test.

In `@pkg/database/mediadb/sql_scraper.go`:
- Around line 1247-1266: The propertySelectColumns function uses unqualified
column names (TypeTagDBID, Text, BlobDBID, ContentType) which should be
qualified to avoid future collisions; update propertySelectColumns to accept a
table-alias parameter (or separate helpers) and change those entries to use the
correct aliases (e.g., mtp.TypeTagDBID, mtp.Text or mb.BlobDBID, mb.ContentType
as appropriate to how joins are built alongside existing mb.Data and tt/t.Tag
usage) so all columns are explicitly prefixed with their table alias.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 669c69f2-0fd1-4ab1-a890-a7c87567d4ca

📥 Commits

Reviewing files that changed from the base of the PR and between c7a2dfd and c600007.

📒 Files selected for processing (6)
  • pkg/api/methods/media_image.go
  • pkg/api/methods/media_image_test.go
  • pkg/api/methods/media_meta_test.go
  • pkg/database/database.go
  • pkg/database/mediadb/sql_scraper.go
  • pkg/database/mediadb/sql_scraper_test.go

@sentry
Copy link
Copy Markdown

sentry Bot commented May 24, 2026

Codecov Report

❌ Patch coverage is 84.46602% with 16 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pkg/api/methods/media_image.go 63.63% 10 Missing and 6 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/api/methods/media_image.go (1)

286-338: 💤 Low value

Consider extracting property binary resolution into a helper to reduce fallback path duplication.

The oversized-blob check (propertyBlobTooLarge) and disk-read path (readMediaBinaryFile) appear twice: once in the main flow (lines 286-294) and again in the media→title fallback (lines 319-327). This duplication increases maintenance burden if validation logic changes.

A helper like resolvePropertyBinary(prop, maxBytes) ([]byte, error) could encapsulate both blob-size validation and file reading, simplifying both paths and the stale-property cleanup branches.

Given the bounded scope and explicit control flow benefits, this is optional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/methods/media_image.go` around lines 286 - 338, The duplicate logic
that checks propertyBlobTooLarge and reads disk via readMediaBinaryFile appears
in two places while resolving prop.Binary (in the media path and the media→title
fallback); extract this into a helper like resolvePropertyBinary(prop, maxBytes)
([]byte, error) and replace both duplicated blocks with calls to it, keeping
existing callers that handle os.ErrNotExist by returning the same
mediaImageReadError or performing stale-property cleanup
(db.DeleteMediaTitleProperty, db.DeleteMediaProperty) and map updates (titleMap,
mediaMap) as before; ensure resolvePropertyBinary performs the
propertyBlobTooLarge check and attempts readMediaBinaryFile(prop.Text, maxBytes)
when prop.Binary is empty, returning the binary or a wrapped error suitable for
the existing error handling in the surrounding function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/api/methods/media_image.go`:
- Around line 286-338: The duplicate logic that checks propertyBlobTooLarge and
reads disk via readMediaBinaryFile appears in two places while resolving
prop.Binary (in the media path and the media→title fallback); extract this into
a helper like resolvePropertyBinary(prop, maxBytes) ([]byte, error) and replace
both duplicated blocks with calls to it, keeping existing callers that handle
os.ErrNotExist by returning the same mediaImageReadError or performing
stale-property cleanup (db.DeleteMediaTitleProperty, db.DeleteMediaProperty) and
map updates (titleMap, mediaMap) as before; ensure resolvePropertyBinary
performs the propertyBlobTooLarge check and attempts
readMediaBinaryFile(prop.Text, maxBytes) when prop.Binary is empty, returning
the binary or a wrapped error suitable for the existing error handling in the
surrounding function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02991861-ca08-46ae-9ac6-5bdac75bff45

📥 Commits

Reviewing files that changed from the base of the PR and between c600007 and 00ecdd9.

📒 Files selected for processing (4)
  • pkg/api/methods/media_image.go
  • pkg/database/database.go
  • pkg/database/mediadb/sql_scraper.go
  • pkg/database/mediadb/sql_scraper_test.go

@wizzomafizzo wizzomafizzo merged commit 0310c1a into main May 25, 2026
12 checks passed
@wizzomafizzo wizzomafizzo deleted the fix/media-binary-read-caps branch May 25, 2026 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant