fix(api): cap media binary metadata reads#843
Conversation
📝 WalkthroughWalkthroughThis PR enforces a 16 MB in-memory cap for media binaries: the database reports BlobSize and omits Binary when oversized; API image selection validates disk files and stored blobs and returns client errors for oversized content. ChangesMedia Binary Size Capping
Sequence Diagram(s)sequenceDiagram
participant Client
participant HandleMediaImage
participant Database
participant Disk
participant ClientError
Client->>HandleMediaImage: Request image
HandleMediaImage->>Database: GetMediaProperties / GetMediaTitleProperties
Database-->>HandleMediaImage: Property (BlobSize, Binary or BlobDBID/Text)
alt BlobSize exceeds MaxMediaPropertyBinaryBytes
HandleMediaImage->>ClientError: return "image blob too large"
ClientError-->>Client: error
else Binary present and within limit
HandleMediaImage-->>Client: return Binary
else Binary empty, Text path present
HandleMediaImage->>Disk: read file (readMediaBinaryFile)
alt file size exceeds MaxMediaPropertyBinaryBytes
HandleMediaImage->>ClientError: return "image file too large"
ClientError-->>Client: error
else file undersized
HandleMediaImage-->>Client: return file content
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/database/mediadb/sql_scraper_test.go (1)
930-956: ⚡ Quick winAdd max-size boundary coverage and assert metadata retention.
Great oversized-path coverage. Please also add
== database.MaxMediaPropertyBinaryBytescoverage (inclusive boundary) and assertContentTypeis still returned whenBinaryis omitted.🧪 Suggested patch
func TestGetMediaProperties_OversizedBlobOmitsBinary(t *testing.T) { @@ require.Len(t, got, 1) require.NotNil(t, got[0].BlobDBID) assert.Equal(t, blobDBID, *got[0].BlobDBID) assert.Equal(t, largeSize, got[0].BlobSize) + assert.Equal(t, "image/png", got[0].ContentType) assert.Nil(t, got[0].Binary) } + +func TestGetMediaProperties_MaxSizeBlobKeepsBinary(t *testing.T) { + t.Parallel() + mediaDB, cleanup := setupScraperTestDB(t) + defer cleanup() + ctx := context.Background() + + blobDBID, err := mediaDB.UpsertMediaBlob(ctx, "image/png", []byte("small")) + require.NoError(t, err) + + exactSize := database.MaxMediaPropertyBinaryBytes + res, err := mediaDB.sql.ExecContext(ctx, + `UPDATE MediaBlobs SET Data = zeroblob(?) WHERE DBID = ?`, exactSize, blobDBID) + require.NoError(t, err) + rowsAffected, err := res.RowsAffected() + require.NoError(t, err) + require.Equal(t, int64(1), rowsAffected) + + require.NoError(t, mediaDB.UpsertMediaProperties(ctx, 1, []database.MediaProperty{ + {TypeTag: "property:image-boxart", BlobDBID: &blobDBID}, + })) + + got, err := mediaDB.GetMediaProperties(ctx, 1) + require.NoError(t, err) + require.Len(t, got, 1) + require.NotNil(t, got[0].BlobDBID) + assert.Equal(t, blobDBID, *got[0].BlobDBID) + assert.Equal(t, exactSize, got[0].BlobSize) + assert.Equal(t, "image/png", got[0].ContentType) + require.NotNil(t, got[0].Binary) + assert.Len(t, got[0].Binary, int(exactSize)) +}As per coding guidelines,
**/*.go: Write tests for all new code — see TESTING.md and pkg/testing/README.md.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/sql_scraper_test.go` around lines 930 - 956, Add a companion test to cover the inclusive boundary at database.MaxMediaPropertyBinaryBytes: create a blob via mediaDB.UpsertMediaBlob, set its Data to zeroblob(?) with size == database.MaxMediaPropertyBinaryBytes, UpsertMediaProperties referencing that BlobDBID (and a non-empty ContentType if applicable), then call mediaDB.GetMediaProperties and assert the property still omits Binary, has BlobSize == database.MaxMediaPropertyBinaryBytes, keeps BlobDBID, and also asserts ContentType (or the corresponding metadata field returned by GetMediaProperties) is present and equals the expected value; mirror the structure and helpers used in TestGetMediaProperties_OversizedBlobOmitsBinary to locate where to add this new test.pkg/database/mediadb/sql_scraper.go (1)
1247-1266: 💤 Low valueConsider qualifying column references for clarity.
The unqualified column names (
TypeTagDBID,Text,BlobDBID,ContentType) work because they're unique across the joined tables. However, explicitly qualifying them (e.g.,mtp.TypeTagDBID,mb.ContentType) would improve readability and prevent subtle bugs if schema changes introduce column name collisions.♻️ Suggested diff
func propertySelectColumns(entityIDColumn string, groupMode propertyGroupMode) string { parts := []string{} if groupMode == propertyGroupInclude { parts = append(parts, entityIDColumn) } dataColumn := fmt.Sprintf( "CASE WHEN mb.Data IS NOT NULL AND length(mb.Data) <= %d THEN mb.Data ELSE NULL END", database.MaxMediaPropertyBinaryBytes, ) + // Note: These columns must be qualified in the caller's context: + // - For mediaTitlePropertyQuery: mtp.TypeTagDBID, mtp.Text, mtp.BlobDBID + // - For mediaPropertyQuery: mp.TypeTagDBID, mp.Text, mp.BlobDBID + // ContentType always comes from mb (MediaBlobs). parts = append(parts, "tt.Type || ':' || t.Tag", - "TypeTagDBID", - "Text", - "BlobDBID", - "ContentType", + "%s.TypeTagDBID", + "%s.Text", + "%s.BlobDBID", + "mb.ContentType", "CASE WHEN mb.Data IS NOT NULL THEN length(mb.Data) ELSE NULL END", dataColumn, ) return strings.Join(parts, ", ") }This would require updating the function signature to accept a table alias parameter, or splitting into table-specific helpers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/sql_scraper.go` around lines 1247 - 1266, The propertySelectColumns function uses unqualified column names (TypeTagDBID, Text, BlobDBID, ContentType) which should be qualified to avoid future collisions; update propertySelectColumns to accept a table-alias parameter (or separate helpers) and change those entries to use the correct aliases (e.g., mtp.TypeTagDBID, mtp.Text or mb.BlobDBID, mb.ContentType as appropriate to how joins are built alongside existing mb.Data and tt/t.Tag usage) so all columns are explicitly prefixed with their table alias.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/database/mediadb/sql_scraper_test.go`:
- Around line 930-956: Add a companion test to cover the inclusive boundary at
database.MaxMediaPropertyBinaryBytes: create a blob via mediaDB.UpsertMediaBlob,
set its Data to zeroblob(?) with size == database.MaxMediaPropertyBinaryBytes,
UpsertMediaProperties referencing that BlobDBID (and a non-empty ContentType if
applicable), then call mediaDB.GetMediaProperties and assert the property still
omits Binary, has BlobSize == database.MaxMediaPropertyBinaryBytes, keeps
BlobDBID, and also asserts ContentType (or the corresponding metadata field
returned by GetMediaProperties) is present and equals the expected value; mirror
the structure and helpers used in
TestGetMediaProperties_OversizedBlobOmitsBinary to locate where to add this new
test.
In `@pkg/database/mediadb/sql_scraper.go`:
- Around line 1247-1266: The propertySelectColumns function uses unqualified
column names (TypeTagDBID, Text, BlobDBID, ContentType) which should be
qualified to avoid future collisions; update propertySelectColumns to accept a
table-alias parameter (or separate helpers) and change those entries to use the
correct aliases (e.g., mtp.TypeTagDBID, mtp.Text or mb.BlobDBID, mb.ContentType
as appropriate to how joins are built alongside existing mb.Data and tt/t.Tag
usage) so all columns are explicitly prefixed with their table alias.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 669c69f2-0fd1-4ab1-a890-a7c87567d4ca
📒 Files selected for processing (6)
pkg/api/methods/media_image.gopkg/api/methods/media_image_test.gopkg/api/methods/media_meta_test.gopkg/database/database.gopkg/database/mediadb/sql_scraper.gopkg/database/mediadb/sql_scraper_test.go
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/api/methods/media_image.go (1)
286-338: 💤 Low valueConsider extracting property binary resolution into a helper to reduce fallback path duplication.
The oversized-blob check (
propertyBlobTooLarge) and disk-read path (readMediaBinaryFile) appear twice: once in the main flow (lines 286-294) and again in the media→title fallback (lines 319-327). This duplication increases maintenance burden if validation logic changes.A helper like
resolvePropertyBinary(prop, maxBytes) ([]byte, error)could encapsulate both blob-size validation and file reading, simplifying both paths and the stale-property cleanup branches.Given the bounded scope and explicit control flow benefits, this is optional.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/api/methods/media_image.go` around lines 286 - 338, The duplicate logic that checks propertyBlobTooLarge and reads disk via readMediaBinaryFile appears in two places while resolving prop.Binary (in the media path and the media→title fallback); extract this into a helper like resolvePropertyBinary(prop, maxBytes) ([]byte, error) and replace both duplicated blocks with calls to it, keeping existing callers that handle os.ErrNotExist by returning the same mediaImageReadError or performing stale-property cleanup (db.DeleteMediaTitleProperty, db.DeleteMediaProperty) and map updates (titleMap, mediaMap) as before; ensure resolvePropertyBinary performs the propertyBlobTooLarge check and attempts readMediaBinaryFile(prop.Text, maxBytes) when prop.Binary is empty, returning the binary or a wrapped error suitable for the existing error handling in the surrounding function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/api/methods/media_image.go`:
- Around line 286-338: The duplicate logic that checks propertyBlobTooLarge and
reads disk via readMediaBinaryFile appears in two places while resolving
prop.Binary (in the media path and the media→title fallback); extract this into
a helper like resolvePropertyBinary(prop, maxBytes) ([]byte, error) and replace
both duplicated blocks with calls to it, keeping existing callers that handle
os.ErrNotExist by returning the same mediaImageReadError or performing
stale-property cleanup (db.DeleteMediaTitleProperty, db.DeleteMediaProperty) and
map updates (titleMap, mediaMap) as before; ensure resolvePropertyBinary
performs the propertyBlobTooLarge check and attempts
readMediaBinaryFile(prop.Text, maxBytes) when prop.Binary is empty, returning
the binary or a wrapped error suitable for the existing error handling in the
surrounding function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 02991861-ca08-46ae-9ac6-5bdac75bff45
📒 Files selected for processing (4)
pkg/api/methods/media_image.gopkg/database/database.gopkg/database/mediadb/sql_scraper.gopkg/database/mediadb/sql_scraper_test.go
Summary
Validation
Summary by CodeRabbit