fix(scraper): tighten gamelist metadata scraping#858
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis PR refactors the gamelist.xml scraper to use normalized path-folded ROM matching, adds ROM-level region/lang MediaTags, rewrites companion parent/child processing to atomic ApplyScrapeResult writes with companion stats, and introduces content-type inference for path-backed media properties; tests and documentation updated accordingly. ChangesContent Type Inference for Media Properties
Gamelist XML Scraper Path-Based Refactoring
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/database/scraper/gamelistxml/scraper_test.go (1)
1381-1389: ⚡ Quick winStrengthen
companionWriteMatcherto validate parent title metadata too.The matcher currently ignores
TitleTagsandTitleProps, so atomic-write regressions on companion parent metadata would still pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/scraper/gamelistxml/scraper_test.go` around lines 1381 - 1389, The matcher companionWriteMatcher currently only checks Sentinel and MediaTags; modify it to also validate the parent title metadata on the captured *database.ScrapeWrite*. Change the signature of companionWriteMatcher to accept expected title metadata (e.g., expectedTitleTags []database.TagInfo and expectedTitleProps map[string]string), and inside the mock.MatchedBy closure verify that w.TitleTags equals expectedTitleTags and w.TitleProps equals expectedTitleProps (use assert.ObjectsAreEqual or reflect.DeepEqual for the comparisons). Ensure you still check w != nil and the existing Sentinel and MediaTags checks so the matcher rejects writes missing the parent title metadata.pkg/database/scraper/gamelistxml/scraper.go (1)
934-939: ⚡ Quick winMove
companionStatsto the top-level type/const section.This new type is declared mid-file after many functions. Please move it near other type/const declarations for guideline compliance.
As per coding guidelines,
**/*.go: "Define Go types and consts near the top of the file, before functions and methods".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/scraper/gamelistxml/scraper.go` around lines 934 - 939, Move the companionStats type declaration out of the middle of the file and place it with the other top-level types/consts near the top of the file (before any functions or methods); specifically locate the companionStats type and cut/paste it into the file’s existing type/const block so it sits alongside other package-level type definitions, ensure no references need adjusting, then run gofmt/govet to verify formatting and imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/methods/media_content.go`:
- Around line 49-52: The function mediaContentType currently checks
strings.TrimSpace(contentType) but returns the original untrimmed contentType;
update mediaContentType to return the trimmed value (use
strings.TrimSpace(contentType)) when the check passes, and similarly ensure any
fallback return (the text parameter) is also strings.TrimSpace(text) so no
whitespace-padded MIME values are exposed.
In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 1223-1231: The companionChildTags function currently appends raw
Region and Lang values which can create inconsistent tags (e.g., "USA, EUR");
update companionChildTags to normalize these fields the same way MapToDB does:
split CSV values, trim whitespace, lowercase (or apply the same normalization
function MapToDB uses), and append each resulting token as a separate
database.TagInfo with Type string(tags.TagTypeRegion) or
string(tags.TagTypeLang) respectively; locate companionChildTags and reuse or
mirror the normalization logic used elsewhere (e.g., the MapToDB helper or
tokenization utility) so region/lang produce consistent individual tags.
---
Nitpick comments:
In `@pkg/database/scraper/gamelistxml/scraper_test.go`:
- Around line 1381-1389: The matcher companionWriteMatcher currently only checks
Sentinel and MediaTags; modify it to also validate the parent title metadata on
the captured *database.ScrapeWrite*. Change the signature of
companionWriteMatcher to accept expected title metadata (e.g., expectedTitleTags
[]database.TagInfo and expectedTitleProps map[string]string), and inside the
mock.MatchedBy closure verify that w.TitleTags equals expectedTitleTags and
w.TitleProps equals expectedTitleProps (use assert.ObjectsAreEqual or
reflect.DeepEqual for the comparisons). Ensure you still check w != nil and the
existing Sentinel and MediaTags checks so the matcher rejects writes missing the
parent title metadata.
In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 934-939: Move the companionStats type declaration out of the
middle of the file and place it with the other top-level types/consts near the
top of the file (before any functions or methods); specifically locate the
companionStats type and cut/paste it into the file’s existing type/const block
so it sits alongside other package-level type definitions, ensure no references
need adjusting, then run gofmt/govet to verify formatting and imports.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 024e2c7a-f4ed-475e-ab56-937c7ab997f2
📒 Files selected for processing (8)
docs/scraper.mdpkg/api/methods/media_content.gopkg/api/methods/media_image.gopkg/api/methods/media_image_test.gopkg/api/methods/media_meta.gopkg/api/methods/media_meta_test.gopkg/database/scraper/gamelistxml/scraper.gopkg/database/scraper/gamelistxml/scraper_test.go
| func mediaContentType(contentType, text string) string { | ||
| if strings.TrimSpace(contentType) != "" { | ||
| return contentType | ||
| } |
There was a problem hiding this comment.
Trim the returned content type before exposing it.
Line 50 checks a trimmed value, but Line 51 returns the untrimmed string. This can leak whitespace-padded MIME values into responses.
Suggested fix
func mediaContentType(contentType, text string) string {
- if strings.TrimSpace(contentType) != "" {
- return contentType
+ if ct := strings.TrimSpace(contentType); ct != "" {
+ return ct
}
ext := extensionFromTextPath(text)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func mediaContentType(contentType, text string) string { | |
| if strings.TrimSpace(contentType) != "" { | |
| return contentType | |
| } | |
| func mediaContentType(contentType, text string) string { | |
| if ct := strings.TrimSpace(contentType); ct != "" { | |
| return ct | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/api/methods/media_content.go` around lines 49 - 52, The function
mediaContentType currently checks strings.TrimSpace(contentType) but returns the
original untrimmed contentType; update mediaContentType to return the trimmed
value (use strings.TrimSpace(contentType)) when the check passes, and similarly
ensure any fallback return (the text parameter) is also strings.TrimSpace(text)
so no whitespace-padded MIME values are exposed.
| func companionChildTags(c companionChild) []database.TagInfo { | ||
| var childTags []database.TagInfo | ||
| if c.Region != "" { | ||
| childTags = append(childTags, database.TagInfo{Type: string(tags.TagTypeRegion), Tag: c.Region}) | ||
| } | ||
| if c.Lang != "" { | ||
| childTags = append(childTags, database.TagInfo{Type: string(tags.TagTypeLang), Tag: c.Lang}) | ||
| } | ||
| return childTags |
There was a problem hiding this comment.
Normalize companion child region/lang tags the same way as normal records.
Line 1226 and Line 1229 write raw values (no CSV split / lowercase), which can produce inconsistent tags versus MapToDB behavior (e.g., "USA, EUR" as one tag).
💡 Suggested fix
func companionChildTags(c companionChild) []database.TagInfo {
var childTags []database.TagInfo
- if c.Region != "" {
- childTags = append(childTags, database.TagInfo{Type: string(tags.TagTypeRegion), Tag: c.Region})
- }
- if c.Lang != "" {
- childTags = append(childTags, database.TagInfo{Type: string(tags.TagTypeLang), Tag: c.Lang})
- }
+ childTags = appendCSVTags(childTags, string(tags.TagTypeRegion), c.Region)
+ childTags = appendCSVTags(childTags, string(tags.TagTypeLang), c.Lang)
return childTags
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func companionChildTags(c companionChild) []database.TagInfo { | |
| var childTags []database.TagInfo | |
| if c.Region != "" { | |
| childTags = append(childTags, database.TagInfo{Type: string(tags.TagTypeRegion), Tag: c.Region}) | |
| } | |
| if c.Lang != "" { | |
| childTags = append(childTags, database.TagInfo{Type: string(tags.TagTypeLang), Tag: c.Lang}) | |
| } | |
| return childTags | |
| func companionChildTags(c companionChild) []database.TagInfo { | |
| var childTags []database.TagInfo | |
| childTags = appendCSVTags(childTags, string(tags.TagTypeRegion), c.Region) | |
| childTags = appendCSVTags(childTags, string(tags.TagTypeLang), c.Lang) | |
| return childTags | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/database/scraper/gamelistxml/scraper.go` around lines 1223 - 1231, The
companionChildTags function currently appends raw Region and Lang values which
can create inconsistent tags (e.g., "USA, EUR"); update companionChildTags to
normalize these fields the same way MapToDB does: split CSV values, trim
whitespace, lowercase (or apply the same normalization function MapToDB uses),
and append each resulting token as a separate database.TagInfo with Type
string(tags.TagTypeRegion) or string(tags.TagTypeLang) respectively; locate
companionChildTags and reuse or mirror the normalization logic used elsewhere
(e.g., the MapToDB helper or tokenization utility) so region/lang produce
consistent individual tags.
|
Added core-only regression coverage for frontend issue ZaparooProject/zaparoo-frontend#161:
Validation rerun:
|
|
Actionable comments posted: 0 |
|
Actionable comments posted: 0 |
|
Added a follow-up fix for #794: regular gamelist.xml matching still uses concrete media paths, but now handles MiSTer zip-as-dir indexing by falling back from an unmatched .zip entry to exactly one indexed child path under that archive. Ambiguous multi-child archives are skipped rather than falling back to slug matching. |
|
Actionable comments posted: 0 |
|
Added a follow-up for #808. Coverage now distinguishes nested explicit artwork (already supported), nested filesystem fallback artwork, ignored nested gamelist.xml files, and thumbnail-only artwork. Fixes include mirrored subfolder media fallback (e.g. media/images/Japan/Game.png), box2dfront aliases, and media.image thumbnail support/default fallback. |
Summary
Validation
Summary by CodeRabbit
Documentation
New Features
Bug Fixes / Improvements
Tests