
feat(scraper): ZaparooCompanion xml support#839

Merged
wizzomafizzo merged 11 commits into main from zaparoocompanion-xml-support
May 25, 2026

Contributor

@BossRighteous BossRighteous commented May 24, 2026

Summary

Note:
source="ZaparooCompanion" is just a play on words, since I jammed with @Anime0t4ku (of MiSTer Companion) on the format expectations and testing. It is not vendor lock-in: any scraper or utility is free to use the source value and follow the same format conventions.

Adds ZaparooCompanion XML scraping support to the gamelist.xml scraper and expands the boxart property model to distinguish 2D front, 3D, side, and back artwork.

Also adds an API method that converts a submitted path string into MediaTitle data (name, slugs, etc.). This is useful to me, and I presume to others in the future who want to target MediaTitle slugs as the parent when scraping.

Path ignorance

Titles are mapped by filename regardless of path, so any routine targeting MediaTitle enrichment can ignore paths.
This applies to scraper ingest, which can dedupe by unique filename regardless of path. It also applies to scraper output, where every path can be written as ./game.ext in a flattened mapping.

A new MediaDBI method, FindMediaBySystemAndPathSuffix, handles the suffix-match lookup, with LIKE wildcard escaping for safety. This lets a root-relative path look up any matching Media for the system in the DB during the XML scan.
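
As an illustration of the wildcard escaping mentioned above, here is a minimal sketch. It is not the PR's implementation: the helper names, the backslash escape character, the leading "/" in the pattern, and the table and column names in the query string are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeLike escapes the LIKE wildcards % and _ (and the escape character
// itself) so that text taken from a filename is matched literally.
// Assumption: the query declares ESCAPE '\'.
func escapeLike(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return r.Replace(s)
}

// suffixPattern builds a LIKE pattern that matches any stored path ending
// in "/" + filename. The leading "/" is an assumption of this sketch.
func suffixPattern(filename string) string {
	return "%/" + escapeLike(filename)
}

func main() {
	// Hypothetical query shape; table and column names are illustrative only.
	const query = `SELECT DBID FROM Media WHERE SystemDBID = ? AND Path LIKE ? ESCAPE '\'`
	fmt.Println(query)
	fmt.Println(suffixPattern("100%_done (USA).sfc"))
}
```

Without the escaping, a filename containing % or _ would act as a wildcard and could match unrelated rows.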

ZaparooCompanion parent/child enrichment

gamelist.xml files generated by ZaparooCompanion use a two-record schema: a parent entry (identified by source="ZaparooCompanion" + id attribute, no path) carries full game metadata, and one or more child entries (parentid attribute + path) link regional ROM releases to that parent.

<game id="18923" source="ZaparooCompanion">
  <name>ACME Animation Factory</name>
  <desc>ACME Animation Factory is essentially Mario Paint with a Looney Tunes license. The player can choose from 18 pre-set animations of their favorite cartoon characters and super impose them over a scene to create their very own cartoon. Much like Mario Paint, they can also color blank drawings, create their own music and even play games like Solitaire and Mix 'n' Match. Players can also save their own creations for later viewing.</desc>
  <releasedate>19941101T000000</releasedate>
  <developer>Probe Software</developer>
  <publisher>Sunsoft</publisher>
  <players>1-2</players>
  <screenshot>./media/screenshot/18923.png</screenshot>
  <boxart2d>./media/box2d/18923.png</boxart2d>
  <logo>./media/logo/18923.png</logo>
</game>
<game parentid="18923" source="ZaparooCompanion">
  <path>./ACME Animation Factory (Europe).sfc</path>
  <region>us</region>
  <lang>en</lang>
</game>
<game parentid="18923" source="ZaparooCompanion">
  <path>./ACME Animation Factory (USA).sfc</path>
  <region>us</region>
  <lang>en</lang>
</game>
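
To make the two-record schema concrete, the following sketch decodes a gamelist with encoding/xml and separates parents from children. The struct and function names are hypothetical and only a few fields are modelled; the actual esapi.Game type in the PR is larger. The classification rule follows the description above: a parent has source and id but no path, a child has parentid and a path.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// game models only the fields needed for this sketch (hypothetical type).
type game struct {
	ID       string `xml:"id,attr"`
	ParentID string `xml:"parentid,attr"`
	Source   string `xml:"source,attr"`
	Name     string `xml:"name"`
	Path     string `xml:"path"`
	Region   string `xml:"region"`
	Lang     string `xml:"lang"`
}

type gameList struct {
	Games []game `xml:"game"`
}

const companionSource = "ZaparooCompanion"

// splitCompanion returns companion parents keyed by id and the list of
// companion children. Entries from other sources are ignored here.
func splitCompanion(data []byte) (map[string]game, []game, error) {
	var gl gameList
	if err := xml.Unmarshal(data, &gl); err != nil {
		return nil, nil, err
	}
	parents := make(map[string]game)
	var children []game
	for _, g := range gl.Games {
		if g.Source != companionSource {
			continue
		}
		switch {
		case g.ID != "" && g.Path == "":
			parents[g.ID] = g
		case g.ParentID != "" && g.Path != "":
			children = append(children, g)
		}
	}
	return parents, children, nil
}

func main() {
	data := []byte(`<gameList>
  <game id="18923" source="ZaparooCompanion"><name>ACME Animation Factory</name></game>
  <game parentid="18923" source="ZaparooCompanion"><path>./ACME Animation Factory (USA).sfc</path><region>us</region></game>
</gameList>`)
	parents, children, err := splitCompanion(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(len(parents), "parent(s),", len(children), "child(ren)")
}
```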

The scraper now processes these entries as a pre-pass before the standard slug-based scrape:

  • Phase 1 — parent records are mapped to tags and properties via MapToDB (no new DB rows created).
  • Phase 2 — each child's ROM path is resolved to its indexed Media row via FindMediaBySystemAndPathSuffix. The parent's tags and properties are upserted onto the child's existing MediaTitle. Per-child region and lang tags are written to the Media row.
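
The two phases can be pictured with an in-memory stand-in for the database. This is only a sketch under stated assumptions: the types, the "region:" tag format, and the slice-based index are invented for illustration, and the real code goes through MapToDB and the MediaDBI methods instead.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical, simplified records for the sketch.
type parent struct{ Props map[string]string }
type child struct{ ParentID, Path, Region string }
type media struct {
	Path    string
	TitleID int
	Tags    []string
}

// enrich copies each parent's properties onto the title of every indexed
// media row whose path ends in the child's filename, and records the
// child's region on the media row itself.
func enrich(parents map[string]parent, children []child, index []*media) map[int]map[string]string {
	titleProps := make(map[int]map[string]string)
	for _, c := range children {
		p, ok := parents[c.ParentID]
		if !ok {
			continue // orphan child: nothing to copy
		}
		suffix := "/" + path.Base(c.Path)
		for _, m := range index {
			if !strings.HasSuffix(m.Path, suffix) {
				continue
			}
			if titleProps[m.TitleID] == nil {
				titleProps[m.TitleID] = make(map[string]string)
			}
			for k, v := range p.Props {
				titleProps[m.TitleID][k] = v
			}
			if c.Region != "" {
				m.Tags = append(m.Tags, "region:"+c.Region)
			}
		}
	}
	return titleProps
}

func main() {
	parents := map[string]parent{"18923": {Props: map[string]string{"image-boxart": "./media/box2d/18923.png"}}}
	children := []child{{ParentID: "18923", Path: "./ACME Animation Factory (USA).sfc", Region: "us"}}
	index := []*media{{Path: "/roms/snes/sub/ACME Animation Factory (USA).sfc", TitleID: 7}}
	fmt.Println(enrich(parents, children, index), index[0].Tags)
}
```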

.slug extension MediaTitle matcher

<game parentid="18923" source="ZaparooCompanion">
  <path>./acmeanimationfactory.slug</path>
</game>

A .slug extension in a companion-style child entry targets the MediaTitle directly, without a proxy Media lookup. This is the most generic form of targeting and may be used together with the media.titleFromPath method for the best shareability across romsets and filenames.
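
A sketch of how a child path might be classified as a direct slug target. The function name is hypothetical, and treating the extension as case-sensitive is an assumption; the PR text does not say either way.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// slugTarget reports whether a child <path> names a MediaTitle slug
// directly (a ".slug" file) and, if so, returns the slug.
// XML paths use forward slashes, so the "path" package is used.
func slugTarget(childPath string) (string, bool) {
	base := path.Base(childPath)
	if !strings.HasSuffix(base, ".slug") {
		return "", false
	}
	slug := strings.TrimSuffix(base, ".slug")
	return slug, slug != ""
}

func main() {
	for _, p := range []string{"./acmeanimationfactory.slug", "./ACME Animation Factory (USA).sfc"} {
		slug, ok := slugTarget(p)
		fmt.Printf("%q -> slug=%q direct=%v\n", p, slug, ok)
	}
}
```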

Boxart property split

TagPropertyImageBoxart previously acted as a catch-all for all boxart variants. It is now scoped to 2D front artwork only. Three new tag property constants are introduced:

| Constant | Value | Source |
| --- | --- | --- |
| TagPropertyImageBoxart | image-boxart | <boxart2d> XML / boxart, boxart2d, boxart2dfront dirs |
| TagPropertyImageBoxart3D | image-boxart3d | <boxart3d> XML / boxart3d dir |
| TagPropertyImageBoxartSide | image-boxartside | boxart2dside dir (filesystem only) |
| TagPropertyImageBoxartBack | image-boxartback | boxart2dback dir (filesystem only) |

boxart3d is added to the media.image API default preference order between boxart and screenshot. All four types are exposed in imageTypeTags for explicit API requests.
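
A sketch of what a preference-ordered lookup could look like. The variable names mirror defaultImageTypes and imageTypeTags, but the contents here are assumptions: the real default list may contain more entries, and the "image-screenshot" value is a guess included only so the example has a third type.

```go
package main

import "fmt"

// Assumed preference order: boxart, then boxart3d, then screenshot.
var defaultImageTypes = []string{"boxart", "boxart3d", "screenshot"}

// Type name to property tag value. The boxart values come from the table
// above; "image-screenshot" is a placeholder assumption.
var imageTypeTags = map[string]string{
	"boxart":     "image-boxart",
	"boxart3d":   "image-boxart3d",
	"boxartside": "image-boxartside",
	"boxartback": "image-boxartback",
	"screenshot": "image-screenshot",
}

// pickImage returns the first preferred type for which a property exists.
func pickImage(prefs []string, props map[string]string) (typ, file string, ok bool) {
	for _, t := range prefs {
		tag, known := imageTypeTags[t]
		if !known {
			continue
		}
		if f := props[tag]; f != "" {
			return t, f, true
		}
	}
	return "", "", false
}

func main() {
	props := map[string]string{
		"image-boxart3d":   "./media/box3d/18923.png",
		"image-screenshot": "./media/screenshot/18923.png",
	}
	fmt.Println(pickImage(defaultImageTypes, props))
	// Side and back art are only returned when requested explicitly.
	fmt.Println(pickImage([]string{"boxartback"}, props))
}
```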

esapi.Game additions

New XML fields decoded from gamelist.xml:

  • source attribute → SourceAttr (companion source detection)
  • parentid attribute → ParentIDAttr (child→parent link)
  • screenshot, titlescreen, boxart2d, boxart3d, logo elements → custom fields whose intent is clear from the tags they map to

MapToDB is updated to use these fields: Logo falls back to Wheel for the wheel property; TitleScreen falls back to TitleShot for the titleshot property.
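
The fallback rule can be sketched as "first non-empty value wins". The struct below is a hypothetical cut-down stand-in for esapi.Game, and trimming whitespace before the emptiness check is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// game holds only the four fields involved in the fallback (hypothetical).
type game struct {
	Logo, Wheel, TitleScreen, TitleShot string
}

// firstNonEmpty returns the first value that is not blank after trimming.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if t := strings.TrimSpace(v); t != "" {
			return t
		}
	}
	return ""
}

// wheelAndTitleshot applies the precedence described above:
// Logo before Wheel, TitleScreen before TitleShot.
func wheelAndTitleshot(g game) (wheel, titleshot string) {
	return firstNonEmpty(g.Logo, g.Wheel), firstNonEmpty(g.TitleScreen, g.TitleShot)
}

func main() {
	g := game{Wheel: "./media/wheel/18923.png", TitleScreen: "./media/title/18923.png", TitleShot: "./old.png"}
	fmt.Println(wheelAndTitleshot(g))
}
```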

New API endpoint

media.titleFromPath — computes a MediaTitle slug and name from a system ID and path without touching the filesystem or database. Used by ZaparooCompanion to preview how the scanner will interpret a ROM path before indexing.
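
For orientation, a sketch of building a request for this endpoint. The params fields (systemId, path) match the MediaTitleParseParams struct quoted in the review further down, and the method string is the MethodMediaTitleParse value given in the walkthrough ("media.title.parse"), which differs from the name used in this description. The JSON-RPC 2.0 envelope, the string id, and the "SNES" system ID are assumptions of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// titleParseParams mirrors the two required request fields.
type titleParseParams struct {
	SystemID string `json:"systemId"`
	Path     string `json:"path"`
}

// rpcRequest is a generic JSON-RPC 2.0 envelope (assumed transport shape).
type rpcRequest struct {
	JSONRPC string           `json:"jsonrpc"`
	ID      string           `json:"id"`
	Method  string           `json:"method"`
	Params  titleParseParams `json:"params"`
}

// newTitleParseRequest serialises a title-parse request.
func newTitleParseRequest(id, systemID, path string) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "media.title.parse",
		Params:  titleParseParams{SystemID: systemID, Path: path},
	})
}

func main() {
	req, err := newTitleParseRequest("1", "SNES", "./ACME Animation Factory (USA).sfc")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(req))
}
```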

Testing Steps

I can provide a small sample SNES system with the complex conditions used in both the Companion and the Core testing. It was tested against nested and repeated ROMs in the directory hierarchy.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a new API endpoint to parse media titles and extract metadata from file paths
    • Extended image support to include 3D box art, side box art, and back box art variants
    • Enhanced gamelist.xml parsing to recognize and process companion entries for improved metadata enrichment


coderabbitai Bot commented May 24, 2026

Walkthrough

This PR extends the gamelist.xml scraper to support ZaparooCompanion parent/child entries and new boxart image variants (3D, side, back). It adds database methods for media lookup by path suffix and title by slug, introduces a JSON-RPC API endpoint to derive slug/title metadata from file paths, and updates image property mapping to prefer XML fields over filesystem scanning.

Changes

Companion Scraper, Image Variants, and Title-from-Path API

| Layer / File(s) | Summary |
| --- | --- |
| Tag constants and Game XML model for boxart variants: pkg/database/tags/tag_values.go, pkg/database/tags/tags.go, pkg/platforms/shared/esapi/gamelist.go | New exported property tag constants TagPropertyImageBoxart3D, TagPropertyImageBoxartSide, and TagPropertyImageBoxartBack join their canonical definitions. Game struct adds Logo, Boxart2D, Boxart3D, Screenshot, TitleScreen, SourceAttr, and ParentIDAttr fields to support XML-based metadata and companion entry attributes. |
| Media path-suffix and title-slug database lookup: pkg/database/database.go, pkg/database/mediadb/sql_scraper.go, pkg/database/mediadb/sql_scraper_test.go, pkg/testing/helpers/db_mocks.go | MediaDBI interface gains FindMediaBySystemAndPathSuffix (suffix-based LIKE matching with SQL wildcard escaping) and FindMediaTitleBySystemAndSlug (equality lookup with fallback to nil). SQL implementations include proper context handling, row scanning, and error propagation; test suite covers suffix matching correctness, wildcard escaping, and system isolation. |
| Scraper: companion entries, image properties, and integration: pkg/database/scraper/gamelistxml/scraper.go | mediaDirCandidates is split per boxart variant. MapToDB string cleaning extends to additional XML fields (Screenshot, TitleScreen, Boxart2D, Boxart3D, Logo). Image property mapping now prefers XML Boxart2D/Boxart3D/Screenshot with filesystem fallback only when empty; Wheel/titleshot prefer Logo/TitleScreen with fallback to legacy Wheel/TitleShot. Full ZaparooCompanion support added: loadCompanionEntries scans gamelist.xml and separates parent (no path) from child (with path) entries; processCompanionEntries upserts parent tags/properties onto matched MediaTitle (by slug for .slug paths, by filename suffix otherwise) with optional region/lang upsert at Media level. |
| Scraper unit and integration tests: pkg/database/scraper/gamelistxml/scraper_test.go | Comprehensive test additions covering MapToDB boxart image handling (XML vs filesystem fallback), path resolution edge cases, MIME type mappings, image property precedence (GameFamily, Manual, wheel/logo, title screen vs shot), companion entry parsing/validation with cancellation and error handling, companion processing with dedup and skip conditions, and scrape-loop behavior in normal/force modes with mock DB interactions. |
| Media title-from-path JSON-RPC API contract and models: pkg/api/models/models.go, pkg/api/models/params.go, pkg/api/models/responses.go, pkg/api/server.go | New constant MethodMediaTitleParse = "media.title.parse" defined; request params require systemId and path (min length 1 each); response includes required slug, name, slugLength, slugWordCount and optional secondarySlug. Server method map registers the new endpoint. |
| Media title-from-path handler and tests: pkg/api/methods/media_title_parse.go, pkg/api/methods/media_title_parse_test.go | HandleMediaTitleParse validates params, resolves media type from system (fallback to MediaTypeGame), computes path fragments via mediascanner.GetPathFragments, generates slug metadata via mediadb.GenerateSlugMetadataFromTokens, and returns MediaTitleParseResponse. Test suite verifies validation errors, successful parsing with unicode rune counting, and secondary-slug derivation for titles with colons or dashes. |
| Media image type preferences: pkg/api/methods/media_image.go | defaultImageTypes includes "boxart3d" fallback; imageTypeTags map "boxart3d", "boxartside", "boxartback" to their property tag constants. |
| Non-functional database insert formatting: pkg/database/mediadb/mediadb.go, pkg/database/mediadb/sql_media_titles.go | Batch insert and SQL Exec argument lists reformatted across multiple lines for readability; no behavioral changes. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • ZaparooProject/zaparoo-core#770: Overlaps in pkg/api/methods/media_image.go image-type preference logic; main PR extends defaultImageTypes/imageTypeTags for boxart variants while retrieved PR refactors HandleMediaImage to use preferences with fallback.
  • ZaparooProject/zaparoo-core#740: Main PR's MapToDB artwork/image mapping updates (new boxart3d/side/back handling and candidate-directory resolution) are directly incremental to the gamelist.xml scraper implementation in the retrieved PR.
  • ZaparooProject/zaparoo-core#789: Both PRs modify gamelistxml scraper's MapToDB function in pkg/database/scraper/gamelistxml/scraper.go, overlapping at the same code path.

Poem

🐰 New boxes in three dimensions bright,
Companions join the scrape tonight.
Paths parse to slugs with XML delight,
A season of variants takes flight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.19% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'feat(scraper): ZaparooCompanion xml support' clearly and concisely summarizes the main change: adding ZaparooCompanion XML support to the gamelist scraper. The title accurately reflects the primary functionality introduced in this changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@BossRighteous BossRighteous changed the title from "Zaparoocompanion xml support" to "feat(scraper): ZaparooCompanion xml support" on May 24, 2026

sentry Bot commented May 24, 2026

Codecov Report

❌ Patch coverage is 83.09859% with 48 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/database/scraper/gamelistxml/scraper.go | 84.00% | 25 Missing and 7 partials ⚠️ |
| pkg/database/mediadb/sql_scraper.go | 68.00% | 8 Missing and 8 partials ⚠️ |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/models/responses.go`:
- Line 456: The SecondarySlug field is a nullable *string but its JSON tag lacks
omitempty so it is emitted as null; update the struct field declaration for
SecondarySlug to include the omitempty directive (change the json tag for
SecondarySlug to "secondarySlug,omitempty") so the key is omitted when unset,
ensuring behavior matches other optional response fields.
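
The effect of omitempty on a nil pointer field can be seen in isolation with encoding/json. The two struct types below are stand-ins written for this example, not the actual MediaTitleParseResponse.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withNull has no omitempty: a nil pointer is emitted as JSON null.
type withNull struct {
	SecondarySlug *string `json:"secondarySlug"`
}

// withOmit uses omitempty: a nil pointer drops the key entirely.
type withOmit struct {
	SecondarySlug *string `json:"secondarySlug,omitempty"`
}

// encode marshals v and returns the JSON text; Marshal cannot fail for
// these simple structs, so the error is discarded in this sketch.
func encode(v any) string {
	b, _ := json.Marshal(v)
	return string(b)
}

func main() {
	fmt.Println(encode(withNull{})) // {"secondarySlug":null}
	fmt.Println(encode(withOmit{})) // {}

	s := "acme"
	fmt.Println(encode(withOmit{SecondarySlug: &s})) // {"secondarySlug":"acme"}
}
```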

In `@pkg/database/scraper/gamelistxml/scraper_test.go`:
- Around line 920-1067: Add unit tests exercising the companion parent/child
ingestion/enrichment branches of GamelistXMLScraper.MapToDB: create a temp
SystemRootPath, construct GamelistRecord entries representing a parent (with
companion metadata and media files in AvailableMediaDirs) and a child (with a
companion reference linking to the parent), invoke
(&GamelistXMLScraper{}).MapToDB(&rec) and assert that the result contains the
parsed parent data, the child is matched to the parent (check TitleProps for
relation/parent tags), and media/title upserts are present for both (verify
expected media property keys from tags.TagProperty* and TitleProps values use
filepath.ToSlash on created files). Ensure tests cover both enrichment-from-file
(explicit Game.Boxart*/paths) and filesystem fallback using AvailableMediaDirs.

In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 1093-1095: The code is using only filepath.Base(c.ResolvedPath)
when calling mdb.FindMediaBySystemAndPathSuffix which can match unrelated files
with the same filename; change this to compute and use a path suffix relative to
the system root instead (e.g., use filepath.Rel(systemRoot, c.ResolvedPath) and
normalize with filepath.ToSlash) and pass that relative suffix to
mdb.FindMediaBySystemAndPathSuffix, falling back to the basename only if Rel
fails; apply the same change to the other occurrences around the block that
currently use filepath.Base (the call sites involving c.ResolvedPath and
mdb.FindMediaBySystemAndPathSuffix).
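
A sketch of the suggestion above, for comparison with the basename-only lookup. The function name is hypothetical, and the extra guard for paths that escape the root goes beyond what the comment asks for. Note that the suggestion trades away the filename-only matching that the PR description calls "path ignorance"; whether it was adopted is not shown in this thread.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// lookupSuffix returns the path of resolved relative to systemRoot, using
// forward slashes, and falls back to the bare filename when no usable
// relative path exists.
func lookupSuffix(systemRoot, resolved string) string {
	rel, err := filepath.Rel(systemRoot, resolved)
	if err != nil {
		return filepath.Base(resolved)
	}
	// Extra guard (assumption): a path outside the root is not a useful suffix.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return filepath.Base(resolved)
	}
	return filepath.ToSlash(rel)
}

func main() {
	fmt.Println(lookupSuffix("roms/snes", "roms/snes/sub/ACME (USA).sfc"))
	fmt.Println(lookupSuffix("roms/snes", "elsewhere/ACME (USA).sfc"))
}
```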

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 915994b1-b73c-4262-9b4a-3e89a4974359

📥 Commits

Reviewing files that changed from the base of the PR and between 5d2b246 and c6acc70.

📒 Files selected for processing (16)
  • pkg/api/methods/media_image.go
  • pkg/api/methods/media_title_from_path.go
  • pkg/api/models/models.go
  • pkg/api/models/params.go
  • pkg/api/models/responses.go
  • pkg/api/server.go
  • pkg/database/database.go
  • pkg/database/mediadb/mediadb.go
  • pkg/database/mediadb/sql_media_titles.go
  • pkg/database/mediadb/sql_scraper.go
  • pkg/database/scraper/gamelistxml/scraper.go
  • pkg/database/scraper/gamelistxml/scraper_test.go
  • pkg/database/tags/tag_values.go
  • pkg/database/tags/tags.go
  • pkg/platforms/shared/esapi/gamelist.go
  • pkg/testing/helpers/db_mocks.go

Comment thread pkg/api/models/responses.go Outdated
Comment thread pkg/database/scraper/gamelistxml/scraper_test.go
Comment thread pkg/database/scraper/gamelistxml/scraper.go

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/methods/media_title_from_path_test.go`:
- Line 59: Replace the hardcoded POSIX paths in the test params (the []byte JSON
literal that contains "/roms/nes/game.nes") with a platform-correct path built
via filepath.Join and then injected into the JSON for the request; for example,
construct path := filepath.Join("roms", "nes", "game.nes") (and the other
occurrence on line 63), then produce the params payload using json.Marshal or
fmt.Sprintf to include that path string instead of the literal so tests run
cross-platform in media_title_from_path_test.go.
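
A sketch of the json.Marshal route suggested above. The helper names are hypothetical. One reason to prefer json.Marshal over fmt.Sprintf with %s is that a Windows path contains backslashes, which must be escaped inside a JSON string; Marshal handles that automatically.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// buildParams joins the path parts with the platform separator and returns
// the JSON params payload together with the joined path for assertions.
func buildParams(systemID string, parts ...string) ([]byte, string, error) {
	joined := filepath.Join(parts...)
	payload, err := json.Marshal(map[string]string{
		"systemId": systemID,
		"path":     joined,
	})
	return payload, joined, err
}

// decodePath reads the path field back out of a params payload.
func decodePath(payload []byte) (string, error) {
	var p struct {
		Path string `json:"path"`
	}
	if err := json.Unmarshal(payload, &p); err != nil {
		return "", err
	}
	return p.Path, nil
}

func main() {
	payload, joined, err := buildParams("NES", "roms", "nes", "game.nes")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	got, _ := decodePath(payload)
	fmt.Println(string(payload), got == joined)
}
```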

In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 1108-1121: The code currently sets seenTitles[title.DBID] = true
even if UpsertMediaTitleTags or UpsertMediaTitleProperties failed; change the
logic in scraper.go so that seenTitles is only marked true after both required
upserts succeed (treat absent meta.TitleTags or meta.TitleProps as a no-op
success), i.e. track success for the tag upsert and the props upsert (call to
mdb.UpsertMediaTitleTags and mdb.UpsertMediaTitleProperties) and only set
seenTitles[title.DBID] = true when both operations that should run completed
without error; if either returns an error, do not mark seenTitles so later
children can retry.
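
The requested control flow, sketched against a small hypothetical interface rather than the real MediaDBI: the title is marked as seen only after every upsert that needed to run has succeeded, so a later child of the same title can retry after a failure.

```go
package main

import (
	"errors"
	"fmt"
)

// upserter is a hypothetical stand-in for the two DB calls involved.
type upserter interface {
	UpsertTags(titleID int64, tags []string) error
	UpsertProps(titleID int64, props map[string]string) error
}

// enrichTitle applies tags and props once per title. Empty inputs count as
// a no-op success. seen is updated only when nothing failed.
func enrichTitle(db upserter, seen map[int64]bool, titleID int64, tags []string, props map[string]string) error {
	if seen[titleID] {
		return nil
	}
	if len(tags) > 0 {
		if err := db.UpsertTags(titleID, tags); err != nil {
			return err
		}
	}
	if len(props) > 0 {
		if err := db.UpsertProps(titleID, props); err != nil {
			return err
		}
	}
	seen[titleID] = true
	return nil
}

// flakyDB fails the first failTags calls to UpsertTags, then succeeds.
type flakyDB struct{ failTags int }

func (f *flakyDB) UpsertTags(int64, []string) error {
	if f.failTags > 0 {
		f.failTags--
		return errors.New("transient failure")
	}
	return nil
}

func (f *flakyDB) UpsertProps(int64, map[string]string) error { return nil }

func main() {
	db := &flakyDB{failTags: 1}
	seen := map[int64]bool{}
	tags := []string{"players:1-2"}
	fmt.Println(enrichTitle(db, seen, 7, tags, nil), seen[7]) // first child fails
	fmt.Println(enrichTitle(db, seen, 7, tags, nil), seen[7]) // second child retries
}
```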

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 917cbd9e-1e90-492b-937f-82bb1839c9c4

📥 Commits

Reviewing files that changed from the base of the PR and between c6acc70 and 92c44a0.

📒 Files selected for processing (8)
  • pkg/api/methods/media_title_from_path_test.go
  • pkg/api/models/responses.go
  • pkg/database/database.go
  • pkg/database/mediadb/sql_scraper.go
  • pkg/database/mediadb/sql_scraper_test.go
  • pkg/database/scraper/gamelistxml/scraper.go
  • pkg/database/scraper/gamelistxml/scraper_test.go
  • pkg/testing/helpers/db_mocks.go

Comment thread pkg/api/methods/media_title_from_path_test.go Outdated
Comment thread pkg/database/scraper/gamelistxml/scraper.go Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/api/models/responses.go (1)

455-461: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Place MediaTitleParseResponse with top-level type declarations.

This new type is declared after method definitions; move it into the type block before functions/methods to align with file-level Go structure rules.

As per coding guidelines, "Define Go types and consts near the top of the file, before functions and methods".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/models/responses.go` around lines 455 - 461, MediaTitleParseResponse
is declared after functions; move this type declaration into the file's
top-level type declarations block (or create one if absent) so all types/consts
live before any functions/methods; locate the MediaTitleParseResponse type
symbol and cut/paste it into the existing top area where other type definitions
are declared (before functions/methods) to comply with Go file-structure
guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/models/params.go`:
- Around line 243-246: Move the MediaTitleParseParams type declaration so type
definitions appear before methods: locate MediaTitleParseParams and cut/paste it
above the ReaderConnection.IsEnabled method declaration (i.e., with other
top-level type/const defs near the top of the file). Ensure the struct tag and
field names remain unchanged and run go vet/go fmt to confirm formatting.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98f7ddf6-5c15-4491-866c-1baddd1bbd4f

📥 Commits

Reviewing files that changed from the base of the PR and between 1869f1e and 968f3f8.

📒 Files selected for processing (6)
  • pkg/api/methods/media_title_parse.go
  • pkg/api/methods/media_title_parse_test.go
  • pkg/api/models/models.go
  • pkg/api/models/params.go
  • pkg/api/models/responses.go
  • pkg/api/server.go

Comment thread pkg/api/models/params.go
Comment on lines +243 to +246
type MediaTitleParseParams struct {
	SystemID string `json:"systemId" validate:"required,min=1"`
	Path     string `json:"path" validate:"required,min=1"`
}

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move MediaTitleParseParams above method declarations.

MediaTitleParseParams is introduced after ReaderConnection.IsEnabled (Line 112). Please keep new type declarations before functions/methods in this file.

As per coding guidelines, "Define Go types and consts near the top of the file, before functions and methods".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/models/params.go` around lines 243 - 246, Move the
MediaTitleParseParams type declaration so type definitions appear before
methods: locate MediaTitleParseParams and cut/paste it above the
ReaderConnection.IsEnabled method declaration (i.e., with other top-level
type/const defs near the top of the file). Ensure the struct tag and field names
remain unchanged and run go vet/go fmt to confirm formatting.

@wizzomafizzo wizzomafizzo merged commit 53c66bb into main May 25, 2026
12 checks passed
@wizzomafizzo wizzomafizzo deleted the zaparoocompanion-xml-support branch May 25, 2026 00:18

2 participants