feat(taxonomy): add opt-in normalized-name matching to resolve-term #2387
Merged
ResolveTermAbility (the platform's single source of truth for term resolution) only matched by exact name then slug, so case/punctuation/article variants created duplicate terms — "Tyler, the Creator" vs "Tyler the Creator", "Beyoncé" vs "Beyonce", "AC/DC" vs "ACDC". Robust normalized matching existed only inside data-machine-events' Venue_Taxonomy, so only venues benefited. Promote that generic algorithm into core: add an opt-in `fuzzy` flag that inserts a normalized-name matching layer between slug-match and create. The normalizer (decode entities, strip accents, lowercase, ampersand→and, strip leading article, strip non-alphanumerics, collapse whitespace) is a pure, taxonomy-agnostic string function, so every taxonomy — artist, venue, location, festival — gets fuzzy dedup for free. Default false keeps all existing callers' behavior unchanged. Exposes ResolveTermAbility::normalize_name_for_matching() publicly so consumers (e.g. Venue_Taxonomy delegation) share one canonical normalizer.
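
The normalizer steps listed above can be sketched as a pure string function. This is a minimal Python illustration, not the PHP code in `ResolveTermAbility`; the function name mirrors the PR, but the article list (`the`, `a`, `an`) and the exact regexes are assumptions made for the example.

```python
import html
import re
import unicodedata

# Assumption: the PR does not list which articles are stripped; "the", "a", "an" are a guess.
_LEADING_ARTICLE = re.compile(r"^\s*(?:the|a|an)\s+")


def normalize_name_for_matching(name: str) -> str:
    """Illustrative normalizer following the step order described in the PR."""
    text = html.unescape(name)  # decode HTML entities (e.g. "&amp;" -> "&")
    # Strip accents: decompose characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = text.replace("&", " and ")  # ampersand -> "and"
    text = _LEADING_ARTICLE.sub("", text)  # strip one leading article
    text = re.sub(r"[^a-z0-9\s]", "", text)  # delete non-alphanumerics, keep whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


for variant in ("Tyler, the Creator", "Tyler the Creator", "Beyoncé", "AC/DC"):
    print(repr(variant), "->", repr(normalize_name_for_matching(variant)))
```

In this sketch non-alphanumerics are deleted rather than replaced with spaces, which is what lets "AC/DC" and "ACDC" collapse to the same key.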
Summary
Promotes generic normalized term-matching into `ResolveTermAbility` (the platform's single source of truth for term resolution), so case/punctuation/article/accent variants stop creating duplicate terms, for every taxonomy.

Why
`ResolveTermAbility` matched only by exact name → slug, so "Tyler, the Creator" vs "Tyler the Creator", "Beyoncé" vs "Beyonce", and "AC/DC" vs "ACDC" all created duplicate terms. Robust normalized matching existed only inside data-machine-events' `Venue_Taxonomy`, so only venues benefited, and artists/locations/festivals were stuck with the weak resolver.
This is the root-cause fix for the artist-dup gap (Extra-Chill/extrachill-events#144): the matcher belongs in core, not forked per-taxonomy or copied across a layer boundary.

What changed
- `fuzzy` input (default false) inserts a normalized-name matching layer between slug-match and create. Existing callers are unaffected (default off).
- `ResolveTermAbility::normalize_name_for_matching()` made public. It is a pure, taxonomy-agnostic normalizer: decode HTML entities → strip accents → lowercase → ampersand→"and" → strip leading article → strip non-alphanumerics → collapse whitespace. Min 3-char guard against false matches.
- `resolve(..., bool $fuzzy = false)` gains the flag.

Consumers (separate PRs)
- `resolve-term` with `fuzzy=true` → closes the artist-dup gap.
- `Venue_Taxonomy` delegates its normalizer to this one (drops its fork).

Validation
- `homeboy lint` → passed, PHPStan passed.
- Conventional commit; no version/changelog hand-edits.
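
The matching order described under What changed (exact name, then slug, then the opt-in normalized layer, then create) can be sketched as below. This is a hedged Python illustration only: the in-memory `terms` dict, `slugify`, and the simplified `normalize` stand-in are hypothetical, and applying the 3-character guard to the normalized key is an assumption about where the PR's guard sits.

```python
import re
from typing import Dict

MIN_KEY_LENGTH = 3  # the PR mentions a min 3-char guard; applying it to the key is assumed


def slugify(name: str) -> str:
    """Hypothetical slug: lowercase, runs of non-alphanumerics become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def normalize(name: str) -> str:
    """Simplified stand-in for the normalizer: lowercase, drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def resolve(terms: Dict[int, str], name: str, fuzzy: bool = False) -> int:
    """Return the id of a matching term, creating a new term if nothing matches."""
    # 1. Exact name match.
    for term_id, existing in terms.items():
        if existing == name:
            return term_id
    # 2. Slug match.
    slug = slugify(name)
    for term_id, existing in terms.items():
        if slugify(existing) == slug:
            return term_id
    # 3. Opt-in normalized-name match, skipped for very short keys.
    if fuzzy:
        key = normalize(name)
        if len(key) >= MIN_KEY_LENGTH:
            for term_id, existing in terms.items():
                if normalize(existing) == key:
                    return term_id
    # 4. No match: create a new term.
    new_id = max(terms, default=0) + 1
    terms[new_id] = name
    return new_id
```

With `terms = {1: "AC/DC"}`, calling `resolve(terms, "ACDC")` falls through to create and returns a new id, while `resolve(terms, "ACDC", fuzzy=True)` returns `1`. Because `fuzzy` defaults to `False`, callers that do not pass it keep the exact-then-slug behavior.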