feat(taxonomy): add opt-in normalized-name matching to resolve-term #2387

Merged
chubes4 merged 1 commit into main from resolve-term-normalized
May 31, 2026
@chubes4 chubes4 commented May 31, 2026

Summary

Promotes generic normalized term-matching into ResolveTermAbility (the platform's single source of truth for term resolution), so case/punctuation/article/accent variants stop creating duplicate terms — for every taxonomy.

Why

ResolveTermAbility matched only by exact name → slug, so "Tyler, the Creator" vs "Tyler the Creator", "Beyoncé" vs "Beyonce", "AC/DC" vs "ACDC" all created duplicate terms. Robust normalized matching existed only inside data-machine-events' Venue_Taxonomy — so only venues benefited, and artists/locations/festivals were stuck with the weak resolver.

This is the root-cause fix for the artist-dup gap (Extra-Chill/extrachill-events#144): the matcher belongs in core, not forked per-taxonomy or copied across a layer boundary.

What changed

  • New opt-in fuzzy input (default false) inserts a normalized-name matching layer between slug-match and create. Because the flag defaults to off, existing callers are unaffected.
  • ResolveTermAbility::normalize_name_for_matching() made public — a pure, taxonomy-agnostic normalizer: decode HTML entities → strip accents → lowercase → ampersand→"and" → strip leading article → strip non-alphanumerics → collapse whitespace. Min 3-char guard against false matches.
  • Static resolve(..., bool $fuzzy = false) gains the flag.
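The normalizer steps listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP code from this PR: the set of leading articles (`the`, `a`, `an`), the ASCII-only character class, and the way the 3-character guard is applied in the hypothetical `names_match()` helper are all assumptions.

```python
import html
import re
import unicodedata


def normalize_name_for_matching(name: str) -> str:
    """Illustrative normalizer following the step order in the PR description."""
    # 1. Decode HTML entities ("&amp;" becomes "&").
    text = html.unescape(name)
    # 2. Strip accents: decompose characters, then drop combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # 3. Lowercase.
    text = text.lower()
    # 4. Ampersand becomes "and" (padded so "a&b" does not fuse into "aandb").
    text = text.replace("&", " and ")
    # 5. Strip one leading article. Assumption: the article set is the/a/an.
    text = re.sub(r"^\s*(?:the|a|an)\s+", "", text, count=1)
    # 6. Strip non-alphanumerics. Assumption: ASCII letters and digits only.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # 7. Collapse whitespace.
    return re.sub(r"\s+", " ", text).strip()


def names_match(a: str, b: str, min_length: int = 3) -> bool:
    """Hypothetical helper: compare normalized names, refusing very short keys."""
    key_a = normalize_name_for_matching(a)
    key_b = normalize_name_for_matching(b)
    # Assumption: the guard rejects any normalized key under min_length chars.
    if len(key_a) < min_length or len(key_b) < min_length:
        return False
    return key_a == key_b


if __name__ == "__main__":
    pairs = [
        ("Tyler, the Creator", "Tyler the Creator"),
        ("Beyoncé", "Beyonce"),
        ("AC/DC", "ACDC"),
        ("The Rolling Stones", "Rolling Stones"),
        ("Hook & Ladder", "Hook and Ladder"),
        ("Sigur Rós", "Sigur Ros"),
    ]
    for left, right in pairs:
        print(left, "|", right, "|", names_match(left, right))
```

Under these assumptions, each variant pair named in this PR normalizes to the same key, while two unrelated names do not.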

Consumers (separate PRs)

  • Extra-Chill/extrachill-users concert-import resolves artists via resolve-term with fuzzy=true → closes the artist-dup gap.
  • Extra-Chill/data-machine-events Venue_Taxonomy delegates its normalizer to this one (drops its fork).

Validation

  • homeboy lint → passed, PHPStan passed.
  • Normalizer validated live against real variant pairs — all match: "Tyler, the Creator"/"Tyler the Creator", "Beyoncé"/"Beyonce", "AC/DC"/"ACDC", "The Rolling Stones"/"Rolling Stones", "Hook & Ladder"/"Hook and Ladder", "Sigur Rós"/"Sigur Ros".

Conventional commit; no version/changelog hand-edits.

ResolveTermAbility (the platform's single source of truth for term resolution)
only matched by exact name then slug, so case/punctuation/article variants
created duplicate terms — "Tyler, the Creator" vs "Tyler the Creator",
"Beyoncé" vs "Beyonce", "AC/DC" vs "ACDC". Robust normalized matching existed
only inside data-machine-events' Venue_Taxonomy, so only venues benefited.

Promote that generic algorithm into core: add an opt-in `fuzzy` flag that
inserts a normalized-name matching layer between slug-match and create. The
normalizer (decode entities, strip accents, lowercase, ampersand→and, strip
leading article, strip non-alphanumerics, collapse whitespace) is a pure,
taxonomy-agnostic string function, so every taxonomy — artist, venue,
location, festival — gets fuzzy dedup for free. Default false keeps all
existing callers' behavior unchanged.

Exposes ResolveTermAbility::normalize_name_for_matching() publicly so consumers
(e.g. Venue_Taxonomy delegation) share one canonical normalizer.
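
The lookup order described in this commit message (exact name, then slug, then the opt-in normalized layer, then create) might look like the sketch below. It is illustrative Python rather than the PHP from this commit: `TermStore`, `slugify()`, and the compact `normalize()` are hypothetical stand-ins, and the real slug and normalization rules may differ.

```python
import html
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List, Tuple


def normalize(name: str) -> str:
    """Compact stand-in for the normalizer described in the commit message."""
    text = unicodedata.normalize("NFKD", html.unescape(name))
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = text.replace("&", " and ")
    text = re.sub(r"^\s*the\s+", "", text)  # assumption: only "the" is handled here
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def slugify(name: str) -> str:
    """Hypothetical slug helper; the platform's real slug rules may differ."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


@dataclass
class TermStore:
    """Hypothetical in-memory stand-in for the terms of one taxonomy."""

    names: List[str] = field(default_factory=list)

    def resolve(self, name: str, fuzzy: bool = False) -> Tuple[int, bool]:
        """Return (term index, created)."""
        # 1. Exact name match.
        for index, existing in enumerate(self.names):
            if existing == name:
                return index, False
        # 2. Slug match.
        slug = slugify(name)
        for index, existing in enumerate(self.names):
            if slugify(existing) == slug:
                return index, False
        # 3. Opt-in normalized-name match, skipped entirely when fuzzy is off.
        if fuzzy:
            key = normalize(name)
            if len(key) >= 3:  # assumption: how the 3-char guard is applied
                for index, existing in enumerate(self.names):
                    if normalize(existing) == key:
                        return index, False
        # 4. Nothing matched: create a new term.
        self.names.append(name)
        return len(self.names) - 1, True


if __name__ == "__main__":
    strict = TermStore(["The Rolling Stones"])
    print(strict.resolve("Rolling Stones"))  # fuzzy off: a duplicate is created

    lenient = TermStore(["The Rolling Stones"])
    print(lenient.resolve("Rolling Stones", fuzzy=True))  # reuses the existing term
```

Because the normalized layer runs only when `fuzzy` is true, and only after the two existing lookups fail, a caller that never passes the flag follows the same path as before.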
homeboy-ci Bot commented May 31, 2026

Homeboy Results — data-machine

Lint

lint — passed

ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 1c4e4c8

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-lint-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-lint-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/26700310529

Test

test — passed

  • 318 passed

ℹ️ Auto-fix lint issues: homeboy refactor data-machine --from lint --write
ℹ️ Collect coverage: homeboy test data-machine --coverage
ℹ️ Save test baseline: homeboy test data-machine --baseline
ℹ️ Pass args to test runner: homeboy test -- [args]
ℹ️ Full options: homeboy docs commands/test
Deep dive: homeboy test data-machine --changed-since 1c4e4c8

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-test-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-test-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/26700310529

Audit

audit — passed

  • audit — 22 finding(s)
  • Total: 22 finding(s)

Deep dive: homeboy audit data-machine --changed-since 1c4e4c8

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-audit-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-audit-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/26700310529
Tooling versions
  • Homeboy CLI: homeboy 0.213.1+99b2c7ed
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: 05a1724a
  • Action: unknown@unknown

@chubes4 chubes4 merged commit 6e13916 into main May 31, 2026
5 checks passed
@chubes4 chubes4 deleted the resolve-term-normalized branch May 31, 2026 02:47