
A more complete fix for the non-english language false positives #59

Open
JoeyBelvar wants to merge 2 commits into Sebane1:master from JoeyBelvar:codex/fix-language-y-marker

Contributor

@JoeyBelvar JoeyBelvar commented May 15, 2026

Intent: reduce false positives in the non-English dialogue filter while keeping the detection logic readable and easy to tune.

Changes:

  • Matches weak non-English markers as bounded words/phrases instead of raw substring matches.
  • Avoids regex by using an explicit boundary helper for easier review and troubleshooting.
  • Treats apostrophes and hyphens as word connectors, so names/stutters like Y'shtola, Y’all, and Y-You do not trip the standalone y marker.
  • Splits Spanish weak markers into:
    • specific words/phrases that can reject by themselves
    • generic short/function words that require multiple distinct hits
  • Removes redundant case variants and accented weak markers already covered by the strong accented-character check.
  • Keeps rejection logging descriptive by reporting whether the text matched a strong marker, specific weak marker, or multiple generic weak markers.
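The matching rules listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the function names (`is_word_char`, `contains_bounded`, `rejection_reason`), the marker lists, the strong-character set, and the threshold of two distinct generic hits are all assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical values for illustration; the real lists and threshold live in the PR.
CONNECTORS = {"'", "\u2019", "-"}          # straight/curly apostrophe, hyphen
STRONG_CHARS = set("áéíóúñ¿¡")             # stand-in for the strong accented-character check
SPECIFIC_MARKERS = ("por favor", "gracias")  # may reject on their own
GENERIC_MARKERS = ("y", "el", "la", "de", "que")  # need several distinct hits
MIN_GENERIC_HITS = 2                        # assumed threshold


def is_word_char(ch: str) -> bool:
    """Letters, digits, and connectors all count as part of a word."""
    return ch.isalnum() or ch in CONNECTORS


def contains_bounded(text: str, marker: str) -> bool:
    """True if marker occurs with no word character directly before or after it."""
    haystack, needle = text.lower(), marker.lower()
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        left_ok = start == 0 or not is_word_char(haystack[start - 1])
        right_ok = end == len(haystack) or not is_word_char(haystack[end])
        if left_ok and right_ok:
            return True
        start = haystack.find(needle, start + 1)
    return False


def rejection_reason(text: str) -> Optional[str]:
    """Return a descriptive reason if the text looks non-English, else None."""
    if any(ch in STRONG_CHARS for ch in text.lower()):
        return "strong marker"
    for marker in SPECIFIC_MARKERS:
        if contains_bounded(text, marker):
            return f"specific weak marker: {marker}"
    # Each generic marker is counted once, so repeats of one word do not add up.
    hits = [m for m in GENERIC_MARKERS if contains_bounded(text, m)]
    if len(hits) >= MIN_GENERIC_HITS:
        return "multiple generic weak markers: " + ", ".join(hits)
    return None
```

Under these assumptions, `rejection_reason("Y'shtola, you made it!")` returns `None` because the apostrophe keeps `Y` attached to the rest of the name, while `rejection_reason("la casa de mi amigo")` is rejected on two distinct generic markers.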

Validation:

  • Spot checks confirmed bundled nameless.json dialogue is no longer rejected by these weak markers.

@JoeyBelvar JoeyBelvar force-pushed the codex/fix-language-y-marker branch from 102608d to 5281ae2 on May 16, 2026 12:02