Skip to content

lastgenre: Genre ignorelist#6449

Open
JOJ0 wants to merge 5 commits intomasterfrom
lastgenre_forbidden
Open

lastgenre: Genre ignorelist#6449
JOJ0 wants to merge 5 commits intomasterfrom
lastgenre_forbidden

Conversation

@JOJ0
Copy link
Copy Markdown
Member

@JOJ0 JOJ0 commented Mar 19, 2026

Description

Adds a global and artist-specific genre ignorelist to lastgenre. Ignorelist entries can use regex patterns or literal genre names and are configurable per artist or globally (*). Example:

FIXME config
FIXME description

Further notes

My personal roadmap for the lastgenre plugin: #5915

Feature idea originated from: #5721 (comment).

To Do

  • Documentation.
  • Changelog.
  • Tests.

- Test file format (valid and error cases)
- Test regex pattern matching (_is_ignored)
- Test _resolve_genres: ignored genres filtered
- Test _resolve_genres: c14n ancestry walk blocked for ignored tags
- Test _resolve_genres: without whitelist, oldest ancestor is kept
- Test _resolve_genres: whithout whitelist, ignored oldest ancestor is
  removed
@JOJ0 JOJ0 requested a review from a team as a code owner March 19, 2026 22:03
Copilot AI review requested due to automatic review settings March 19, 2026 22:03
@github-actions
Copy link
Copy Markdown

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

grug see PR add genre ignorelist to lastgenre plugin, so wrong Last.fm tags can be blocked global (*) or per artist. this fit plugin job: pick good genre tags and keep user control.

Changes:

  • add ignorelist file parsing + regex/literal compile, plus shared is_ignored() helper
  • apply ignorelist both right after Last.fm fetch and again during genre resolve/whitelist/c14n
  • add docs + tests for ignorelist matching and file format/errors

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
beetsplug/lastgenre/__init__.py load ignorelist from config file; thread artist through resolve path; filter ignored tags during c14n + final validation
beetsplug/lastgenre/client.py filter ignored tags immediately after Last.fm lookup (while keeping cache)
beetsplug/lastgenre/utils.py new shared is_ignored() helper + ignorelist typing
docs/plugins/lastgenre.rst document ignorelist feature + config option
test/plugins/test_lastgenre.py add ignorelist tests (resolve behavior, matcher behavior, file format + errors)
Comments suppressed due to low confidence (1)

test/plugins/test_lastgenre.py:293

  • grug see this config fixture just return upstream module-scoped config. it not reset state per test, so config['lastgenre']['ignorelist'] (and other keys) can leak between tests and even break LastGenrePlugin() init when prior test set ignorelist path then delete file. grug suggest make this fixture actually reset needed keys (ex: set lastgenre.ignorelist = False here) or use function-scoped fresh config object.
@pytest.fixture
def config(config):
    """Provide a fresh beets configuration for every test/parameterize call

    This is necessary to prevent the following parameterized test to bleed
    config test state in between test cases.
    """
    return config

Comment on lines 544 to 548
elif obj.albumartist != config["va_name"].as_str():
new_genres = self.client.fetch_artist_genre(obj.albumartist)
stage_label = "album artist"
stage_artist = obj.albumartist
if not new_genres:
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grug see multi-valued album artist fallback collect genres from many obj.albumartists, but stage_artist stay set to obj.albumartist. then later resolve step apply artist-specific ignorelist for wrong artist and can drop genre that came from other albumartist. grug think when stage_label become "multi-valued album artist", set stage_artist = None (global only) or handle per-artist filtering another way.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix that

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed and added a test for it

Comment on lines +186 to +188
fracture:
^(heavy|black|power|death)?\s?(metal|rock)$|\w+-metal\d*$
progressive metal
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grug think regex example in code block wrong: it show double backslash (\\s, \\w, \\d). in ignorelist file user must write single backslash (\s, \w, \d) or regex will match literal "\s" text, not whitespace/word/digit. please change example to single backslashes so copy/paste work.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix that

JOJ0 added 2 commits March 19, 2026 23:16
- Prevents wrong last.fm genres based on a per artist (or global) list of regex
  patterns that should be ignored.
  - Genre _ignoring_ happens in two places: Right after fetching the
    last.fm genre and in _resolve_genres.
  - As a fallback literal string matching can be used instead of
    supplying a regex pattern
- Includes docs for the new feature.
@JOJ0 JOJ0 force-pushed the lastgenre_forbidden branch from 1400171 to c17eebb Compare March 19, 2026 22:16
@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 19, 2026

@snejus this is now ready for a first (human) review except the copilot comments I will address. Thanks in advance :-)

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.91%. Comparing base (1943b14) to head (c5a14d3).
⚠️ Report is 47 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
beetsplug/lastgenre/__init__.py 90.76% 3 Missing and 3 partials ⚠️
beetsplug/lastgenre/client.py 37.50% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6449      +/-   ##
==========================================
+ Coverage   69.73%   69.91%   +0.18%     
==========================================
  Files         145      146       +1     
  Lines       18534    18575      +41     
  Branches     3023     3027       +4     
==========================================
+ Hits        12924    12987      +63     
+ Misses       4976     4957      -19     
+ Partials      634      631       -3     
Files with missing lines Coverage Δ
beetsplug/lastgenre/utils.py 100.00% <100.00%> (ø)
beetsplug/lastgenre/client.py 55.10% <37.50%> (+1.91%) ⬆️
beetsplug/lastgenre/__init__.py 82.79% <90.76%> (+7.24%) ⬆️

... and 12 files with indirect coverage changes

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@snejus
Copy link
Copy Markdown
Member

snejus commented Mar 21, 2026

Just had a quick look at the description and wanted to suggest that ideally we want to stay within the beets configuration YAML for the regex patterns. See my configuration: #6269 (comment), specifically importreplace plugin configuration and bandcamp.field_patterns which use regex patterns heavily.

What specific issue YAML poses in your case?

@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 21, 2026

Cant recall specifically, it was a year ago already. Loads of loads of excaping this and that. regexes bacame tedious to write and even read. Will take a look at how you implemented regex patterns within yaml.

What oposes not a separate format? whitelist and c14n is something else anyway.

I'd say it's simpler to use. Main reason. I gave up back then trying to use ini or yaml for regexes. It was simply painful to use 🤷

this new format I will use as-is for genre aliases too btw. also a reason why I think it's the best way forward for the project.

@snejus
Copy link
Copy Markdown
Member

snejus commented Mar 21, 2026

Cant recall specifically, it was a year ago already. Loads of loads of excaping this and that. regexes bacame tedious to write and even read. Will take a look at how you implemented regex patterns within yaml.

What oposes not a separate format? whitelist and c14n is something else anyway.

I'd say it's simpler to use. Main reason. I gave up back then trying to use ini or yaml for regexes. It was simply painful to use 🤷

this new format I will use as-is for genre aliases too btw. also a reason why I think it's the best way forward for the project.

Personally it's about the convenience - I want all my configuration to live in a single file. And if it's getting unwieldy, I can move, say, lastgenre configuration to a separate file.

For example, my importreplace configuration is huge, so it lives in a separate file:

directory: /mnt/music/Music
library: ~/.music/beets/library.db
include:
  - replacements.yaml

@snejus
Copy link
Copy Markdown
Member

snejus commented Mar 21, 2026

Also I don't see how a separate file improve the regex definitions. Note that yaml is extremely flexible in this regarding, one just needs to be aware of its features. For example, regexes do not need to be defined in a single line:

    artwork:
      pattern: |
        \b(?i:art(?:work)?|cover)\b
        [^|:,.\n-]*?
        (?:[-:]\ ?|by:?\ )
        (
            https:[^ ]+
          | (
            [\w@]
            (?:[^|:,.\n-]|[.]\w)+
            [^|:,.\n -]
          )+
        )
        ([|:,.\n \"-]|$)

Then when you read such configuration in python, make sure to specify re.VERBOSE flag: re.compile(config["regex"].get(), re.VERBOSE) to handle new lines.

@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 22, 2026

Please don't get me wrong: I understand that yaml is our config format and my first versions of this feature when I started about a year ago was of course trying to make it work with yaml.

And yes I am certainly aware of yaml includes and I do use them too :-)

Problems I had were mainly escaping issues, yaml treating regexes that work elsewhere in python do not work when put into yaml. I went to escaping hell and back. I also tried ini as an alternative and that was a little better but still painful.

At some point I decided that it is not worth ot and won't ever be userfriendly, so I put thought and effort into finding the most simple solution from a user perspective.

Open file - add regex from regex101 or self constructed - see if it compiles - and done. No special treatment so yaml accepts it.

That is user friendly and overrules "having everything in one file", which you say for yourself you don't do anyway when stuff gets too big. For last.fm genre ignores, believe me you will have a separate file because it gets big. So your argument is not too strong here.

Anyway I'm not at home/at computer these days and I won't argue further here, so if you insist that yaml is the better way to go, I will do it. It will take some time and it's a bit of a bummer that sets me back from finally getting this over with... If you help me with escaping my regexes for yaml I will be grateful

I see weird things in your config, what is this for example?

"\u2019"
"[\xE2\xA6\xF0\xCB]"
"Chan\xE9"
"Sz\xE1hala"
"Enk\u014D"
"BEHE\u0100DER"

@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 22, 2026

Also I don't see how a separate file improve the regex definitions. Note that yaml is extremely flexible in this regarding, one just needs to be aware of its features. For example, regexes do not need to be defined in a single line:

    artwork:

      pattern: |

        \b(?i:art(?:work)?|cover)\b

        [^|:,.\n-]*?

        (?:[-:]\ ?|by:?\ )

        (

            https:[^ ]+

          | (

            [\w@]

            (?:[^|:,.\n-]|[.]\w)+

            [^|:,.\n -]

          )+

        )

        ([|:,.\n \"-]|$)

Then when you read such configuration in python, make sure to specify re.VERBOSE flag: re.compile(config["regex"].get(), re.VERBOSE) to handle new lines.

multiline regexes dont solve the escaping problems, also I don't really need them for my usecase. What are you trying to say here?

@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 22, 2026

And just so we know what we are actually talking about, this format is really straight forward and a pleasure to read and edit. Not even any quoting required ❤️

alaska:
    ^ska$
    post-hardcore
alves:
    .*metal.*
    .*rock.*
bleak:
    Black Metal
    Dark Ambient
Bob Marley & The Wailers:
    ska
cleric:
    Grindcore
    Punk Rock
dj fresh:
    rhythm and blues
ed rush:
    rhythm and blues
fallen angels:
    Power Metal
    Heavy Metal
    Progressive Metal
    Alternative Rock
    Mathcore
    Metalcore
    Death Metal
    Rock
    grindcore
    death metal
    Thrash Metal
    Psychedelic Rock
    Hard Rock
fierce:
    rhythm and blues
fracture:
    progressive metal
    ^.*(heavy|black|power|death)?\s?(metal|rock)$|\w+-metal\d*$
    punk rock
    post-punk
    dubstep
    industrial
    electronica
heavy weather:
    screamo
    emo
    Stoner Metal
    Thrash Metal
herbie hancock:
    turntablism
    .*rock.*
moby:
    alternative rock
    ^ambient$
mr. scruff:
    ^jazz$
neptune:
    ^.*(heavy|black|power|death)?\s?(metal|rock)$|\w+-metal\d*$
nine:
    hip hop
    rap
paradox:
    Thrash Metal
    Speed Metal
    ^ska$
phobia:
    .*grind.*
    .*crust.*
    .*metal.*
    .*rock.*
placebo:
    .*alternative.*
    .*rock.*
point:
    pop
polar:
    metalcore
    ^hardcore$
regal:
    Garage Rock
    Funk
seba:
    ^ska$
sfr:
    liquid funk
shed:
    techno
    hip-hop
    dub
    rap
    Alternative Rock
smokey joe:
    ska
    punk
soft machine:
    Progressive Rock
solar:
    metal
    k-pop
st germain:
    .*metal.*
    .*rock.*
team scheisse:
    ^rock$
timecode:
    rhythm and blues
    ^.*metal.*$
the horsemen present:
    .*grind.*
    .*crust.*
    .*metal.*
    .*rock.*
vril:
    screamo
    emo
    black metal
    heavy metal
various::
    emo
    post-hardcore
various artists:
    emo
    post-hardcore
xdb:
    dub
    ska
*:
    ^electronic$
    ^rock$
    ^heavy metal$
    ^experimental$
    ^electronica$
    ^trance$

@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 22, 2026

To further give you the full picture of my reasoning: I took quite some ideas for the aliases and iignores features from discontinued whatlastgwnre 3rd party plugin. It had curated lists of very useful patterns for the aliases features.

They worked well for me back then when I used the plugin regularly, so I wanted to continue use them and also include them in my lastgenre implementation of these feature ideas.

I still plan to ship a curated list of aliases with beets! Stuff that works for everyone. Shipping in a sep file is easy.

I tried hard to make them work in yaml and failed. It was a never ending story escaping everything so yaml accepts it:

# tags with ampersands
d(rum)?[ n/]*b(ass)? = drum and bass
drill[ n/]*bass = drill & bass
hard[ n/]*heavy = hard & heavy
r(hythm)?[ n/]*b(lues)? = rhythm & blues
rock[ n/]*roll = rock & roll
stage[ n/]*screen = stage & screen
# consistent delimiters
(c|k|j)[ /]*(folk|goth|hip hop|pop|rock|ska|trance) = \g<1>-\g<2>
(euro(?!p[ae]+n?)|neo|post|tech(?!n[io]))[ /]*(\w+) = \g<1>-\g<2>
(g?lo)[ /]*fi = \g<1>-fi
(glitch|hip|jazz|trip)y?([ /]*hip)?[ /]*hop = \g<1>\shop
(p|g)[ /]*funk = \g<1>-funk
nu[/-]*(disco|jazz|metal|soul) = nu \g<1>
synth[ /-]+(\w+) = synth\g<1>
# abbreviation related
alt(/|er*n(/|ati[fv](es?)?))? = alternative
avant(/|gard)? = avantgarde
# ele[ck]tr(i[ck]|o(ni[ck][as]*)?) = electronic
goth(?!ic) = gothic
prog+(/|r*es+(/|i[fv]e?)?)? = progressive
ps?y(/|ch(/|[aeo]l?del+i[ca]+)?)? = psychedelic
trad(/|ition(/|al)?)?-? = traditional
# other
#(cabaret|comedi(an|e)|humou?r|kabarett|parod(ie|y)) = comedy
#^(film|games?|movies?|t(ele)?v(ision)?|video(s| )?games?) ?(scores?|music)? = soundtrack
#g?old(i(es?)?|en)? = oldies
(8[ /-]*bit|(8[ /-]*)?bit[ /-]*pop|chip([ /-]*tunes?)?) = chiptune
(fem|m)ale[ /]*vo(cal(ist)?|ice)s? = \g<1>ale vocalist
(ost|vgm|scores?|video[ /]*game[ /]*music) = soundtrack
.*top.*[0-9]+.* = charts
best.*of (\w+) = \g<1>
br(eaks?|oken)([ /]*beats?)? = breakbeat
down[ /-]*(beat|tempo?) = downtempo
minimal[ /-]*(electronic|house|techno) = minimal
shoegaze(r|ing?) = shoegaze

I'm giving you all this so you might better understand that I did my due diligence, made my homework and had a long and thoughtful path until I decided that a custom format is the best and most of all userfriendly way.

If you still insist that yaml is the best way I will start doing it. But please do note that I put a lot of thought, work and experimentation into my decisions and it's not something I just decided out of the blue.

@snejus
Copy link
Copy Markdown
Member

snejus commented Mar 22, 2026

multiline regexes dont solve the escaping problems, also I don't really need them for my usecase. What are you trying to say here?

I want to understand this better. Can you give me an example where this is an issue?

@JOJ0 JOJ0 added lastgenre plugin Pull requests that are plugins related labels Mar 27, 2026
@JOJ0 JOJ0 force-pushed the lastgenre_forbidden branch 2 times, most recently from 66a7d98 to 1785dab Compare March 28, 2026 23:12
@JOJ0
Copy link
Copy Markdown
Member Author

JOJ0 commented Mar 28, 2026

Hi again @snejus, I refactored this to have everything right within the beets config. I added some details to docs and docstrings user's should be cautious about when putting there regex into yaml.

I won't be able to recall excatly errors from when I first started with this feature. In the end it doesnt matter. I can only suppose that I might have tried to put regex into double quotes which is a very bad idea. Also I might have had a lot of errors with ini as well and all that added up to the conclusion that yaml and ini are bad -> Invent something that just works and most of all is more user friendly. I still find my first version the most user friendly and it didn't require any "caution don't do this, don't do that" stuff in the docs ;-) But I now learned more about it and it is not too much to worry about. In most cases with regex in yaml it is just best to not use any single or double quotes at all and most stuff will work.

So @snejus the 3 things I'd appreciate if you took special attention to in another review are:

  • How the patterns are loaded from the config. Is that the best way?

  • Is the additinoal docs in the Attention box absolutely correct?

  • Let's re-review in what places the ignorelist should be applied. I've ran over this dozens of times already and it is simply not just one place. In my opinion it should be 3 places:

    • directly after fetching new genres
    • directly after reading existing genres (but we don't actually do that, it happens in resolve_genre, similar to whitelist checks)
    • when handling ancestors during canonicalization

Currently existing genres check happens in resolve_genre, and it is applied possibly more than once. I'm just running through that again to see if there is a redundant check in there.

@JOJ0 JOJ0 force-pushed the lastgenre_forbidden branch from 1785dab to c5a14d3 Compare March 29, 2026 09:11
@snejus
Copy link
Copy Markdown
Member

snejus commented Mar 31, 2026

best to not use any single or double quotes at all and most stuff will work.

You can use them as long as you use yaml literal (or scalar) style blocks:

configuration: |
  you can use
  any
  characters
  '"
  you like

Will review this tomorrow!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lastgenre plugin Pull requests that are plugins related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants