feat(duplicates): add chroma plugin integration #6624

Open
ShimmerGlass wants to merge 1 commit into beetbox:master from ShimmerGlass:dup-chroma

@ShimmerGlass
Contributor

Description

When the chroma plugin is enabled, the duplicates plugin can now use its
sonic fingerprinting capabilities to compare the tracks' audio in addition to
their keys. This is especially useful when multiple versions of a song, such
as live versions or remixes, exist in the library.

When the option is enabled, tracks with the same keys but different audio
will be excluded from the results.

The following additional options are available when the chroma plugin is
enabled:

  • chroma: Enable fingerprint comparison during duplicate search.
  • chroma_threshold: Similarity score, from 0 to 1, at or above which two
    tracks' audio is considered the same. 1 means an exact match; 0 means
    nothing alike. Default: 0.9.
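As a rough illustration of the intended threshold semantics, here is a minimal Python sketch. It assumes the comparison yields a similarity score in [0, 1], or None when a fingerprint is missing, and that a score at or above chroma_threshold means "same audio". is_same_audio is a hypothetical helper, not part of the PR.

```python
from typing import Optional

DEFAULT_CHROMA_THRESHOLD = 0.9  # default stated in the PR description


def is_same_audio(score: Optional[float],
                  threshold: float = DEFAULT_CHROMA_THRESHOLD) -> bool:
    """Return True when a fingerprint similarity score counts as 'same audio'.

    Hypothetical helper: assumes scores range from 0 (nothing alike) to
    1 (exact match) and that None means no fingerprint was available.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"chroma_threshold must be between 0 and 1, got {threshold}")
    if score is None:
        # Without a fingerprint we cannot confirm that the audio matches.
        return False
    return score >= threshold


# A near-identical copy passes the default threshold; a live take does not.
print(is_same_audio(0.97))  # True
print(is_same_audio(0.5))   # False
print(is_same_audio(None))  # False
```

Treating a missing fingerprint as "not the same" is one possible choice; the PR's own code currently treats None the same way as a low score.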

To Do

  • Documentation. (If you've added a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
  • Changelog. (Add an entry to docs/changelog.rst at the bottom of one of the lists near the top of the document.)
  • Tests. (Very much encouraged but not strictly required.)

ShimmerGlass requested a review from a team as a code owner May 10, 2026 16:09
ShimmerGlass changed the title from Dup chroma to feat(duplicates): add chroma plugin integration May 10, 2026
github-actions bot added the chroma (chroma plugin) label May 10, 2026
ShimmerGlass added the duplicates (duplicates plugin) label May 10, 2026
ShimmerGlass force-pushed the dup-chroma branch 2 times, most recently from 8646dc0 to 5992bd1 May 10, 2026 16:15
codecov Bot commented May 10, 2026

Codecov Report

❌ Patch coverage is 71.96262% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.38%. Comparing base (0309096) to head (11d12ca).

Files with missing lines  Patch %  Lines
beetsplug/duplicates.py   75.00%   11 Missing and 11 partials ⚠️
beetsplug/chroma.py       53.84%   3 Missing and 3 partials ⚠️
beets/plugins.py          66.66%   1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6624   +/-   ##
=======================================
  Coverage   72.38%   72.38%           
=======================================
  Files         159      159           
  Lines       20645    20689   +44     
  Branches     3269     3280   +11     
=======================================
+ Hits        14944    14976   +32     
- Misses       4976     4982    +6     
- Partials      725      731    +6     
Files with missing lines  Coverage Δ
beets/plugins.py          87.09% <66.66%> (-0.51%) ⬇️
beetsplug/chroma.py       46.95% <53.84%> (+0.33%) ⬆️
beetsplug/duplicates.py   53.15% <75.00%> (+4.67%) ⬆️

With chroma enabled, add an option to compare track fingerprints when
duplicates are found, removing tracks with different audio from the
results.

Copilot AI left a comment

Pull request overview

grug see PR try let duplicates use chroma sonic fingerprint to narrow duplicate results (so grug can avoid “same mbid but different recording” trap).

Changes:

  • add --chroma + --chroma-threshold options to duplicates command (only when chroma plugin loaded)
  • add AcoustidPlugin.compare_items() helper in chroma plugin
  • add small plugin helper find_plugin() and update docs/tests
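The PR only names the new find_plugin(type) helper in beets/plugins.py; its body is not shown here. A minimal sketch of such a lookup, assuming the loaded plugin instances are available as a plain sequence (the loaded argument and the stand-in classes below are hypothetical):

```python
from typing import Iterable, Optional, Type, TypeVar

T = TypeVar("T")


def find_plugin(plugin_type: Type[T], loaded: Iterable[object]) -> Optional[T]:
    """Return the first loaded plugin that is an instance of plugin_type, else None."""
    for plugin in loaded:
        if isinstance(plugin, plugin_type):
            return plugin
    return None


# Stand-in classes for illustration only (not the real beets classes).
class BasePlugin: ...
class AcoustidPlugin(BasePlugin): ...
class DuplicatesPlugin(BasePlugin): ...


loaded = [DuplicatesPlugin()]
print(find_plugin(AcoustidPlugin, loaded))  # None: chroma not enabled
loaded.append(AcoustidPlugin())
print(type(find_plugin(AcoustidPlugin, loaded)).__name__)  # AcoustidPlugin
```

Because a lookup like this returns None when the plugin is not loaded, every caller has to handle that case explicitly.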

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File                             Description
beetsplug/duplicates.py          add chroma option wiring + filtering step in duplicate grouping
beetsplug/chroma.py              add compare_items() API to compare two library Items via fingerprints
beets/plugins.py                 add find_plugin(type) helper to fetch loaded plugin instance
docs/plugins/duplicates.rst      document chroma integration options and behavior
docs/changelog.rst               add changelog entry for new feature
test/plugins/test_duplicates.py  add tests for chroma integration behavior
Comments suppressed due to low confidence (3)

beetsplug/duplicates.py:88

  • grug think chroma = self.config['chroma'] can be true via config even when chroma plugin not loaded (then CLI flag not exist). later _chroma_filter_dups call _chroma_plug() and get None, then None.compare_items boom. fix: when chroma option true, check _chroma_plug() not None early and raise UserError with clear message (enable chroma plugin / install pyacoustid).
    def commands(self):
        def _dup(lib, opts, args):
            self.config.set_args(opts)
            album = self.config["album"].get(bool)
            checksum = self.config["checksum"].get(str)
            chroma = self.config["chroma"].get(bool)
            chroma_threshold = self.config["chroma_threshold"].get(float)
            copy = bytestring_path(self.config["copy"].as_str())
            count = self.config["count"].get(bool)
            delete = self.config["delete"].get(bool)
            remove = self.config["remove"].get(bool)
            fmt_tmpl = self.config["format"].get(str)
            full = self.config["full"].get(bool)
            keys = self.config["keys"].as_str_seq()
            merge = self.config["merge"].get(bool)
            move = bytestring_path(self.config["move"].as_str())
            path = self.config["path"].get(bool)
            tiebreak = self.config["tiebreak"].get(dict)
            strict = self.config["strict"].get(bool)
            tag = self.config["tag"].get(str)

            if album and chroma:
                raise ui.UserError("cannot use chroma for albums")
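One way to act on the check suggested for beetsplug/duplicates.py:88 is to validate the chroma options up front, before any grouping runs. A minimal sketch, assuming find_chroma stands in for _chroma_plug() and returns None when the chroma plugin is not loaded; check_chroma_options is hypothetical and UserError here is a local stand-in for beets.ui.UserError:

```python
from typing import Callable, Optional


class UserError(Exception):
    """Local stand-in for beets.ui.UserError."""


def check_chroma_options(
    chroma: bool,
    album: bool,
    threshold: float,
    find_chroma: Callable[[], Optional[object]],
) -> Optional[object]:
    """Validate chroma-related options; return the chroma plugin or None.

    Raising UserError here avoids a later AttributeError on
    None.compare_items when the option is set only via config.
    """
    if not chroma:
        return None
    if album:
        raise UserError("cannot use chroma for albums")
    if not 0.0 <= threshold <= 1.0:
        raise UserError("chroma_threshold must be between 0 and 1")
    plugin = find_chroma()
    if plugin is None:
        raise UserError(
            "the chroma option requires the chroma plugin "
            "(enable it and install pyacoustid)"
        )
    return plugin


try:
    check_chroma_options(True, False, 0.9, lambda: None)
except UserError as exc:
    print(exc)
```

Returning the plugin instance lets the caller pass it down to the filtering step instead of looking it up again.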

beetsplug/duplicates.py:462

  • grug look at _chroma_filter_dups: it keeps item b when score is None or < chroma_thresh, and drops when score is high. docs/changelog say chroma should exclude tracks with different audio, so grug expect low score should be excluded, high score kept. also pairwise(items) compare adjacent original list, so when middle item dropped, next compare still against dropped one (A-B-B-C problem) and results wrong for 3+ items. fix: decide clear rule (keep only items similar to chosen reference / build clusters) and make tests+docs match.
    def _duplicates(
        self, objs, keys, full, strict, tiebreak, merge, chroma, chroma_thresh
    ):
        """Generate triples of keys, duplicate counts, and constituent objects."""
        offset = 0 if full else 1
        for k, objs in self._group_by(objs, keys, strict).items():
            if len(objs) <= 1:
                continue

            objs = self._order(objs, tiebreak)

            if chroma:
                objs = self._chroma_filter_dups(objs, chroma_thresh)
                if len(objs) <= 1:
                    continue

            if merge:
                objs = self._merge(objs)
            yield (k, len(objs) - offset, objs[offset:])

    def _chroma_filter_dups(self, items, chroma_thresh):
        chroma_plug = self._chroma_plug()
        res = [items[0]]

        for a, b in itertools.pairwise(items):
            score = chroma_plug.compare_items(a, b)
            if score is None or score < chroma_thresh:
                res.append(b)

        return res
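A sketch of the "keep only items similar to the chosen reference" rule proposed above. It assumes the first item after _order() is the reference (the one duplicates normally keeps), keeps a candidate only when its score against that reference reaches the threshold, and treats a missing score (None) as "cannot confirm same audio". compare stands in for compare_items; the function and fake data are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def chroma_filter_dups(
    items: Sequence[T],
    compare: Callable[[T, T], Optional[float]],
    threshold: float,
) -> List[T]:
    """Keep the reference item plus every item whose audio matches it.

    Each candidate is compared against items[0], never against a neighbour
    that may itself have been rejected, so the result does not depend on
    which items happen to sit next to each other.
    """
    if not items:
        return []
    reference = items[0]
    kept = [reference]
    for candidate in items[1:]:
        score = compare(reference, candidate)
        # None means a fingerprint was unavailable; do not count it as a match.
        if score is not None and score >= threshold:
            kept.append(candidate)
    return kept


# Fake comparison: 0.99 for the same recording, 0.5 otherwise.
AUDIO = {"studio": "A", "studio_copy": "A", "live": "B"}


def fake_compare(a: str, b: str) -> float:
    return 0.99 if AUDIO[a] == AUDIO[b] else 0.5


print(chroma_filter_dups(["studio", "live", "studio_copy"], fake_compare, 0.9))
# ['studio', 'studio_copy']
```

Anchoring on a single reference is the simplest rule. If items that match each other but not the reference should also be reported, building clusters (for example, assigning each item to the first cluster whose reference it matches) is the alternative the review mentions.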

test/plugins/test_duplicates.py:107

  • grug see chroma tests expect high similarity (0.99) => no duplicates, and low similarity (0.5) => duplicates shown. this match current _chroma_filter_dups logic, but it contradict doc/changelog text that different-audio tracks should be excluded when chroma enabled. once filter fixed, these assertions likely need flip (similar => duplicates output, dissimilar => empty) or update docs if intent opposite.
    @patch("acoustid.compare_fingerprints", return_value=0.99)
    def test_duplicate_chroma_similar(self, cf):
        self.create_dups(2)
        out = self.run_cmd(chroma=True)

        assert out == ""

    @patch("acoustid.compare_fingerprints", return_value=0.5)
    def test_duplicate_chroma_dissimilar(self, cf):
        self.create_dups(2)
        out = self.run_cmd(chroma=True)

        assert self.dup_item.artist in out
        assert self.dup_item.album in out
        assert self.dup_item.title in out

Comment thread beetsplug/duplicates.py
Comment on lines 17 to 33
@@ -27,6 +29,7 @@
displayable_path,
subprocess,
)
from beetsplug import chroma

Comment on lines +95 to +97
sonic fingerprinting capabilities to compare the tracks' audio in addition to
their ``keys``. This is especially useful when multiple versions of a song, such
as live versions or remixes, exist in the library.

Labels

chroma (chroma plugin), duplicates (duplicates plugin)

2 participants