Context
In the multi-engine search path (e.g. Comet + MSGF+, optionally + Sage), quantms currently rescores each engine's PSMs with its own PercolatorAdapter run, merges the per-engine PSMs with CONSENSUSID, then applies FDR. This fans CONSENSUSID out per file and relies on IDRipper whenever the preceding rescoring step was pooled per sample or per project.
With ms2features_enable=true (quantms-rescoring via ms2rescore), every engine's PSMs receive the same run-invariant features (MS²PIP, DeepLC, precursor-ppm). Those features are numerically comparable across engines by construction, which raises the question: can we simplify the multi-engine path by dropping CONSENSUSID and merging engines directly into a single PercolatorAdapter call?
This issue is a discussion thread to decide which direction to take.
What we learned from the OpenMS source
Short version: the naive path IDMerger → PercolatorAdapter is not supported by OpenMS.
IDMerger is a dumb concatenator. It preserves per-PeptideHit meta-values verbatim as a sparse flat-map. A Comet PSM carries xcorr and no hyperscore; a Sage PSM carries hyperscore and no xcorr. There is no unified feature schema in the merged idXML — just ragged per-hit meta-values (IDMergerAlgorithm.cpp:170-237).
PercolatorAdapter explicitly refuses multi-engine input (PercolatorAdapter.cpp:493-498):
    if (se1 != se2) {
      OPENMS_LOG_ERROR << "... Use TOPP_PSMFeatureExtractor to merge ...";
      return INPUT_FILE_CORRUPT;
    }
Even if you bypassed that check, PercolatorInfile::store silently drops any PSM missing any feature column (PercolatorInfile.cpp:531-566). Ragged rows never reach the percolator binary.
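To make the two failure modes above concrete, here is a minimal Python sketch. It is not OpenMS code: plain dicts stand in for per-PeptideHit meta-values, and the helper names, the `deeplc_rt_error` key, and all numeric values are hypothetical. It only illustrates why a verbatim concatenation followed by a "drop rows with any missing column" rule leaves nothing behind.

```python
# Illustrative only: plain dicts stand in for per-PeptideHit meta-values.
# Feature names, values, and helpers are hypothetical, not the OpenMS API.
comet_psm = {"engine": "Comet", "xcorr": 2.9, "deeplc_rt_error": 0.4}
sage_psm = {"engine": "Sage", "hyperscore": 31.4, "deeplc_rt_error": 0.7}

# A plain concatenation keeps each hit's keys verbatim (ragged schema).
merged = [comet_psm, sage_psm]


def feature_columns(psms):
    """Union of feature keys seen across all PSMs (engine tag excluded)."""
    return sorted({key for psm in psms for key in psm if key != "engine"})


def keep_complete_rows(psms, columns):
    """Keep only PSMs that carry every feature column."""
    return [psm for psm in psms if all(col in psm for col in columns)]


columns = feature_columns(merged)
survivors = keep_complete_rows(merged, columns)
print(columns)         # ['deeplc_rt_error', 'hyperscore', 'xcorr']
print(len(survivors))  # 0
```

Under these assumptions, every row is missing at least one engine-native column, so no PSM survives the filter.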
The sanctioned multi-engine path in OpenMS is:
IDMerger -merge_proteins_add_PSMs
→ PSMFeatureExtractor -multiple_search_engines -impute
→ PercolatorAdapter
PSMFeatureExtractor -impute fills missing features with observed min/max (or float::max with -limit_imputation) — that's the "NaN handling" step.
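As a rough illustration of min/max imputation, the Python sketch below fills each missing feature with an extreme of the values observed for that column. It is not the PSMFeatureExtractor implementation. The text above does not say which extreme applies to which feature, so the sketch takes that as a parameter (`use_max`), and the `msgf_score` key and all values are hypothetical.

```python
def impute_missing(psms, columns, use_max=frozenset()):
    """Fill missing features with the observed min, or max for columns in use_max.

    Assumption: the caller decides which extreme applies to which column.
    """
    filled = [dict(psm) for psm in psms]  # copy so the input is not mutated
    for col in columns:
        observed = [psm[col] for psm in psms if col in psm]
        if not observed:
            continue  # nothing observed for this column, so nothing to impute
        fill = max(observed) if col in use_max else min(observed)
        for psm in filled:
            psm.setdefault(col, fill)
    return filled


# Hypothetical feature values for two Comet PSMs and one MSGF+ PSM.
psms = [{"xcorr": 2.9}, {"xcorr": 1.5}, {"msgf_score": 120.0}]
columns = ["xcorr", "msgf_score"]

with_min = impute_missing(psms, columns)
with_max = impute_missing(psms, columns, use_max={"xcorr"})
print(with_min[2]["xcorr"])  # 1.5
print(with_max[2]["xcorr"])  # 2.9
```

After imputation every row carries every column, which is what lets the rows pass a completeness filter downstream.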
Gap: PercolatorFeatureSetHelper::addMULTISEFeatures (lines 502-529) branches only on Comet, MSGF+, X!Tandem, and Mascot. Sage is not in this code path. PSMFeatureExtractor is also marked @experimental.
Candidate approaches for discussion
1. Adopt the OpenMS-sanctioned path. Use IDMerger -merge_proteins_add_PSMs → PSMFeatureExtractor -multiple_search_engines -impute → PercolatorAdapter. Works for Comet + MSGF+ (+ X!Tandem, Mascot). Does not work for Sage without an upstream patch to PercolatorFeatureSetHelper. Conservative in that it uses only existing OpenMS tools, though PSMFeatureExtractor itself is marked @experimental.
2. Bypass per-engine features, rely only on rescoring features. With ms2rescore enabled, every engine's PSMs carry identical run-invariant features (MS²PIP / DeepLC / precursor-ppm). Feed the merged idXML to PercolatorAdapter with -generic_feature_set (or a similar restriction) so only the shared columns drive the SVM, ignoring engine-native scores. Works for any engine including Sage. Loses engine-native features from the SVM, which may or may not hurt PSM recovery; this needs a benchmark.
3. Keep CONSENSUSID. It was purpose-built for harmonising scores across engines. The per-file fan-out is real but the step is cheap. In light of the blockers above, this may remain the right tool and the scope of this issue shrinks to "stop discussing the replacement, invest elsewhere."
4. Contribute upstream. Open a PR against OpenMS to add Sage support to PercolatorFeatureSetHelper::addMULTISEFeatures and lift PSMFeatureExtractor's experimental label. This unblocks option 1 for every engine and benefits the broader OpenMS user base. Long timeline.
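The column restriction behind option 2 can be sketched in Python as follows. This is only an illustration of "keep the features every engine shares", not how PercolatorAdapter selects features. The feature keys (`ms2pip_corr`, `deeplc_rt_error`, `precursor_ppm`), the values, and the helper names are all hypothetical.

```python
def shared_columns(psms, ignore=("engine",)):
    """Feature keys present on every PSM, i.e. the intersection of key sets."""
    key_sets = [set(psm) - set(ignore) for psm in psms]
    return sorted(set.intersection(*key_sets)) if key_sets else []


def restrict(psms, columns):
    """Project each PSM onto the given columns, dropping everything else."""
    return [{col: psm[col] for col in columns} for psm in psms]


# Hypothetical merged PSMs: engine-native scores differ, rescoring features match.
merged = [
    {"engine": "Comet", "xcorr": 2.9,
     "ms2pip_corr": 0.91, "deeplc_rt_error": 0.4, "precursor_ppm": 1.2},
    {"engine": "Sage", "hyperscore": 31.4,
     "ms2pip_corr": 0.88, "deeplc_rt_error": 0.7, "precursor_ppm": -0.8},
]

columns = shared_columns(merged)
rows = restrict(merged, columns)
print(columns)  # ['deeplc_rt_error', 'ms2pip_corr', 'precursor_ppm']
```

The projected rows are rectangular without any imputation, at the cost of discarding `xcorr` and `hyperscore`. That cost is the trade-off the benchmark would need to quantify.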
What would help
- Benchmark evidence on Comet + MSGF+ datasets comparing options 1, 2, and 3 at 1% FDR (PSM / peptide counts, calibration). No published benchmark covers this at quantms scale.
- Scientific input on whether option 2 (rescoring-only feature set) is defensible — is it equivalent to option 1 empirically, or does dropping engine-native features cost us?
- Maintainer preference on upstreaming Sage support vs. working around it in quantms.
Prerequisite already in flight
#706 removes ms2features_range = by_sample | by_project and the IDRipper Nextflow module. These exist only to enable pooled Percolator runs that get split back per-run for CONSENSUSID, and removing them is a no-op for the default path while simplifying any future CONSENSUSID replacement work. That PR is independent of the discussion here — the multi-engine topology is untouched.
Not in scope for this issue
- Changes to the DIA workflow (no CONSENSUSID usage).
- Pooling Percolator across files (separate discussion, requires its own benchmark).
- Native-Percolator bypass (running percolator outside PercolatorAdapter on merged pin files).