Implement dual ridge regression for Allen2022 #2383
Merged
…n n_samples < n_features. Scores identical. Memory magnitude lower. Runs faster.
Minimal version of #2364
Near-identical scores for Allen2022 from V1 through V4 (tolerance of 0.005). The tolerance for IT had to be bumped to 0.01.
Ridge and RidgeCV regression now use the dual (kernel) form when n_samples < n_features. This case is common in Brain-Score, where there are only hundreds of stimuli but thousands to hundreds of thousands of model features. The dual form avoids materializing the coefficient matrix of shape (n_features, n_targets), which can cause out-of-memory (OOM) errors for large models on fMRI benchmarks.
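For reference, the equivalence behind the dual form can be written as the standard identity below. The symbols are notation assumed here, not names from the PR: $X$ is the (n_samples, n_features) training matrix, $Y$ the (n_samples, n_targets) target matrix, $X_*$ the test features, and $\alpha$ the ridge penalty.

$$
W = (X^\top X + \alpha I_{p})^{-1} X^\top Y = X^\top (X X^\top + \alpha I_{n})^{-1} Y,
\qquad
\hat{Y} = X_* W = X_* X^\top (X X^\top + \alpha I_{n})^{-1} Y
$$

Here $p$ = n_features and $n$ = n_samples. The right-hand form only inverts an $n \times n$ matrix, so predictions can be computed without ever forming the $p \times$ n_targets matrix $W$.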
This PR adds DualRidgeRegression and DualRidgeCVRegression in metrics/regression_correlation/metric.py.
Instead of solving for the full weight matrix, we precompute a small projection matrix and apply it to the neural targets in chunks.
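A minimal NumPy sketch of the idea, not the code in this PR: the function name `dual_ridge_predict`, its parameters, the centering step, and the default `chunk_size` are all assumptions made for illustration. It builds a projection matrix of shape (n_test, n_train) and applies it to the targets one chunk of columns at a time.

```python
import numpy as np


def dual_ridge_predict(X_train, Y_train, X_test, alpha=1.0, chunk_size=1024):
    """Hypothetical helper: ridge predictions via the dual (kernel) form."""
    # Center with training statistics so the intercept is handled implicitly
    # (an assumption; the PR may treat the intercept differently).
    x_mean = X_train.mean(axis=0)
    y_mean = Y_train.mean(axis=0)
    Xc = X_train - x_mean
    Xt = X_test - x_mean

    # Regularized Gram matrix, shape (n_train, n_train).
    K = Xc @ Xc.T
    K[np.diag_indices_from(K)] += alpha

    # Projection matrix P = K_test @ inv(K), shape (n_test, n_train).
    # K is symmetric, so solving K @ Z = K_test.T and transposing gives P.
    K_test = Xt @ Xc.T
    P = np.linalg.solve(K, K_test.T).T

    # Apply P to the targets in column chunks; the (n_features, n_targets)
    # coefficient matrix is never formed.
    n_targets = Y_train.shape[1]
    Y_pred = np.empty((X_test.shape[0], n_targets), dtype=np.float64)
    for start in range(0, n_targets, chunk_size):
        stop = min(start + chunk_size, n_targets)
        y_chunk = Y_train[:, start:stop] - y_mean[start:stop]
        Y_pred[:, start:stop] = P @ y_chunk + y_mean[start:stop]
    return Y_pred
```

Under these assumptions the result should match a primal ridge fit with an intercept up to floating-point error, while the largest intermediate arrays are the Gram matrix and the projection matrix.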
As an example, for ViT-L with 148k features on Allen2022 with 52k neuroids, the coefficient matrix stored at 8 bytes per entry would take about 62GB. The dual form reduces this to ~1MB.
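The arithmetic behind these figures can be checked with a short sketch. The feature and neuroid counts come from the description above; the stimulus count of 360 is a hypothetical value chosen only to show how a square matrix over a few hundred stimuli lands near 1MB, and `matrix_bytes` is an illustrative helper.

```python
def matrix_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols matrix of itemsize-byte entries."""
    return rows * cols * itemsize


# Primal coefficient matrix: (n_features, n_targets) at 8 bytes per entry.
coef_bytes = matrix_bytes(148_000, 52_000)

# Dual projection matrix: (n_stimuli, n_stimuli).
# 360 is an assumed stimulus count, not a number taken from the PR.
proj_bytes = matrix_bytes(360, 360)

print(f"coefficient matrix: {coef_bytes / 1e9:.1f} GB")
print(f"projection matrix:  {proj_bytes / 1e6:.2f} MB")
```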