Feature selection library for Python
Supervised, unsupervised, wrapper, embedded and hybrid methods under a unified interface.
ITMO_FS is an open-source feature selection toolbox developed at ITMO University.
It implements dozens of classical and modern algorithms and exposes them via a scikit-learn-friendly API.
Typical use cases:
- Dimensionality reduction for high-dimensional datasets
- Preprocessing for traditional ML models (SVM, logistic regression, tree-based models, etc.)
- Exploratory analysis and feature ranking in research projects
- Benchmarking and comparison of feature selection algorithms
Key features:
- Rich set of algorithms: supervised / unsupervised filters, wrappers, embedded, hybrid and ensemble methods in one library.
- Scikit-learn compatible API: `fit`, `transform`, `fit_transform` and easy integration into sklearn pipelines.
- Composable filters: separate "measure" and "cutting rule" components let you implement custom strategies (e.g. thresholds or top-k).
- Dense and sparse data support: works with NumPy arrays, pandas DataFrames and SciPy sparse matrices.
- Research background: algorithms are based on well-known methods from the literature and used in research projects.
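The "measure + cutting rule" split can be illustrated without the library itself: a measure assigns a score to every feature, and a cutting rule turns the scores into a set of selected indices. A minimal sketch in plain NumPy (the function names here are illustrative only, not part of the ITMO_FS API):

```python
import numpy as np

def variance_measure(X, y=None):
    # Illustrative measure: score each feature by its variance.
    return X.var(axis=0)

def top_k_rule(k):
    # Illustrative cutting rule: keep the indices of the k highest scores.
    def rule(scores):
        return np.argsort(scores)[::-1][:k]
    return rule

X = np.array([[0.0, 1.0, 5.0],
              [0.0, 2.0, 1.0],
              [0.0, 3.0, 9.0]])

scores = variance_measure(X)           # column 0 is constant, so it scores 0
selected = top_k_rule(2)(scores)       # indices of the two highest-variance columns
X_reduced = X[:, np.sort(selected)]
print(X_reduced.shape)
```

Because the two pieces are independent, swapping either the measure or the cutting rule changes the strategy without touching the other half.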
From PyPI:
```
pip install -U ITMO_FS
```

From the latest source:

```
git clone https://github.com/ctlab/ITMO_FS.git
cd ITMO_FS
pip install .
```

A minimal usage example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

from ITMO_FS.filters.univariate import UnivariateFilter, select_k_best
from ITMO_FS.filters.univariate import f_ratio_measure

# Synthetic classification dataset
X, y = make_classification(
    n_samples=1000,
    n_features=100,
    n_informative=10,
    n_redundant=30,
    random_state=42,
)

# 1) Select the 20 best features by F-ratio
fs = UnivariateFilter(
    measure=f_ratio_measure,
    cutting_rule=select_k_best(k=20),
)

# 2) Train a classifier on the selected features
clf = LogisticRegression(max_iter=1000)
pipe = Pipeline([
    ("feature_selection", fs),
    ("classifier", clf),
])

pipe.fit(X, y)
print("Train accuracy:", pipe.score(X, y))
```

For more examples (wrappers, hybrids, ensembles), see the documentation.
ITMO_FS groups algorithms into several families:
- Filters
- Supervised: correlation-based, information-theoretic, statistical, Relief-based, Laplacian and other measures.
- Unsupervised: Laplacian / spectral scores, multi-cluster and discriminative feature selection.
- Wrappers: forward / backward selection, QPFS, hill-climbing, simulated annealing, recursive elimination.
- Embedded methods: MOSS, MOSNS, RFE and related model-based approaches.
- Hybrid & ensembles: MeLiF and other combinations of filters and wrappers.
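To make the wrapper family concrete: a wrapper repeatedly trains a model on candidate feature subsets and keeps the subset that scores best. A toy greedy forward-selection loop (not ITMO_FS code; it only illustrates the search that wrappers automate):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=15,
                           n_informative=5, random_state=0)

def forward_select(X, y, n_select):
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        # Try adding each remaining feature; keep the best scorer.
        def score(j):
            cols = selected + [j]
            model = LogisticRegression(max_iter=1000)
            return cross_val_score(model, X[:, cols], y, cv=3).mean()
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

chosen = forward_select(X, y, n_select=3)
print("Selected features:", chosen)
```

Real wrappers differ in how they explore the subset space (forward, backward, hill climbing, simulated annealing), but all share this train-and-evaluate inner loop.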
The full list of implemented algorithms:
- Spearman correlation
- Pearson correlation
- Fit Criterion
- F ratio
- Gini index
- Symmetric Uncertainty
- Fechner correlation
- Kendall correlation
- Information Gain
- ANOVA
- Chi-squared
- Relief
- ReliefF
- Laplacian score
- Modified T-score
- Mutual Information Maximization
- Minimum Redundancy Maximum Relevance
- Joint Mutual Information
- Conditional Infomax Feature Extraction
- Mutual Information Feature Selection
- Conditional Mutual Info Maximization
- Interaction Capping
- Dynamic Change of Selected Feature
- Composition of Feature Relevancy
- Max-Relevance and Max-Independence
- Interaction Weight
- Double Input Symmetric Relevance
- Fast Correlation
- Statistical Inference Relief
- Trace Ratio (Fisher)
- Nonnegative Discriminative Feature Selection
- Robust Feature Selection
- Spectral Feature Selection
- VDM
- QPFS
- MIMAGA
- Trace Ratio (Laplacian)
- Multi-Cluster Feature Selection
- Unsupervised Discriminative Feature Selection
- Add Del
- Backward selection
- Sequential Forward Selection
- QPFS
- Hill climbing
- Simulated Annealing
- Recursive Elimination
- Filter Wrapper
- IWSSr-SFLA
- MOSNS
- MOSS
- RFE
- MeLiF
- Best goes first
- Best sum
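Several of the measures above (for example Mutual Information Maximization, ANOVA and chi-squared) have close counterparts in scikit-learn, which is handy for sanity-checking a ranking. A sketch ranking features by mutual information using scikit-learn rather than ITMO_FS:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, n_redundant=0,
                           shuffle=False, random_state=1)

# With shuffle=False, make_classification places the informative
# features in the first columns (here, columns 0..4).
mi = mutual_info_classif(X, y, random_state=1)
ranking = np.argsort(mi)[::-1]
print("Top 5 features by mutual information:", ranking[:5])
```

A univariate ranking like this corresponds to the simplest filters in the list; the multivariate measures (mRMR, JMI, CIFE, ...) additionally penalize redundancy between the selected features.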
Full documentation (tutorials and API reference) is available online.
Contributions are welcome!
- Report bugs and suggest features in the issue tracker.
- Add new algorithms (filters, wrappers, embedded or hybrid methods).
- Improve documentation, examples and tests.
Please follow the existing code style and open a pull request when your change is ready.
ITMO_FS is distributed under the BSD 3-Clause License.
See the LICENSE file for details.
