# Performance Tips
*automation edited this page Aug 8, 2025 · 2 revisions*
## Large Datasets
For datasets with >100k rows:
```python
# Use batch processing
cleaner.clean_columns(columns, show_progress=True)

# Cache statistics for repeated operations
cleaner.add_zscore_columns(columns, cache_stats=True)
```

## Memory Optimization
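To make the per-column approach concrete, here is a standalone pandas sketch of z-score outlier removal done one column at a time. The `remove_outliers_zscore` helper below is hypothetical and independent of StatClean's implementation; it only illustrates why the loop keeps memory flat.

```python
import pandas as pd

def remove_outliers_zscore(df: pd.DataFrame, col: str, threshold: float = 3.0) -> pd.DataFrame:
    """Drop rows whose z-score in `col` exceeds `threshold`.

    Hypothetical helper for illustration; not StatClean's implementation.
    """
    z = (df[col] - df[col].mean()) / df[col].std()
    return df[z.abs() <= threshold]

# Only one z-score Series is materialized at a time, instead of a
# z-score column for every variable at once.
df = pd.DataFrame({"a": [1.0] * 19 + [1000.0], "b": list(range(20))})
for col in ["a", "b"]:
    df = remove_outliers_zscore(df, col)
```

After the loop, the row with the extreme value in `a` is gone and `b` is untouched, with peak memory proportional to one column rather than the whole frame.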
```python
# Process columns individually for memory efficiency
for col in large_columns:
    cleaner.remove_outliers_zscore(col)

# Use in-place operations when possible
cleaner = StatClean(df, preserve_index=False)
```

## Multivariate Performance
```python
# For many variables, consider dimensionality reduction first
from sklearn.decomposition import PCA

pca_data = PCA(n_components=5).fit_transform(df)
```
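A self-contained sketch of that workflow on synthetic data: reduce many variables to a few components, then screen rows on the component scores. The z-score rule and the threshold below are illustrative choices, not StatClean behavior.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))   # 500 rows, 20 variables
X[0] += 50.0                     # plant one extreme multivariate outlier

# Reduce 20 variables to 5 components, then flag rows whose
# standardized component scores are extreme.
scores = PCA(n_components=5).fit_transform(X)
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
outlier_rows = np.flatnonzero((np.abs(z) > 6).any(axis=1))
```

Outlier screening then runs over 5 score columns instead of 20 raw variables, and a row that is extreme across many variables shows up as a single large score on a leading component.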