Network loader #55
Pull Request Overview
This PR integrates TCGA BRCA dataset updates with enhanced feature selection methods, adds a NetworkLoader utility, and updates documentation and versioning to reflect the changes.
- Added RandomForest-based feature selection support and updated module imports.
- Implemented phenotype preprocessing in SmCCNet and introduced a new NetworkLoader class for network file management.
- Updated README, notebook examples, and CHANGELOG for consistency.
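To make the "network file management" idea concrete, here is a minimal sketch of a loader that discovers bundled network CSVs by folder. It is not the `NetworkLoader` class added in this PR: the class name `BundledNetworks`, its methods, and the assumption that each CSV is an adjacency matrix with a header row and an index column are all hypothetical. Only the `size_<n>_net_<k>.csv` file naming pattern is taken from the file list in this review.

```python
import csv
import re
from pathlib import Path

# Hypothetical sketch, not the NetworkLoader from this PR.
# Matches file names such as "size_14_net_2.csv".
_NAME = re.compile(r"size_(\d+)_net_(\d+)\.csv$")


class BundledNetworks:
    """Discover and read CSVs under <root>/<group>/size_<n>_net_<k>.csv."""

    def __init__(self, root):
        self.root = Path(root)

    def groups(self):
        # Each sub-folder (e.g. "brca_smccnet_rf") is treated as one group.
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())

    def files(self, group):
        folder = self.root / group
        if not folder.is_dir():
            raise FileNotFoundError(f"no network folder {group!r} under {self.root}")
        # Keep only files that follow the naming pattern.
        return sorted(p for p in folder.glob("*.csv") if _NAME.search(p.name))

    def load(self, path):
        # Assumed layout: first row holds column labels, first column holds row labels.
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            header = next(reader)[1:]
            return {row[0]: dict(zip(header, map(float, row[1:]))) for row in reader}
```

Raising `FileNotFoundError` for an unknown group keeps a mistyped folder name from silently returning an empty list.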
Reviewed Changes
Copilot reviewed 33 out of 41 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| bioneuralnet/utils/__init__.py | Updated preprocess import to include RandomForest selection |
| bioneuralnet/external_tools/smccnet.py | Added phenotype_df validation and logger initialization |
| bioneuralnet/datasets/tcga_brca/README.md | Expanded and clarified TCGA BRCA data preprocessing and feature selection details |
| bioneuralnet/datasets/network_loader.py | Introduced new utility class for loading bundled network files |
| bioneuralnet/datasets/dataset_loader.py | Modified data loading to support different feature selection methods and file naming conventions |
| bioneuralnet/datasets/__init__.py | Updated `__all__` to export NetworkLoader |
| bioneuralnet/__init__.py | Updated version and module exports |
| README.md | Minor version update display |
| Cancer_example.ipynb | Expanded example with full pipeline demo for TCGA BRCA |
| CHANGELOG.md | Revised changelog to reflect version update and release notes |
Files not reviewed (8)
- MANIFEST.in: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_ae/size_13_net_2.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_rf/size_14_net_2.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_rf/size_14_net_4.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_rf/size_21_net_3.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_var/size_14_net_2.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_var/size_14_net_3.csv: Language not supported
- bioneuralnet/datasets/tcga_brca/brca_pam50.csv: Language not supported
Comments suppressed due to low confidence (1)
bioneuralnet/datasets/dataset_loader.py:52
- [nitpick] The file naming convention in dataset_loader (e.g., 'brca_mirna.csv') differs from the uppercase style referenced in the README. Consider standardizing the naming convention for clarity.
self.data["brca_mirna"] = pd.read_csv(folder / "brca_mirna.csv", index_col=0)
- **BUG**: A bug related to rdata files missing
- **New realease**: A new release will include documentation for the other updates. (1.0.3 or 1.0.2)
- **New realease**: A new release will include documentation for the other updates. (1.1.0)
Copilot
AI
Apr 24, 2025
Typo 'realease' should be corrected to 'release'.
Suggested change:
- **New realease**: A new release will include documentation for the other updates. (1.1.0)
- **New release**: A new release will include documentation for the other updates. (1.1.0)
Added a dataset from TCGA-BRCA:
Initial data dimensions:
Feature Selection Methods
Performed separately on Methylation and RNA datasets (top 1,000 features each):
Unsupervised:
Supervised:
Also set up a network loader to load the networks that SmCCNet generated from each of these feature selection methods.
More details on the data preprocessing are in the README inside tcga_brca.
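
As a rough illustration of what "top 1,000 features each" can mean in practice, the sketch below ranks features by variance and keeps the first `k`. Variance ranking is an assumption made for the example: the `brca_smccnet_var` folder name hints at it, but this description does not list the actual methods. The function name `top_k_by_variance` and the dict-of-lists input format are hypothetical as well.

```python
from statistics import pvariance


def top_k_by_variance(columns, k=1000):
    """Return the names of the k highest-variance features.

    `columns` maps a feature name to its list of values, one per sample.
    Illustrative only; not the selection code used in this PR.
    """
    ranked = sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)
    return ranked[:k]


# Toy data: three features measured on three samples.
toy = {"a": [1, 1, 1], "b": [0, 10, 20], "c": [1, 2, 3]}
selected = top_k_by_variance(toy, k=2)
```

Following the description, such a function would be called once per omics table (Methylation and RNA separately) rather than on the combined data.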