@ramosv ramosv commented Apr 24, 2025

Added a dataset from TCGA-BRCA.

Initial data dimensions:

miRNA: (1078, 503)
RNA: (775, 20532)
Methylation: (783, 20106)
Clinical: (1097, 18)
PAM50: (1087,)

Feature Selection Methods
Performed separately on Methylation and RNA datasets (top 1,000 features each):

  • Unsupervised:

    • Variance Filter (highest variance)
    • Autoencoder Weights (largest weights)
  • Supervised:

    • ANOVA F-test
    • RandomForest Feature Importance
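The selection methods listed above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code from this PR: the function names are hypothetical, the use of pandas and scikit-learn is an assumption, and the autoencoder-weights method is left out because its architecture is not described here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif


def top_k_by_variance(X: pd.DataFrame, k: int) -> list:
    """Unsupervised: keep the k columns with the highest variance."""
    return X.var(axis=0).nlargest(k).index.tolist()


def top_k_by_anova(X: pd.DataFrame, y: pd.Series, k: int) -> list:
    """Supervised: keep the k columns with the largest ANOVA F-statistic."""
    f_scores, _ = f_classif(X.values, y.values)
    return pd.Series(f_scores, index=X.columns).nlargest(k).index.tolist()


def top_k_by_random_forest(X: pd.DataFrame, y: pd.Series, k: int, seed: int = 0) -> list:
    """Supervised: keep the k columns with the highest RandomForest importance."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X.values, y.values)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.nlargest(k).index.tolist()


# Toy data (hypothetical): six noise features plus one feature tied to the label.
rng = np.random.default_rng(0)
y = pd.Series(np.repeat([0, 1], 30), name="label")
X = pd.DataFrame(rng.normal(size=(60, 6)), columns=[f"noise_{i}" for i in range(6)])
X["signal"] = y.values * 10.0 + rng.normal(scale=0.1, size=60)

var_sel = top_k_by_variance(X, 2)
anova_sel = top_k_by_anova(X, y, 2)
rf_sel = top_k_by_random_forest(X, y, 2)
```

For the real data, k would be 1,000 and each function would be applied to the Methylation and RNA tables separately, with PAM50 as one plausible label for the supervised methods.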

Also set up a network loader to load the networks that SmCCNet generated from these feature selections.
More details on the data preprocessing are in the README inside tcga_brca.
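As a rough idea of what loading such networks could look like, here is a minimal sketch. It is not the NetworkLoader API from this PR: `load_network_csvs` is a hypothetical helper, and it assumes each `size_*_net_*.csv` file stores a square adjacency matrix with node names in the first column and the header row.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_network_csvs(folder: Path) -> dict:
    """Read every size_*_net_*.csv file in `folder` into a DataFrame keyed by file stem."""
    networks = {}
    for path in sorted(Path(folder).glob("size_*_net_*.csv")):
        # Assumption: the first column holds the node names.
        networks[path.stem] = pd.read_csv(path, index_col=0)
    return networks


# Demo with a tiny two-node network written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame(
        [[0.0, 0.4], [0.4, 0.0]],
        index=["gene_a", "gene_b"],
        columns=["gene_a", "gene_b"],
    )
    demo.to_csv(Path(tmp) / "size_2_net_1.csv")
    loaded = load_network_csvs(Path(tmp))
```

Keying by file stem keeps the size and network index from the file name available without parsing it up front.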


Copilot AI left a comment


Pull Request Overview

This PR integrates TCGA BRCA dataset updates with enhanced feature selection methods, adds a NetworkLoader utility, and updates documentation and versioning to reflect the changes.

  • Added RandomForest-based feature selection support and updated module imports.
  • Implemented phenotype preprocessing in SmCCNet and introduced a new NetworkLoader class for network file management.
  • Updated README, notebook examples, and CHANGELOG for consistency.

Reviewed Changes

Copilot reviewed 33 out of 41 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| bioneuralnet/utils/__init__.py | Updated preprocess import to include RandomForest selection |
| bioneuralnet/external_tools/smccnet.py | Added phenotype_df validation and logger initialization |
| bioneuralnet/datasets/tcga_brca/README.md | Expanded and clarified TCGA BRCA data preprocessing and feature selection details |
| bioneuralnet/datasets/network_loader.py | Introduced new utility class for loading bundled network files |
| bioneuralnet/datasets/dataset_loader.py | Modified data loading to support different feature selection methods and file naming conventions |
| bioneuralnet/datasets/__init__.py | Updated __all__ to export NetworkLoader |
| bioneuralnet/__init__.py | Updated version and module exports |
| README.md | Minor version update display |
| Cancer_example.ipynb | Expanded example with full pipeline demo for TCGA BRCA |
| CHANGELOG.md | Revised changelog to reflect version update and release notes |
Files not reviewed (8)
  • MANIFEST.in: Language not supported
  • bioneuralnet/datasets/networks/brca_smccnet_ae/size_13_net_2.csv: Language not supported
  • bioneuralnet/datasets/networks/brca_smccnet_rf/size_14_net_2.csv: Language not supported
  • bioneuralnet/datasets/networks/brca_smccnet_rf/size_14_net_4.csv: Language not supported
  • bioneuralnet/datasets/networks/brca_smccnet_rf/size_21_net_3.csv: Language not supported
  • bioneuralnet/datasets/networks/brca_smccnet_var/size_14_net_2.csv: Language not supported
  • bioneuralnet/datasets/networks/brca_smccnet_var/size_14_net_3.csv: Language not supported
  • bioneuralnet/datasets/tcga_brca/brca_pam50.csv: Language not supported
Comments suppressed due to low confidence (1)

bioneuralnet/datasets/dataset_loader.py:52

  • [nitpick] The file naming convention in dataset_loader (e.g., 'brca_mirna.csv') differs from the uppercase style referenced in the README. Consider standardizing the naming convention for clarity.
self.data["brca_mirna"]   = pd.read_csv(folder / "brca_mirna.csv",   index_col=0)


- **BUG**: A bug related to rdata files missing
- **New realease**: A new release will include documentation for the other updates. (1.0.3 or 1.0.2)
- **New realease**: A new release will include documentation for the other updates. (1.1.0)

Copilot AI Apr 24, 2025


Typo 'realease' should be corrected to 'release'.

Suggested change
- **New realease**: A new release will include documentation for the other updates. (1.1.0)
- **New release**: A new release will include documentation for the other updates. (1.1.0)

@SundousHussein SundousHussein merged commit bac8746 into main Apr 24, 2025
2 of 8 checks passed
@ramosv ramosv deleted the Network-loader branch June 3, 2025 18:19

3 participants