-
Notifications
You must be signed in to change notification settings - Fork 46
Feature/hucira #901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/hucira #901
Conversation
Zethson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much already! Here's some first initial feedback:
- Please ensure that you have a descriptive pull request title and the pull request description only has the necessary detail. This can also be done later but it's important.
- Your implementation should not add any dependencies. Pertpy has to minimize dependencies because it's used by a loooot of people. Every dependency that we add has the potential to break users environment and increases complexity. Therefore, please remove bokeh, pycirclize, and tqdm. The plot should be implemented with seaborn and instead of using tqdm, you can use Rich directly. Not sure what to do about pycirclize - if necessary we can talk about it. That's one of the differences of implementing something for yourself vs implementing it for many.
- Please ensure that all functions that have a bit more than trivial complexity have a proper docstring. Please adhere to our docstring style.
- I didn't review everything yet because I think there's still a lot to do. Let's address these first.
| return adata | ||
|
|
||
|
|
||
| def human_cytokine_dict(exclude_well_biased_genes=True) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please type this.
| def human_cytokine_dict(exclude_well_biased_genes=True) -> pd.DataFrame: | |
| def human_cytokine_dict(exclude_well_biased_genes: bool = True) -> pd.DataFrame: |
|
|
||
|
|
||
| def human_cytokine_dict(exclude_well_biased_genes=True) -> pd.DataFrame: | ||
| r"""Human Cytokine Dictionary curated from PBMC allows you to infer differential cytokine activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| r"""Human Cytokine Dictionary curated from PBMC allows you to infer differential cytokine activity. | |
| """Human Cytokine Dictionary curated from PBMC allows you to infer differential cytokine activity. |
this doesn't need to be a raw string, right?
| def human_cytokine_dict(exclude_well_biased_genes=True) -> pd.DataFrame: | ||
| r"""Human Cytokine Dictionary curated from PBMC allows you to infer differential cytokine activity. | ||
| The Human Cytokine Dictionary was created from single-cell RNA-seq of 9,697,974 human peripheral blood mononuclear cells (PBMC) from 12 donors stimulated in vitro with 87 different cytokines. The object is a dataframe representing cytokine activity as differentially expressed genes after cytokine perturbation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always write sentences in their own line.
| Returns: | ||
| Pandas DataFrame | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
| The Human Cytokine Dictionary was created from single-cell RNA-seq of 9,697,974 human peripheral blood mononuclear cells (PBMC) from 12 donors stimulated in vitro with 87 different cytokines. The object is a dataframe representing cytokine activity as differentially expressed genes after cytokine perturbation. | ||
| References: | ||
| Oesinghaus, Lukas and Becker, S{\"o}ren and Vornholz, Larsen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
| 2. Creates ranking of query data genes contrasting condition1 vs condition2. A continuum from genes most associated with condition1 (top) to genes most associated with condition2 (bottom) | ||
| 3. Computes enrichment of each cytokine by matching their associated gene set in the ranked list. | ||
| Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstring
| verbose: bool = False, | ||
| threads: int = 6, | ||
| ) -> pd.DataFrame: | ||
| """Function wrapper: Computes cytokine enrichment activity in one celltype using GSEA scoring. Loops through several threshold value to obtain more robust gene sets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| """Function wrapper: Computes cytokine enrichment activity in one celltype using GSEA scoring. Loops through several threshold value to obtain more robust gene sets. | |
| """Computes cytokine enrichment activity in one celltype using GSEA scoring. | |
| Loops through several threshold value to obtain more robust gene sets. |
It's a rule that the first line is always just one sentence. No exceptions.
| weight: float = 1.0, | ||
| seed: int = 2025, | ||
| verbose: bool = False, | ||
| threads: int = 6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a super random value
| ranked_stats, _num_cells = hucira._compute_ranking_statistic(dummy_adata, contrast_column, contrasts_combo) | ||
| assert isinstance(ranked_stats, pd.DataFrame) | ||
|
|
||
| # with pytest.raises(KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be commented out, right?
| ("B cell", "B_cell"), | ||
| ("CD8a", "CD8_T_cell"), | ||
| ("Mono", "CD14_Mono"), | ||
| ] # can't be a list for "run_one_enrichment_test()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what's meant with that.
PR Checklist
docsis updatedDescription of changes
adding new tool.
Technical details
Technically, the most interesting part is the data itself. Most of the methods that come with this tool are wrappers that aid with the enrichment analysis (e.g., help create a ranked list from query data, or help navigate the different levels of the data).
The plotting tools are seaborn heatmap wrappers that are tailored to the output format.
Only the cytokine communication methods are specific to this tool. (get_senders and receivers()). This plotting function is specific to this tool.
Additional context
new feature "hucira". Lukas wants to check if methods are too specific.
Some of the code might actually be redundant.. I had implemented the wrappers to make simple visualizations a click away for users but I can rethink about making the tool more lightweight.
The only dependencies I added (bokeh and pyCirclize) are for visualization tools
The very generic tests that I wrote are not passing. I don't know why, it looks like they're timing out.