JUMP-single-cell

Data

In this repository, we apply the phenotypic profiling model, which predicts the phenotypic class of single cells using nuclei features, to the JUMP-Target pilot data from the JUMP consortium.

The dataset contains 51 plates. Each plate uses one of three perturbation types (Clustered Regularly Interspaced Short Palindromic Repeats [CRISPR], Open Reading Frame [ORF], and compound), and the plates span two cell lines (A549 and U2OS).

Each perturbation type has its own platemap and metadata file in the reference_plate_data folder. A barcode platemap is included to associate each plate with the correct platemap file.
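The snippet below is a minimal sketch of how a barcode platemap links each plate to its platemap. It builds a plate-to-platemap lookup and then attaches well-level metadata to one plate. The column names `Assay_Plate_Barcode`, `Plate_Map_Name`, and `well_position`, the function names, and the toy barcodes are assumptions for illustration. Check the actual headers of the files in reference_plate_data before adapting it.

```python
from __future__ import annotations

import csv
import io


def load_barcode_platemap(text: str) -> dict[str, str]:
    """Map each plate barcode to the name of its platemap.

    Assumes hypothetical columns `Assay_Plate_Barcode` and `Plate_Map_Name`.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row["Assay_Plate_Barcode"]: row["Plate_Map_Name"] for row in reader}


def annotate_plate(
    plate: str,
    barcode_map: dict[str, str],
    platemaps: dict[str, list[dict[str, str]]],
) -> list[dict[str, str]]:
    """Return one metadata row per well for `plate`, tagged with plate and platemap."""
    platemap_name = barcode_map[plate]  # KeyError if the plate is not in the barcode platemap
    return [
        {"plate": plate, "platemap": platemap_name, **well}
        for well in platemaps[platemap_name]
    ]


# Toy example with placeholder barcodes and platemap names (not real JUMP identifiers).
barcodes = load_barcode_platemap(
    "Assay_Plate_Barcode,Plate_Map_Name\nPLATE_A,MAP_CRISPR\nPLATE_B,MAP_ORF\n"
)
platemaps = {
    "MAP_CRISPR": [{"well_position": "A01"}, {"well_position": "A02"}],
    "MAP_ORF": [{"well_position": "A01"}],
}
print(annotate_plate("PLATE_A", barcodes, platemaps))
```

Keeping the barcode lookup separate from the platemap contents means each perturbation type's platemap is stored once and reused by every plate that shares it.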

Across all plates, we segment a total of 20,959,860 single cells.

To reproduce this project, please ensure adequate storage: the CellProfiler SQLite database files total approximately 1.1 TB.
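Before downloading, a quick pre-flight check can confirm that the target disk has room. This Python sketch assumes decimal units (1.1 TB = 1.1e12 bytes) and an arbitrary 1.2x safety factor for intermediate outputs (e.g., Parquet files produced later). The function name `has_enough_space` is hypothetical. The sketch also prints the per-plate averages implied by the figures above: about 21.6 GB of SQLite and about 411,000 cells per plate.

```python
import shutil

# Figures from this README; decimal units (1 TB = 1e12 bytes) are an assumption.
SQLITE_TOTAL_BYTES = 1.1e12
N_PLATES = 51
N_CELLS = 20_959_860


def has_enough_space(path: str = ".", required_bytes: float = SQLITE_TOTAL_BYTES,
                     headroom: float = 1.2) -> bool:
    """Check whether `path` has room for `required_bytes` times a safety factor.

    The 1.2 headroom factor is an arbitrary assumption meant to leave space
    for intermediate outputs; adjust it for your setup.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * headroom


if __name__ == "__main__":
    print(f"~{SQLITE_TOTAL_BYTES / N_PLATES / 1e9:.1f} GB of SQLite per plate on average")
    print(f"~{N_CELLS / N_PLATES:,.0f} single cells per plate on average")
    print("Enough free space here:", has_enough_space())
```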

Goal

Traditional image-based profiling pipelines aggregate single cells into well-level profiles. While this process removes outliers that might dampen the signal, it also removes potentially interesting, biologically meaningful heterogeneity.
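A toy example makes the heterogeneity point above concrete. It assumes median aggregation, which is one common choice, over a single hypothetical feature. Two wells with very different single-cell distributions collapse to the same well-level value. The well names and feature values are made up.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median


def aggregate_wells(cells: list[tuple[str, float]]) -> dict[str, float]:
    """Collapse (well, feature_value) single-cell rows into one median per well."""
    by_well: dict[str, list[float]] = defaultdict(list)
    for well, value in cells:
        by_well[well].append(value)
    return {well: median(values) for well, values in by_well.items()}


# Hypothetical feature values: well A01 is homogeneous, while well A02 holds
# two distinct subpopulations of cells.
cells = [("A01", 100.0)] * 4 + [("A02", 50.0)] * 2 + [("A02", 150.0)] * 2
print(aggregate_wells(cells))  # → {'A01': 100.0, 'A02': 100.0}
```

At the well level, the two wells are indistinguishable. The bimodal structure in A02 is only visible when cells are kept, or classified, individually.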

By predicting single-cell phenotypes with our phenotypic profiling model, we hope to uncover important patterns of biology that would be missed with the traditional methodology. Specifically, the benefits of single-cell phenotyping include:

Granular insight into the phenotypic mechanisms of perturbations, regarding (A) the impact perturbations have on a specific phenotype (e.g., disrupting mitosis) and (B) their impact on phenotype prevalence (e.g., a gene knockout that causes apoptosis or stalls cells in a specific cell cycle phase).
Filtering and/or combining cells of the same phenotypic class to purify and/or improve the traditional image-based profiling pipeline.
Attaching phenotype knowledge to specific combinations of morphology features, which allows self-referential interpretation without the need for database signature lookup or other guilt-by-association methods.
Testing specific hypotheses about single-cell phenotype distributions when combined with different experimental designs (e.g., a targeted fluorescence marker), along with other important hypotheses that would be impossible to test without single-cell phenotypes.
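To illustrate benefit (B), here is a minimal sketch that compares phenotype prevalence between a perturbation and its control. It assumes each cell already has one predicted class label and that both groups are non-empty. The class names and counts are hypothetical, and the statistical test that a real analysis would need is not shown.

```python
from __future__ import annotations

from collections import Counter


def phenotype_prevalence(predictions: list[str]) -> dict[str, float]:
    """Fraction of cells assigned to each predicted phenotypic class (non-empty input)."""
    counts = Counter(predictions)
    total = len(predictions)
    return {phenotype: n / total for phenotype, n in counts.items()}


def prevalence_shift(treated: list[str], control: list[str]) -> dict[str, float]:
    """Treated-minus-control prevalence for every class seen in either group."""
    treated_prev = phenotype_prevalence(treated)
    control_prev = phenotype_prevalence(control)
    classes = sorted(set(treated_prev) | set(control_prev))
    return {c: treated_prev.get(c, 0.0) - control_prev.get(c, 0.0) for c in classes}


# Hypothetical per-cell predictions for one perturbation and its negative control.
treated = ["Apoptosis"] * 3 + ["Interphase"] * 7
control = ["Apoptosis"] * 1 + ["Interphase"] * 9
print(prevalence_shift(treated, control))
```

A positive shift, such as the roughly +0.2 for the apoptosis class here, is the kind of prevalence change that well-level averages can hide.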

Repository Structure

Module	Purpose	Description
0.download_data	Download JUMP-Target SQLite files and process them with CytoTable	Downloads CellProfiler SQLite outputs for 51 plates from AWS and processes them into Parquet files that combine compartment and image metadata in one table.
1.process_data	Process SQLite files	Uses CytoTable to merge single cells from the SQLite outputs, coSMicQC to filter single cells, and pycytominer to normalize features, producing downstream-ready data.
2.evaluate_data	Apply phenotypic profiling model	Runs class-balanced logistic regression prediction workflows to generate single-cell phenotype probabilities.
3.analyze_data	Analyze phenotypic predictions	Performs analyses to validate predicted phenotypic classes for perturbations compared to controls.
reference_plate_data	Platemaps and metadata	Holds platemap files, metadata by perturbation type, and barcode platemap mappings.
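The 2.evaluate_data module is described as running class-balanced logistic regression. The stdlib-only sketch below shows two ingredients of that approach under stated assumptions. The first is "balanced" class weights computed as n_samples / (n_classes * class_count), the same formula scikit-learn uses for `class_weight="balanced"`. These weights scale each sample's contribution to the training loss, and training itself is not shown. The second is per-cell class probabilities from a multinomial (softmax) logistic model. The coefficients, class names, and feature values are made up. The project's trained model, its feature set, and whether it uses a multinomial or one-vs-rest formulation are not specified here.

```python
from __future__ import annotations

import math
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def predict_proba(
    features: list[float],
    coefs: dict[str, list[float]],
    intercepts: dict[str, float],
) -> dict[str, float]:
    """Softmax over per-class linear scores for one cell."""
    scores = {
        c: intercepts[c] + sum(w * x for w, x in zip(coefs[c], features))
        for c in coefs
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical training labels: the rarer class receives the larger weight.
print(balanced_class_weights(["Interphase"] * 8 + ["Apoptosis"] * 2))  # → {'Interphase': 0.625, 'Apoptosis': 2.5}

# Hypothetical two-feature model applied to one cell's nuclei features.
coefs = {"Interphase": [0.5, -0.2], "Apoptosis": [-0.4, 0.9]}
intercepts = {"Interphase": 0.1, "Apoptosis": -0.3}
print(predict_proba([1.2, 0.7], coefs, intercepts))
```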

Development

We use a justfile to define just commands for this project. Please see the just installation instructions to configure it on your system.

Environment

For all modules, we use conda environments that include the required packages.

To create the environments from a terminal, run the commands below:

# Make sure you are in the repository root
conda env create -n jump_sc -f environment.yml
conda env create -n R_jump_sc -f R_environment.yml

Alternatively, use the following just command.

# setup or update conda envs
just setup-conda-envs
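After either setup route, you can confirm that both environments exist. This sketch parses the JSON output of `conda env list --json` and takes each environment's name from the last component of its path, which holds for named environments under conda's envs directory. The function name `missing_envs` is hypothetical. The conda call only runs when `conda` is on the PATH.

```python
from __future__ import annotations

import json
import os
import shutil
import subprocess

REQUIRED_ENVS = ("jump_sc", "R_jump_sc")  # names used in the commands above


def missing_envs(env_list_json: str, required: tuple[str, ...] = REQUIRED_ENVS) -> list[str]:
    """Return required env names absent from `conda env list --json` output."""
    env_paths = json.loads(env_list_json).get("envs", [])
    # Named environments live in directories named after the environment.
    existing = {os.path.basename(os.path.normpath(path)) for path in env_paths}
    return [name for name in required if name not in existing]


if __name__ == "__main__" and shutil.which("conda"):
    result = subprocess.run(
        ["conda", "env", "list", "--json"], capture_output=True, text=True, check=True
    )
    print("Missing environments:", missing_envs(result.stdout) or "none")
```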

Running Code from this Project

just commands are provided to run project tasks from the repository root. They offer an entry point separate from invoking the module shell scripts directly.

Individual steps:

# run step 0.download_data
just run-step-0

Module-specific scripts are also available in each workflow directory when you need direct execution.
