A fast and efficient statistical method for predicting transcription factor activities from transcriptomic profiles using prior knowledge of target genes.
Follow these instructions to set up the environment and run the application on your local machine.
Open your terminal and clone the project repository:
git clone https://github.com/PathwayAndDataAnalysis/z-aggregate
cd z-aggregate

This project uses uv for extremely fast package management and execution.
macOS / Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Note: After installation, you may need to restart your terminal.
You do not need to manually create virtual environments. Run the following command to sync the project and install all required packages (Scanpy, NumPy, Pandas, etc.) into a managed environment:
uv sync

To run the application, use uv run:
uv run z-aggregate -ds <path_to_data> -p <path_to_network> -o <output_folder>

For example:

uv run z-aggregate \
--dataset ./data/sc_counts.h5ad \
--priors ./data/causal-priors.tsv \
--output ./results \
--weight-type Correlation_Weight \
--verbose

| Flag | Long Flag | Type | Default | Description |
|---|---|---|---|---|
| -ds | --dataset | Path | Required | Path to expression data. Supports .h5ad, .csv, .tsv, .txt. |
| -p | --priors | Path | Required | Path to the prior network file (TF-Target interactions). |
| -o | --output | Path | Required | Directory where results will be saved. |
| -v | --verbose | Flag | False | Enable detailed logging output. |
| | --min-targets | Int | 5 | Minimum number of target genes required per TF to be included. |
| | --weight-type | Enum | Uniform | Weighting strategy. See Weight Types below. |
| | --output-format | Str | both | Format of output. Options: tsv, h5ad, both. |
| Preprocessing Options | | | | |
| | --preprocess | Flag | True | Enable standard QC and LogNormal preprocessing. |
| | --no-preprocess | Flag | - | Disable preprocessing (use if input is already normalized). |
| | --min-genes | Int | 1000 | Minimum genes per cell (QC). |
| | --min-cells | Int | 10 | Minimum cells per gene (QC). |
| | --max-mt-pct | Float | 20.0 | Maximum mitochondrial percentage allowed (QC). |
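
To illustrate what the QC thresholds control, here is a minimal NumPy sketch. It is not the tool's own preprocessing code. It assumes that "LogNormal" means library-size normalization followed by log1p, that mitochondrial genes are identified by an "MT-" prefix, and that cells are filtered before genes. The function name `qc_and_lognormalize` and the `target_sum` parameter are hypothetical.

```python
import numpy as np

def qc_and_lognormalize(counts, gene_names, min_genes=1000, min_cells=10,
                        max_mt_pct=20.0, target_sum=1e4):
    """Filter a cells x genes count matrix, then log-normalize it (sketch)."""
    counts = np.asarray(counts, dtype=float)
    gene_names = np.asarray(gene_names)
    # Assumption: mitochondrial genes carry an "MT-" prefix (case-insensitive).
    is_mt = np.char.startswith(np.char.upper(gene_names), "MT-")

    # Cell-level QC: detected genes per cell and mitochondrial percentage.
    total = counts.sum(axis=1)
    genes_per_cell = (counts > 0).sum(axis=1)
    mt_pct = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total, 1.0)
    keep_cells = (genes_per_cell >= min_genes) & (mt_pct <= max_mt_pct)
    counts = counts[keep_cells]

    # Gene-level QC: number of remaining cells in which each gene is detected.
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    counts = counts[:, keep_genes]

    # Scale each cell to target_sum total counts, then apply log(1 + x).
    total = counts.sum(axis=1, keepdims=True)
    normalized = np.log1p(counts / np.maximum(total, 1.0) * target_sum)
    return normalized, keep_cells, keep_genes

# Toy data: 3 cells x 3 genes, with thresholds lowered to fit the toy size.
toy_counts = [[0, 5, 5], [8, 1, 1], [0, 3, 0]]
toy_genes = ["MT-CO1", "GeneA", "GeneB"]
norm, kept_cells, kept_genes = qc_and_lognormalize(
    toy_counts, toy_genes, min_genes=2, min_cells=1, max_mt_pct=20.0
)
```

In the toy data, the second cell is dropped for its high mitochondrial fraction and the third for having too few detected genes.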
You can adjust how the algorithm weights the edges between TFs and Target Genes using --weight-type:
- Uniform_Weight: Technically no weights. This treats upregulates-expression as 1 and downregulates-expression as -1.
- Correlation_Weight: Weights are scaled by the Spearman correlation between TF and Target expression.
- Specificity_Weight: Weights are scaled by 1 / (Number of TFs regulating that gene).
- Non_Zero_Rate_Weight: Weights are scaled by the detection rate of the target gene.
- Existing_Weight: Uses the weight column provided in the input prior file.
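
To make two of these definitions concrete, here is a small Python sketch of Uniform_Weight and Specificity_Weight on a toy edge list. It is an illustration, not the tool's implementation: the function names and the edge tuples are hypothetical, and it assumes that "scaled by" means the edge sign is multiplied by the scaling factor.

```python
from collections import Counter

# Toy prior edges as (TF, sign, target); sign is 1 for upregulation, -1 for downregulation.
edges = [
    ("TF_A", 1, "Gene_X"),
    ("TF_B", -1, "Gene_X"),
    ("TF_A", 1, "Gene_Y"),
]

def uniform_weights(edges):
    """Each edge keeps only its sign (+1 or -1)."""
    return {(tf, target): float(sign) for tf, sign, target in edges}

def specificity_weights(edges):
    """Multiply each edge's sign by 1 / (number of TFs regulating the target)."""
    n_regulators = Counter()
    # Count each distinct (TF, target) pair once.
    for _tf, target in {(tf, target) for tf, _sign, target in edges}:
        n_regulators[target] += 1
    return {(tf, target): sign / n_regulators[target] for tf, sign, target in edges}

uniform = uniform_weights(edges)
specific = specificity_weights(edges)
```

Gene_X is regulated by two TFs, so each of its edges gets magnitude 0.5, while Gene_Y has a single regulator and keeps magnitude 1.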

Expression data:
- Formats: .h5ad (AnnData), .csv (comma-separated), .tsv (tab-separated).
- Structure: If text-based, rows should be cells and columns genes (the tool will transpose automatically); otherwise, the standard AnnData structure.

The prior network is a CSV or TSV file containing TF-Target interactions.
- Required columns: source (TF), interaction (mode), target (Gene).
- Optional: weight.
- Example:

    source,interaction,target
    TF_A,1,Gene_X
    TF_B,-1,Gene_Y

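
As a way to check a prior file before running the tool, the following Python sketch parses the format described above and applies a minimum-targets filter in the spirit of --min-targets. It is a sketch under assumptions, not the tool's loader: the helper names are hypothetical, the interaction column is assumed to hold either 1 / -1 or the strings upregulates-expression / downregulates-expression, and a missing weight is assumed to default to 1.0.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = {"source", "interaction", "target"}
# Assumption: these are the accepted encodings of the interaction mode.
SIGNS = {"1": 1, "-1": -1, "upregulates-expression": 1, "downregulates-expression": -1}

def load_priors(handle, delimiter=","):
    """Return {TF: {target: (sign, weight)}} from an open CSV/TSV handle."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"prior file is missing columns: {sorted(missing)}")
    targets_by_tf = defaultdict(dict)
    for row in reader:
        sign = SIGNS[row["interaction"].strip()]
        # Optional weight column; falls back to 1.0 when absent or empty (assumption).
        weight = float(row.get("weight") or 1.0)
        targets_by_tf[row["source"]][row["target"]] = (sign, weight)
    return dict(targets_by_tf)

def filter_min_targets(targets_by_tf, min_targets=5):
    """Keep only TFs with at least min_targets target genes."""
    return {tf: t for tf, t in targets_by_tf.items() if len(t) >= min_targets}

example = io.StringIO("source,interaction,target\nTF_A,1,Gene_X\nTF_B,-1,Gene_Y\n")
priors = load_priors(example)
kept = filter_min_targets(priors, min_targets=1)
```

For a TSV file, pass `delimiter="\t"`. With the default threshold of 5, both toy TFs would be dropped because each has only one target.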
The tool generates the following files in the specified output directory:
- z_aggregate_scores.tsv: Matrix of inferred TF activities (Cells x TFs).
- z_aggregate_pvalues.tsv: Significance values for the activities.
- z_aggregate_results.h5ad (Optional): A copy of the input AnnData object containing the results in obsm.
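
One way to combine the two TSV outputs downstream is sketched below in Python. It assumes both files share the same layout (cells as rows, TFs as columns, cell identifiers in the first column) and that a plain p-value cutoff is appropriate; the source states neither, and whether the p-values are corrected for multiple testing is not specified. The helper names and the inline toy tables are hypothetical.

```python
import csv
import io

def read_matrix(handle):
    """Read a tab-separated Cells x TFs table into {cell: {tf: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    tfs = next(reader)[1:]  # the first header cell is the index column
    return {row[0]: dict(zip(tfs, map(float, row[1:]))) for row in reader}

def significant_tfs(scores, pvalues, alpha=0.05):
    """For each cell, list TFs with p < alpha, strongest absolute score first."""
    return {
        cell: sorted(
            (tf for tf, p in pvalues[cell].items() if p < alpha),
            key=lambda tf: -abs(scores[cell][tf]),
        )
        for cell in scores
    }

# Toy stand-ins for z_aggregate_scores.tsv and z_aggregate_pvalues.tsv.
scores = read_matrix(io.StringIO("\tTF_A\tTF_B\ncell_1\t2.5\t-0.3\ncell_2\t-3.1\t4.0\n"))
pvalues = read_matrix(io.StringIO("\tTF_A\tTF_B\ncell_1\t0.01\t0.76\ncell_2\t0.002\t0.0001\n"))
hits = significant_tfs(scores, pvalues)
```

For a real run, replace the `io.StringIO` objects with open file handles to the two TSV files in the output directory.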