-
Notifications
You must be signed in to change notification settings - Fork 4
6. Whole Transcriptome In Situ
The Atera In Situ is a recently released spatial transcriptomics platform from 10x Genomics. Atera captures whole-transcriptome expression (>18,000 genes) directly within tissue sections at single-cell resolution.
Whole Transcriptome Coverage
- Profiles >18,000 genes covering almost the entire protein-coding transcriptome
- Available as Atera WTA (whole transcriptome) or Atera Select (targeted panels)
- Customizable with up to 1,000 additional custom targets via stackable panels
High-Throughput Tissue Analysis
- Imageable area of >500 mm² per slide, supporting up to 4 slides per run
- Capable of analyzing up to 1,600 mm² of tissue per week
- Whole-transcriptome spatially resolved profiling at single-cell resolution
- Large-scale discovery studies in cancer, neuroscience, and developmental biology
- Unbiased cell type mapping and tumor microenvironment characterization
- High-plex targeted spatial analysis with Atera Select panels
📚 Resources
- Platform: [10x Genomics Atera]
- Dataset: [Atera In Situ Gene Expression, FFPE Human Breast Cancer]
This workshop covers the analysis of 10x Genomics Xenium In Situ whole transcriptome (WTA) data using Seurat v5 in R. We will analyze a human breast cancer FFPE tissue section profiled with the Xenium WTA panel (~18,000 gene targets, ~170,000 cells).
- Load and explore Xenium spatial transcriptomics data
- Perform quality control and filtering
- Normalize, cluster, and visualize cells
- Visualize gene expression in spatial context
- Annotate cell types using marker genes
- Perform spatial niche analysis
- Explore cell-cell spatial relationships
💡 More information can be found in
analysis_summary.html.
library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
library(purrr)
library(FNN)
library(future)
packageVersion("Seurat")# v5.4.0
# Use sequential plan to avoid globals size issues with large Xenium objects
plan("sequential")
options(future.globals.maxSize = 50 * 1024^3) # 50 GB
# Path to raw Xenium data
# (Change this to your local Xenium data directory)
RawData_path <- "/Xenium_FullTranscript/RawData/Xenium_FT"
# Path to processed output files
processed_path <- "/Xenium_FullTranscript/ProcessedData"
The LoadXenium() function reads in the Xenium output directory and creates a Seurat object with spatial coordinates and cell segmentation boundaries.
Option 1: Load from raw Xenium output
# Path to Xenium output (Change this to your local Xenium data directory)
xenium.obj <- LoadXenium(
RawData_path,
fov = "fov",
segmentations = "cell" # ← KEY: loads polygon boundaries
)
# Save
saveRDS(
xenium.obj,
file = file.path(processed_path,"xenium_breast_cancer_WTA_raw_170057cells_with_segmentations.rds")
)
Option 2: Load pre-computed object
xenium.obj <- readRDS(
file.path(processed_path, "xenium_breast_cancer_WTA_raw_170057cells_with_segmentations.rds")
)
# Quick look at the object
xenium.obj
An object of class Seurat
27104 features across 170057 samples within 5 assays
Active assay: Xenium (18028 features, 0 variable features)
1 layer present: counts
4 other assays present: BlankCodeword, ControlCodeword, ControlProbe, GenomicControl
1 spatial field of view present: fov
head(xenium.obj,5)
| Cell ID | orig.ident | nCount_Xenium | nFeature_Xenium | segmentation_method | nCount_BlankCodeword | nFeature_BlankCodeword | nCount_ControlCodeword | nFeature_ControlCodeword | nCount_ControlProbe | nFeature_ControlProbe | nCount_GenomicControl | nFeature_GenomicControl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| aaaajgij-1 | SeuratProject | 5508 | 3428 | Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) | 0 | 0 | 1 | 1 | 0 | 0 | 4 | 3 |
| aaaandia-1 | SeuratProject | 6522 | 3753 | Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 |
| aaabalki-1 | SeuratProject | 1622 | 1385 | Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) | 1 | 1 | 0 | 0 | 0 | 0 | 5 | 5 |
| aaabchln-1 | SeuratProject | 812 | 745 | Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| aaaccnnp-1 | SeuratProject | 2974 | 2061 | Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 |
coords <- xenium.obj@images$fov@boundaries$centroids@coords
coords_df <- as.data.frame(coords)
coords_df$cell_id <- Cells(xenium.obj@images$fov@boundaries$centroids)
head(coords_df)
| cell_id | x | y |
|---|---|---|
| aaaajgij-1 | 74.00697 | 2732.648 |
| aaaandia-1 | 152.94557 | 2721.638 |
| aaabalki-1 | 15.98353 | 2817.409 |
| aaabchln-1 | 15.05376 | 2799.583 |
| aaaccnnp-1 | 59.01640 | 2763.473 |
| aaacdfgp-1 | 62.20883 | 2758.241 |
Xenium data QC focuses on:
-
nCount_Xenium— Total transcripts per cell -
nFeature_Xenium— Number of unique genes detected per cell
# Violin plots of QC metrics
pdf(file.path(processed_path, "01_qc_violin.pdf"), width = 6, height = 4)
p_qc <- VlnPlot(
xenium.obj,
features = c("nFeature_Xenium", "nCount_Xenium"),
ncol = 2,
pt.size = 0
) &
theme(axis.text.x = element_blank())
print(p_qc)
dev.off()
qc_stats <- data.frame(
Metric = c("Median transcripts/cell", "Median genes/cell",
"Mean transcripts/cell", "Mean genes/cell",
"Min transcripts/cell", "Max transcripts/cell"),
Value = c(
median(xenium.obj$nCount_Xenium),
median(xenium.obj$nFeature_Xenium),
round(mean(xenium.obj$nCount_Xenium), 1),
round(mean(xenium.obj$nFeature_Xenium), 1),
min(xenium.obj$nCount_Xenium),
max(xenium.obj$nCount_Xenium)
)
)
knitr::kable(qc_stats, caption = "QC Summary Statistics")
| Metric | Value |
|---|---|
| Median transcripts / cell | 2,116 |
| Median genes / cell | 1,543 |
| Mean transcripts / cell | 2,948.3 |
| Mean genes / cell | 1,805.2 |
| Min transcripts / cell | 0 |
| Max transcripts / cell | 26,078 |
Interpretation:
This table summarizes the basic QC metrics for the Xenium dataset. The median number of transcripts per cell is 2,116, with 1,543 detected genes per cell. The mean values (2,948 transcripts and 1,805 genes per cell) are higher than the medians, indicating a right-skewed distribution where a subset of cells has elevated transcript counts.
The minimum transcript count is 0, suggesting that a small number of segmented cells have no detected transcripts. In contrast, the maximum of 26,078 transcripts highlights substantial variability across cells, with some exhibiting much higher transcriptional activity.
pdf(file.path(processed_path, "02_right_skewed_distribution.pdf"), width = 7, height = 5)
median_count <- median(xenium.obj$nCount_Xenium)
mean_count <- mean(xenium.obj$nCount_Xenium)
max_y <- max(
hist(xenium.obj$nCount_Xenium, breaks = 80, plot = FALSE)$counts
)
p_right_skew <- ggplot(xenium.obj@meta.data, aes(x = nCount_Xenium)) +
geom_histogram(bins = 80, fill = "gray85", color = "gray35", linewidth = 0.2) +
geom_vline(xintercept = median_count, color = "#2C7BB6", linewidth = 1.1) +
geom_vline(xintercept = mean_count, color = "#D7191C", linewidth = 1.1) +
annotate("label", x = median_count - 500, y = max_y * 0.95,
label = paste0("Median = ", round(median_count)),
color = "#2C7BB6", fill = "white") +
annotate("label", x = mean_count + 1200, y = max_y * 0.82,
label = paste0("Mean = ", round(mean_count, 1)),
color = "#D7191C", fill = "white") +
labs(
title = "Distribution of Xenium Transcript Counts per Cell",
subtitle = "Mean is greater than median, indicating a right-skewed distribution",
x = "Transcripts per cell (nCount_Xenium)",
y = "Number of cells"
) +
theme_classic(base_size = 13)
print(p_right_skew)
dev.off()
📌 Note: This plot shows the distribution of transcript counts per cell. The mean is higher than the median, and the long tail extends toward higher transcript counts, indicating a right-skewed distribution. This suggests that most cells have moderate transcript counts, while a smaller subset has much higher transcript abundance.
pdf(file.path(processed_path, "03_spatial_qc.pdf"), width = 10, height = 5)
p_spatial <- SpatialFeaturePlot(
xenium.obj,
features = c("nCount_Xenium", "nFeature_Xenium"),
ncol = 2
)
print(p_spatial)
dev.off()
To make visualization and QC faster, we downsample cells from the top-left region of the tissue.
# Define region to subset (1500 x 1500)
x_min <- 4750
x_max <- 6250
y_min <- 1250
y_max <- 2750
coords_df$in_region <- coords_df$x >= x_min &
coords_df$x <= x_max &
coords_df$y >= y_min &
coords_df$y <= y_max
# Plot all cells, highlighting selected region in red
pdf(file.path(processed_path, "04_region_highlight.pdf"), width = 6, height = 6)
p_coords_region <- ggplot(coords_df, aes(x = x, y = y)) +
geom_point(color = "gray70", size = 0.1, alpha = 0.3) +
geom_point(
data = subset(coords_df, in_region),
color = "red",
size = 0.15,
alpha = 0.7
) +
coord_fixed() +
scale_x_continuous(breaks = seq(0, max(coords_df$x), by = 1000)) +
scale_y_continuous(breaks = seq(0, max(coords_df$y), by = 1000)) +
labs(
title = "Selected Spatial Region",
subtitle = "Highlighted region: x = 4750–6250, y = 1250–2750",
x = "X coordinate",
y = "Y coordinate"
) +
theme_classic(base_size = 13) +
theme(
panel.grid.major = element_line(color = "gray85", linewidth = 0.3),
panel.grid.minor = element_blank()
)
print(p_coords_region)
dev.off()
# Subset cells in this region
region_cells <- coords_df$cell_id[coords_df$in_region]
xenium.region <- subset(
xenium.obj,
cells = region_cells
)
cat("Number of cells in selected region:", ncol(xenium.region), "\n")
Number of cells in selected region: 11458
# Save the subset region Seurat object
# (Since this step can be time-consuming, we'll skip it during the workshop.)
saveRDS(
xenium.region,
file = file.path(processed_path, "xenium_region_x4750_6250_y1250_2750.rds")
)
pdf(file.path(processed_path, "05_subset_object_spatial_qc.pdf"), width = 10, height = 5)
p_spatial_region <- ImageFeaturePlot(
xenium.region,
features = c("nCount_Xenium", "nFeature_Xenium")
)
print(p_spatial_region)
dev.off()
Filter cells based on QC metrics to remove low-quality cells, debris, and empty segmentations.
| Threshold | Meaning |
|---|---|
nCount_Xenium > 10 |
Minimum total transcript counts per cell |
nFeature_Xenium > 5 |
Minimum number of detected genes per cell |
# Filter cells based on QC metrics
# nCount_Xenium > 10 : minimum total transcript counts per cell
# nFeature_Xenium > 5 : minimum number of detected genes per cell
# Removes low-quality cells, debris, and empty segmentations
cat("Cells before filtering:", ncol(xenium.region), "\n")
xenium.region <- subset(xenium.region,
subset = nCount_Xenium > 10 &
nFeature_Xenium > 5)
cat("Cells after filtering:", ncol(xenium.region ), "\n")
Cells before filtering: 11458
Cells after filtering: 11458
Xenium Atera is a whole-transcriptome assay (18,028 features). Here we use SCTransform to select the top variable features (default is 3,000) for downstream PCA.
# SCTransform normalization
# Xenium Atera is a whole-transcriptome assay (18,028 features), here we use
# SCTransform to select the top variable features (default is 3,000) for downstream PCA.
xenium.region <- SCTransform(xenium.region, assay = "Xenium")
# PCA
xenium.region <- RunPCA(xenium.region, npcs = 30)
# Elbow plot to determine optimal number of PCs
pdf(file.path(processed_path, "06_ElbowPlot_xenium_region.pdf"),
width = 5, height = 5)
ElbowPlot(xenium.region, ndims = 30)
dev.off()
# UMAP + Clustering
xenium.region <- RunUMAP(xenium.region, dims = 1:30)
xenium.region <- FindNeighbors(xenium.region, reduction = "pca", dims = 1:30)
xenium.region <- FindClusters(xenium.region, resolution = 0.2)
# Check clusters
cat("Number of clusters:", length(levels(Idents(xenium.region))), "\n")
table(Idents(xenium.region))
Result: Number of clusters: 10
| Cluster | Cell Count |
|---|---|
| 0 | 4,728 |
| 1 | 1,214 |
| 2 | 1,079 |
| 3 | 934 |
| 4 | 802 |
| 5 | 791 |
| 6 | 726 |
| 7 | 583 |
| 8 | 559 |
| 9 | 42 |
# Define shared cluster colors
n_clusters =length(levels(Idents(xenium.region)))
cluster_cols <- setNames(
c("#33FF00","#FFFF33","#F100F1","#085AFF","#E41A1C",
"#FF7F00", "#00FFFF","#990099","#EEAAEE","#F6E0D2",
"#614128", "#E7C76F","#9467BD","#AECF00","#5DCEAF")[1:n_clusters],
levels(Idents(xenium.region))
)
# UMAP colored by cluster
p1 <- DimPlot(xenium.region, reduction = "umap", label = TRUE, repel = TRUE,
cols = cluster_cols) +
ggtitle("UMAP - Clusters") + NoLegend()
# Spatial plot colored by cluster
p2 <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1) +
ggtitle("Spatial - Clusters")
p_clusters <- p1 + p2
p_clusters
pdf(file.path(processed_path, "07_clusters.pdf"),
width = 14, height = 6)
print(p_clusters)
dev.off()
pdf(file.path(processed_path, "08_spatial_clusters_faceted.pdf"),
width = 10, height = 10)
plots <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1,
split.by = "seurat_clusters")
print(plots)
dev.off()
# Find markers for all clusters
xenium.markers <- FindAllMarkers(
xenium.region,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25
)
# Save marker table
write.csv(xenium.markers,
file.path(processed_path, "cluster_markers_all.csv"),
row.names = FALSE)
# Top markers per cluster
top_markers <- xenium.markers %>%
group_by(cluster) %>%
dplyr::filter(avg_log2FC > 1) %>%
slice_head(n = 10) %>%
ungroup()
head(top_markers)
| p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene |
|---|---|---|---|---|---|---|
| 0 | 1.73 | 0.702 | 0.303 | 0 | 0 | SDK2 |
| 0 | 1.01 | 0.919 | 0.750 | 0 | 0 | CA12 |
| 0 | 1.48 | 0.864 | 0.703 | 0 | 0 | TFF3 |
| 0 | 1.47 | 0.867 | 0.728 | 0 | 0 | TFF1 |
| 0 | 1.00 | 0.956 | 0.825 | 0 | 0 | STC2 |
| 9.03e-316 | 1.12 | 0.848 | 0.658 | 1.62e-311 | 0 | ERRFI1 |
# Heatmap
p <- DoHeatmap(
xenium.region,
features = top_markers$gene,
group.colors = cluster_cols
) +
theme(axis.text.y = element_text(size = 8))
pdf(file.path(processed_path, "09_heatmap_top10_markers.pdf"),
width = 12, height = 10)
print(p)
dev.off()
Based on marker gene expression, clusters are assigned to known breast tissue cell types.
xenium.region <- RenameIdents(xenium.region,
"0" = "Luminal epithelial 1",
"1" = "Luminal epithelial 2",
"2" = "T NK Myeloid",
"3" = "Luminal epithelial 3",
"4" = "Vascular",
"5" = "Fibroblast",
"6" = "Basal",
"7" = "Proliferating",
"8" = "Luminal epithelial 4",
"9" = "Plasma B"
)
Idents(xenium.region) = factor(Idents(xenium.region),
levels = c(
"Luminal epithelial 1",
"Luminal epithelial 2",
"Luminal epithelial 3",
"Luminal epithelial 4",
"Basal",
"Proliferating",
"Fibroblast",
"Vascular",
"T NK Myeloid",
"Plasma B"
) )
xenium.region$cell_type <- Idents(xenium.region)
table(Idents(xenium.region))
| Cell Type | Count |
|---|---|
| Luminal epithelial 1 | 4728 |
| Luminal epithelial 2 | 1214 |
| Luminal epithelial 3 | 934 |
| Luminal epithelial 4 | 559 |
| Basal | 726 |
| Proliferating | 583 |
| Fibroblast | 791 |
| Vascular | 802 |
| T NK Myeloid | 1079 |
| Plasma B | 42 |
# Update cluster_cols
names(cluster_cols) <- c(
"Luminal epithelial 1",
"Luminal epithelial 2",
"T NK Myeloid",
"Luminal epithelial 3",
"Vascular",
"Fibroblast",
"Basal",
"Proliferating",
"Luminal epithelial 4",
"Plasma B"
)
p <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1) +
ggtitle("Spatial - Cell Type")
pdf(file.path(processed_path, "10_CellType.pdf"),
width = 6, height = 6)
print(p)
dev.off()
pie_df <- as.data.frame(table(xenium.region$cell_type)) %>%
rename(cell_type = Var1, n = Freq) %>%
mutate(
cell_type = factor(cell_type, levels = levels(xenium.region$cell_type)),
pct = n / sum(n) * 100,
label = paste0(cell_type, "\n", n, " (", round(pct, 1), "%)")
)
pie_df <- pie_df %>% arrange(cell_type)
p_donut <- ggplot(pie_df, aes(x = 2, y = n, fill = cell_type)) +
geom_bar(stat = "identity", width = 1, color = "white", linewidth = 0.5) +
coord_polar(theta = "y", start = 0) +
scale_fill_manual(values = cluster_cols, name = "Cell type") +
geom_text(aes(label = paste0(round(pct, 1), "%")),
position = position_stack(vjust = 0.5),
size = 3.5, color = "black", fontface = "bold") +
xlim(0.5, 2.5) +
labs(title = paste0("Cell type composition (n = ", sum(pie_df$n), ")")) +
theme_void(base_size = 12) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "right"
)
pdf(file.path(processed_path, "celltype_piechart.pdf"),
width = 8, height = 6)
print(p_donut)
dev.off()
Visualize expression of known breast cancer cell type markers in spatial context.
| Gene | Cell Type Marker For |
|---|---|
| AGR3 | Luminal epithelial cells |
| VWF | Vascular endothelial cells |
| TRAC | T cells (canonical T-cell marker) |
# AGR3 is a marker of luminal epithelial cells
# VWF labels vascular endothelial cells
# TRAC is a canonical T-cell marker
pdf(file.path(processed_path, "11_spatial_feature.pdf"),
width = 12, height = 10)
ImageFeaturePlot(xenium.region,
fov = "fov",
features = c("AGR3", "VWF", "TRAC"),
size = 0.75)
dev.off()
# polygon_plot_with_transcripts
# Draws cell-segmentation polygons colored by cluster_cols, with selected
# transcripts (AGR3, VWF, TRAC) overlaid only inside their target cell types.
source("Plot_function.R")
plot_xenium_polygons_with_transcripts(
obj = xenium.region,
color_fill = cluster_cols,
idents_col = "cell_type",
gene_plot = c("AGR3", "VWF", "TRAC"),
gene_plot_col = c("AGR3" = "#0033FF", "VWF" = "#FF0000", "TRAC" = "#040404"),
cell_style = "outline",
pdf_prefix = "12_spatial_polygon_AGR3_VWF_TRAC",
output_dir = processed_path
)
| Receptor | Main Ligands |
|---|---|
| FLT1 (VEGFR1) | VEGFA, VEGFB, PlGF |
| KDR (VEGFR2) | VEGFA (primary), VEGFC (mature/cleaved), VEGFD (mature/cleaved) |
| NRP1 | VEGFA (especially VEGFA165 isoform) |
- VEGFA is the dominant ligand for all three (FLT1, KDR, NRP1).
- FLT1 (VEGFR1) binds VEGFA with very high affinity but often acts as a decoy receptor, limiting signaling.
- KDR (VEGFR2) is the main signaling receptor responsible for angiogenesis.
- NRP1 is a co-receptor, not a signaling receptor by itself; it enhances VEGFA signaling through KDR.
pdf(file.path(processed_path, "VEGFA_receptors_VlnPlot_by_cell_type.pdf"),
width = 15, height = 6)
p <- VlnPlot(
xenium.region,
features = c("VEGFA", "FLT1", "KDR", "NRP1"),
group.by = "cell_type",
cols = cluster_cols,
pt.size = 0,
ncol = length(genes_use)
) &
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
legend.position = "none"
)
print(p)
dev.off()
sessionInfo()