6. Whole Transcriptome In Situ

Overview

The Atera In Situ is a recently released spatial transcriptomics platform from 10x Genomics. Atera captures whole-transcriptome expression (>18,000 genes) directly within tissue sections at single-cell resolution.

Technology Key Features

Whole Transcriptome Coverage

Profiles >18,000 genes covering almost the entire protein-coding transcriptome
Available as Atera WTA (whole transcriptome) or Atera Select (targeted panels)
Customizable with up to 1,000 additional custom targets via stackable panels

High-Throughput Tissue Analysis

Imageable area of >500 mm² per slide, supporting up to 4 slides per run
Capable of analyzing up to 1,600 mm² of tissue per week

Applications

Whole-transcriptome spatially resolved profiling at single-cell resolution
Large-scale discovery studies in cancer, neuroscience, and developmental biology
Unbiased cell type mapping and tumor microenvironment characterization
High-plex targeted spatial analysis with Atera Select panels

📚 Resources

Platform: [10x Genomics Atera]

Dataset: [Atera In Situ Gene Expression, FFPE Human Breast Cancer]

Workshop Introduction

This workshop covers the analysis of 10x Genomics Xenium In Situ whole transcriptome (WTA) data using Seurat v5 in R. We will analyze a human breast cancer FFPE tissue section profiled with the Xenium WTA panel (~18,000 gene targets, ~170,000 cells).

Learning Objectives

Load and explore Xenium spatial transcriptomics data
Perform quality control and filtering
Normalize, cluster, and visualize cells
Visualize gene expression in spatial context
Annotate cell types using marker genes
Perform spatial niche analysis
Explore cell-cell spatial relationships

Experiment Summary

💡 More information can be found in analysis_summary.html.

Part 1: Setup and Data Loading

Load Libraries

library(Seurat)       
library(ggplot2)
library(patchwork)
library(dplyr)
library(purrr)
library(FNN)
library(future)

packageVersion("Seurat")# v5.4.0

# Use sequential plan to avoid globals size issues with large Xenium objects
plan("sequential")
options(future.globals.maxSize = 50 * 1024^3)  # 50 GB

Set Paths

# Path to raw Xenium data
# (Change this to your local Xenium data directory)
RawData_path <- "/Xenium_FullTranscript/RawData/Xenium_FT" 
# Path to processed output files
processed_path <- "/Xenium_FullTranscript/ProcessedData"

Load Xenium Data

The LoadXenium() function reads in the Xenium output directory and creates a Seurat object with spatial coordinates and cell segmentation boundaries.

Option 1: Load from raw Xenium output

# Path to Xenium output (Change this to your local Xenium data directory)
xenium.obj <- LoadXenium(
  RawData_path,
  fov           = "fov",
  segmentations = "cell"      # ← KEY: loads polygon boundaries
)

# Save
saveRDS(
  xenium.obj,
  file = file.path(processed_path,"xenium_breast_cancer_WTA_raw_170057cells_with_segmentations.rds")
)

Option 2: Load pre-computed object

xenium.obj <- readRDS(
  file.path(processed_path, "xenium_breast_cancer_WTA_raw_170057cells_with_segmentations.rds")
)
# Quick look at the object
xenium.obj

Inspect the Object

An object of class Seurat
27104 features across 170057 samples within 5 assays
Active assay: Xenium (18028 features, 0 variable features)
1 layer present: counts
4 other assays present: BlankCodeword, ControlCodeword, ControlProbe, GenomicControl
1 spatial field of view present: fov

head(xenium.obj,5)

Cell ID	orig.ident	nCount_Xenium	nFeature_Xenium	segmentation_method	nCount_BlankCodeword	nFeature_BlankCodeword	nCount_ControlCodeword	nFeature_ControlCodeword	nCount_GenomicControl	nFeature_GenomicControl
aaaajgij-1	SeuratProject	5508	3428	Segmented by boundary stain (ATP1A1+CD45+E-Cadherin)	0	0	1	1	4	3
aaaandia-1	SeuratProject	6522	3753	Segmented by boundary stain (ATP1A1+CD45+E-Cadherin)	0	0	0	0	5	5
aaabalki-1	SeuratProject	1622	1385	Segmented by boundary stain (ATP1A1+CD45+E-Cadherin)	1	1	0	0	5	5
aaabchln-1	SeuratProject	812	745	Segmented by boundary stain (ATP1A1+CD45+E-Cadherin)	0	0	0	0	1	1
aaaccnnp-1	SeuratProject	2974	2061	Segmented by boundary stain (ATP1A1+CD45+E-Cadherin)	0	0	0	0	3	3

Check Cell x/y Coordinates

coords <- xenium.obj@images$fov@boundaries$centroids@coords
coords_df <- as.data.frame(coords)
coords_df$cell_id <- Cells(xenium.obj@images$fov@boundaries$centroids)

head(coords_df)

cell_id	x	y
aaaajgij-1	74.00697	2732.648
aaaandia-1	152.94557	2721.638
aaabalki-1	15.98353	2817.409
aaabchln-1	15.05376	2799.583
aaaccnnp-1	59.01640	2763.473
aaacdfgp-1	62.20883	2758.241

Part 2: Quality Control

QC Metrics

Xenium data QC focuses on:

nCount_Xenium — Total transcripts per cell
nFeature_Xenium — Number of unique genes detected per cell

# Violin plots of QC metrics
pdf(file.path(processed_path, "01_qc_violin.pdf"), width = 6, height = 4)

p_qc <- VlnPlot(
  xenium.obj,
  features = c("nFeature_Xenium", "nCount_Xenium"),
  ncol = 2,
  pt.size = 0
) &
  theme(axis.text.x = element_blank())

print(p_qc)

dev.off()

QC Summary Statistics

qc_stats <- data.frame(
  Metric = c("Median transcripts/cell", "Median genes/cell",
             "Mean transcripts/cell", "Mean genes/cell",
             "Min transcripts/cell", "Max transcripts/cell"),
  Value = c(
    median(xenium.obj$nCount_Xenium),
    median(xenium.obj$nFeature_Xenium),
    round(mean(xenium.obj$nCount_Xenium), 1),
    round(mean(xenium.obj$nFeature_Xenium), 1),
    min(xenium.obj$nCount_Xenium),
    max(xenium.obj$nCount_Xenium)
  )
)
knitr::kable(qc_stats, caption = "QC Summary Statistics")

Metric	Value
Median transcripts / cell	2,116
Median genes / cell	1,543
Mean transcripts / cell	2,948.3
Mean genes / cell	1,805.2
Min transcripts / cell	0
Max transcripts / cell	26,078

Interpretation:

This table summarizes the basic QC metrics for the Xenium dataset. The median number of transcripts per cell is 2,116, with 1,543 detected genes per cell. The mean values (2,948 transcripts and 1,805 genes per cell) are higher than the medians, indicating a right-skewed distribution where a subset of cells has elevated transcript counts.

The minimum transcript count is 0, suggesting that a small number of segmented cells have no detected transcripts. In contrast, the maximum of 26,078 transcripts highlights substantial variability across cells, with some exhibiting much higher transcriptional activity.

Distribution of Transcript Counts

pdf(file.path(processed_path, "02_right_skewed_distribution.pdf"), width = 7, height = 5)

median_count <- median(xenium.obj$nCount_Xenium)
mean_count <- mean(xenium.obj$nCount_Xenium)

max_y <- max(
  hist(xenium.obj$nCount_Xenium, breaks = 80, plot = FALSE)$counts
)

p_right_skew <- ggplot(xenium.obj@meta.data, aes(x = nCount_Xenium)) +
  geom_histogram(bins = 80, fill = "gray85", color = "gray35", linewidth = 0.2) +
  geom_vline(xintercept = median_count, color = "#2C7BB6", linewidth = 1.1) +
  geom_vline(xintercept = mean_count, color = "#D7191C", linewidth = 1.1) +
  annotate("label", x = median_count - 500, y = max_y * 0.95,
           label = paste0("Median = ", round(median_count)),
           color = "#2C7BB6", fill = "white") +
  annotate("label", x = mean_count + 1200, y = max_y * 0.82,
           label = paste0("Mean = ", round(mean_count, 1)),
           color = "#D7191C", fill = "white") +
  labs(
    title = "Distribution of Xenium Transcript Counts per Cell",
    subtitle = "Mean is greater than median, indicating a right-skewed distribution",
    x = "Transcripts per cell (nCount_Xenium)",
    y = "Number of cells"
  ) +
  theme_classic(base_size = 13)

print(p_right_skew)
dev.off()

📌 Note: This plot shows the distribution of transcript counts per cell. The mean is higher than the median, and the long tail extends toward higher transcript counts, indicating a right-skewed distribution. This suggests that most cells have moderate transcript counts, while a smaller subset has much higher transcript abundance.

Spatial QC

pdf(file.path(processed_path, "03_spatial_qc.pdf"), width = 10, height = 5)

p_spatial <- SpatialFeaturePlot(
  xenium.obj,
  features = c("nCount_Xenium", "nFeature_Xenium"),
  ncol = 2
)

print(p_spatial)
dev.off()

Subset a Region

To make visualization and QC faster, we downsample cells from the top-left region of the tissue.

Define and Highlight the Region

# Define region to subset (1500 x 1500)
x_min <- 4750
x_max <- 6250
y_min <- 1250
y_max <- 2750

coords_df$in_region <- coords_df$x >= x_min &
  coords_df$x <= x_max &
  coords_df$y >= y_min &
  coords_df$y <= y_max

# Plot all cells, highlighting selected region in red
pdf(file.path(processed_path, "04_region_highlight.pdf"), width = 6, height = 6)
p_coords_region <- ggplot(coords_df, aes(x = x, y = y)) +
  geom_point(color = "gray70", size = 0.1, alpha = 0.3) +
  geom_point(
    data = subset(coords_df, in_region),
    color = "red",
    size = 0.15,
    alpha = 0.7
  ) +
  coord_fixed() +
  scale_x_continuous(breaks = seq(0, max(coords_df$x), by = 1000)) +
  scale_y_continuous(breaks = seq(0, max(coords_df$y), by = 1000)) +
  labs(
    title = "Selected Spatial Region",
    subtitle = "Highlighted region: x = 4750–6250, y = 1250–2750",
    x = "X coordinate",
    y = "Y coordinate"
  ) +
  theme_classic(base_size = 13) +
  theme(
    panel.grid.major = element_line(color = "gray85", linewidth = 0.3),
    panel.grid.minor = element_blank()
  )
print(p_coords_region)
dev.off()

Subset the Seurat Object

# Subset cells in this region
region_cells <- coords_df$cell_id[coords_df$in_region]

xenium.region <- subset(
  xenium.obj,
  cells = region_cells
)

cat("Number of cells in selected region:", ncol(xenium.region), "\n")

Number of cells in selected region: 11458

Save the Subset Object

# Save the subset region Seurat object
# (Since this step can be time-consuming, we'll skip it during the workshop.)
saveRDS(
  xenium.region,
  file = file.path(processed_path, "xenium_region_x4750_6250_y1250_2750.rds")
)

Visualize Spatial QC on the Subset

pdf(file.path(processed_path, "05_subset_object_spatial_qc.pdf"), width = 10, height = 5)

p_spatial_region <- ImageFeaturePlot(
    xenium.region,
  features = c("nCount_Xenium", "nFeature_Xenium")
)

print(p_spatial_region)

dev.off()

Filtering

Filter cells based on QC metrics to remove low-quality cells, debris, and empty segmentations.

Threshold	Meaning
`nCount_Xenium > 10`	Minimum total transcript counts per cell
`nFeature_Xenium > 5`	Minimum number of detected genes per cell

# Filter cells based on QC metrics
# nCount_Xenium > 10   : minimum total transcript counts per cell
# nFeature_Xenium > 5  : minimum number of detected genes per cell
# Removes low-quality cells, debris, and empty segmentations
cat("Cells before filtering:", ncol(xenium.region), "\n")

xenium.region <- subset(xenium.region,
                                  subset = nCount_Xenium > 10 &
                                           nFeature_Xenium > 5)

cat("Cells after filtering:", ncol(xenium.region ), "\n")

Cells before filtering: 11458

Cells after filtering: 11458

Part 3: Normalization and Clustering

SCTransform and PCA

Xenium Atera is a whole-transcriptome assay (18,028 features). Here we use SCTransform to select the top variable features (default is 3,000) for downstream PCA.

# SCTransform normalization
# Xenium Atera is a whole-transcriptome assay (18,028 features), here we use
# SCTransform to select the top variable features (default is 3,000) for downstream PCA.

xenium.region <- SCTransform(xenium.region, assay = "Xenium")
# PCA 
xenium.region <- RunPCA(xenium.region, npcs = 30)

# Elbow plot to determine optimal number of PCs
pdf(file.path(processed_path, "06_ElbowPlot_xenium_region.pdf"),
    width = 5, height = 5)
ElbowPlot(xenium.region, ndims = 30)
dev.off()

UMAP and Clustering

# UMAP + Clustering
xenium.region <- RunUMAP(xenium.region, dims = 1:30)
xenium.region <- FindNeighbors(xenium.region, reduction = "pca", dims = 1:30)
xenium.region <- FindClusters(xenium.region, resolution = 0.2)

# Check clusters
cat("Number of clusters:", length(levels(Idents(xenium.region))), "\n")
table(Idents(xenium.region))

Result: Number of clusters: 10

Cluster	Cell Count
0	4,728
1	1,214
2	1,079
3	934
4	802
5	791
6	726
7	583
8	559
9	42

Visualize Clusters

UMAP and Spatial View

# Define shared cluster colors

n_clusters =length(levels(Idents(xenium.region)))


cluster_cols <- setNames(
  c("#33FF00","#FFFF33","#F100F1","#085AFF","#E41A1C",
    "#FF7F00", "#00FFFF","#990099","#EEAAEE","#F6E0D2",
    "#614128", "#E7C76F","#9467BD","#AECF00","#5DCEAF")[1:n_clusters],
  levels(Idents(xenium.region))
)



# UMAP colored by cluster
p1 <- DimPlot(xenium.region, reduction = "umap", label = TRUE, repel = TRUE,
              cols = cluster_cols) +
  ggtitle("UMAP - Clusters") + NoLegend()

# Spatial plot colored by cluster
p2 <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1) +
  ggtitle("Spatial - Clusters")

p_clusters <- p1 + p2
p_clusters

pdf(file.path(processed_path, "07_clusters.pdf"),
    width = 14, height = 6)
print(p_clusters)
dev.off()

Faceted Spatial View

pdf(file.path(processed_path, "08_spatial_clusters_faceted.pdf"),
    width = 10, height = 10)
plots <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1,
                  split.by = "seurat_clusters")
print(plots)
dev.off()

Part 4: Annotation and Spatial Visualization

Find Cluster Markers

# Find markers for all clusters
xenium.markers <- FindAllMarkers(
  xenium.region,
  only.pos  = TRUE,
  min.pct   = 0.25,
  logfc.threshold = 0.25
)


# Save marker table
write.csv(xenium.markers,
          file.path(processed_path, "cluster_markers_all.csv"),
          row.names = FALSE)

# Top markers per cluster
top_markers <- xenium.markers %>%
  group_by(cluster) %>%
  dplyr::filter(avg_log2FC > 1) %>%
  slice_head(n = 10) %>%
  ungroup()


head(top_markers)

p_val	avg_log2FC	pct.1	pct.2	p_val_adj	gene
0	1.73	0.702	0.303	0	SDK2
0	1.01	0.919	0.750	0	CA12
0	1.48	0.864	0.703	0	TFF3
0	1.47	0.867	0.728	0	TFF1
0	1.00	0.956	0.825	0	STC2
9.03e-316	1.12	0.848	0.658	1.62e-311	ERRFI1

Marker Heatmap

# Heatmap
p <- DoHeatmap(
  xenium.region,
  features = top_markers$gene,
  group.colors = cluster_cols
) +
  theme(axis.text.y = element_text(size = 8))

pdf(file.path(processed_path, "09_heatmap_top10_markers.pdf"),
    width = 12, height = 10)
print(p)
dev.off()

Cell Type Annotation

Based on marker gene expression, clusters are assigned to known breast tissue cell types.


xenium.region <- RenameIdents(xenium.region,
                              "0" = "Luminal epithelial 1",
                              "1" = "Luminal epithelial 2",
                              "2" = "T NK Myeloid",
                              "3" = "Luminal epithelial 3",
                              "4" = "Vascular",
                              "5" = "Fibroblast",
                              "6" = "Basal",
                              "7" = "Proliferating",
                              "8" = "Luminal epithelial 4",
                              "9" = "Plasma B"
)
Idents(xenium.region) = factor(Idents(xenium.region),
                               levels = c(
                                 "Luminal epithelial 1",
                                 "Luminal epithelial 2",
                                 "Luminal epithelial 3",
                                 "Luminal epithelial 4",
                                 "Basal",
                                 "Proliferating",
                                 "Fibroblast",
                                 "Vascular",
                                 "T NK Myeloid",
                                 "Plasma B"
                               ) )
xenium.region$cell_type <- Idents(xenium.region)
table(Idents(xenium.region))

Cell Type	Count
Luminal epithelial 1	4728
Luminal epithelial 2	1214
Luminal epithelial 3	934
Luminal epithelial 4	559
Basal	726
Proliferating	583
Fibroblast	791
Vascular	802
T NK Myeloid	1079
Plasma B	42

Annotated Spatial Map

# Update cluster_cols
names(cluster_cols) <- c(
  "Luminal epithelial 1",
  "Luminal epithelial 2",
  "T NK Myeloid",
  "Luminal epithelial 3",
  "Vascular",
  "Fibroblast",
  "Basal",
  "Proliferating",
  "Luminal epithelial 4",
  "Plasma B"
)

p <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1) +
  ggtitle("Spatial - Cell Type")


pdf(file.path(processed_path, "10_CellType.pdf"),
    width = 6, height = 6)
print(p)
dev.off()

Check cell type composition

pie_df <- as.data.frame(table(xenium.region$cell_type)) %>%
  rename(cell_type = Var1, n = Freq) %>%
  mutate(
    cell_type = factor(cell_type, levels = levels(xenium.region$cell_type)),
    pct       = n / sum(n) * 100,
    label     = paste0(cell_type, "\n", n, " (", round(pct, 1), "%)")
  )


pie_df <- pie_df %>% arrange(cell_type)
p_donut <- ggplot(pie_df, aes(x = 2, y = n, fill = cell_type)) +
  geom_bar(stat = "identity", width = 1, color = "white", linewidth = 0.5) +
  coord_polar(theta = "y", start = 0) +
  scale_fill_manual(values = cluster_cols, name = "Cell type") +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            position = position_stack(vjust = 0.5),
            size = 3.5, color = "black", fontface = "bold") +
  xlim(0.5, 2.5) +
  labs(title = paste0("Cell type composition (n = ", sum(pie_df$n), ")")) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", hjust = 0.5),
    legend.position = "right"
  )


pdf(file.path(processed_path, "celltype_piechart.pdf"),
    width = 8, height = 6)
print(p_donut)
dev.off()

Marker Gene Expression

Visualize expression of known breast cancer cell type markers in spatial context.

Gene	Cell Type Marker For
AGR3	Luminal epithelial cells
VWF	Vascular endothelial cells
TRAC	T cells (canonical T-cell marker)

# AGR3 is a marker of luminal epithelial cells
# VWF labels vascular endothelial cells 
# TRAC is a canonical T-cell marker

pdf(file.path(processed_path, "11_spatial_feature.pdf"),
    width = 12, height = 10)
ImageFeaturePlot(xenium.region,
                 fov      = "fov",
                 features = c("AGR3", "VWF", "TRAC"), 
                 size     = 0.75)
dev.off()

# polygon_plot_with_transcripts 
# Draws cell-segmentation polygons colored by cluster_cols, with selected
# transcripts (AGR3, VWF, TRAC) overlaid only inside their target cell types.

source("Plot_function.R")

plot_xenium_polygons_with_transcripts(
  obj = xenium.region,
  color_fill = cluster_cols,
  idents_col = "cell_type",
  gene_plot = c("AGR3", "VWF", "TRAC"),
  gene_plot_col = c("AGR3" = "#0033FF", "VWF" = "#FF0000", "TRAC" = "#040404"),
  cell_style = "outline",
  pdf_prefix = "12_spatial_polygon_AGR3_VWF_TRAC",
  output_dir = processed_path
)

Ligand–Receptor Axis in Angiogenesis

Receptor	Main Ligands
FLT1 (VEGFR1)	VEGFA, VEGFB, PlGF
KDR (VEGFR2)	VEGFA (primary), VEGFC (mature/cleaved), VEGFD (mature/cleaved)
NRP1	VEGFA (especially VEGFA165 isoform)

VEGFA is the dominant ligand for all three (FLT1, KDR, NRP1).
FLT1 (VEGFR1) binds VEGFA with very high affinity but often acts as a decoy receptor, limiting signaling.
KDR (VEGFR2) is the main signaling receptor responsible for angiogenesis.
NRP1 is a co-receptor, not a signaling receptor by itself; it enhances VEGFA signaling through KDR.


pdf(file.path(processed_path, "VEGFA_receptors_VlnPlot_by_cell_type.pdf"),
    width = 15, height = 6)

p <- VlnPlot(
  xenium.region,
  features = c("VEGFA", "FLT1", "KDR", "NRP1"),
  group.by = "cell_type",
  cols = cluster_cols,
  pt.size = 0,
  ncol = length(genes_use)
) &
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
    legend.position = "none"
  )
print(p)

dev.off()

Spatial proximity of luminal subtypes to the TME

04272026

Session Info

sessionInfo()

6. Whole Transcriptome In Situ

Overview

Technology Key Features

Applications

Workshop Introduction

Learning Objectives

Experiment Summary

Part 1: Setup and Data Loading

Load Libraries

Set Paths

Load Xenium Data

Inspect the Object

Check Cell x/y Coordinates

Part 2: Quality Control

QC Metrics

QC Summary Statistics

Distribution of Transcript Counts

Spatial QC

Subset a Region

Define and Highlight the Region

Subset the Seurat Object

Save the Subset Object

Visualize Spatial QC on the Subset

Filtering

Part 3: Normalization and Clustering

SCTransform and PCA

UMAP and Clustering

Visualize Clusters

UMAP and Spatial View

Faceted Spatial View

Part 4: Annotation and Spatial Visualization

Find Cluster Markers

Marker Heatmap

Cell Type Annotation

Annotated Spatial Map

Check cell type composition

Marker Gene Expression

Ligand–Receptor Axis in Angiogenesis

Spatial proximity of luminal subtypes to the TME

04272026

Session Info

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally