Skip to content

6. Whole Transcriptome In Situ

Pingxu0101 edited this page Apr 29, 2026 · 26 revisions

Overview

The Atera In Situ is a recently released spatial transcriptomics platform from 10x Genomics. Atera captures whole-transcriptome expression (>18,000 genes) directly within tissue sections at single-cell resolution.

image

Technology Key Features

Whole Transcriptome Coverage

  • Profiles >18,000 genes covering almost the entire protein-coding transcriptome
  • Available as Atera WTA (whole transcriptome) or Atera Select (targeted panels)
  • Customizable with up to 1,000 additional custom targets via stackable panels

High-Throughput Tissue Analysis

  • Imageable area of >500 mm² per slide, supporting up to 4 slides per run
  • Capable of analyzing up to 1,600 mm² of tissue per week

Applications

  • Whole-transcriptome spatially resolved profiling at single-cell resolution
  • Large-scale discovery studies in cancer, neuroscience, and developmental biology
  • Unbiased cell type mapping and tumor microenvironment characterization
  • High-plex targeted spatial analysis with Atera Select panels

📚 Resources


Workshop Introduction

This workshop covers the analysis of 10x Genomics Xenium In Situ whole transcriptome (WTA) data using Seurat v5 in R. We will analyze a human breast cancer FFPE tissue section profiled with the Xenium WTA panel (~18,000 gene targets, ~170,000 cells).

Learning Objectives

  1. Load and explore Xenium spatial transcriptomics data
  2. Perform quality control and filtering
  3. Normalize, cluster, and visualize cells
  4. Visualize gene expression in spatial context
  5. Annotate cell types using marker genes
  6. Perform spatial niche analysis
  7. Explore cell-cell spatial relationships

Experiment Summary

image

💡 More information can be found in analysis_summary.html.


Part 1: Setup and Data Loading

Load Libraries

library(Seurat)       
library(ggplot2)
library(patchwork)
library(dplyr)
library(purrr)
library(FNN)
library(future)

packageVersion("Seurat")# v5.4.0

# Use sequential plan to avoid globals size issues with large Xenium objects
plan("sequential")
options(future.globals.maxSize = 50 * 1024^3)  # 50 GB

Set Paths

# Path to raw Xenium data
# (Change this to your local Xenium data directory)
RawData_path <- "/Xenium_FullTranscript/RawData/Xenium_FT" 
# Path to processed output files
processed_path <- "/Xenium_FullTranscript/ProcessedData"

Load Xenium Data

The LoadXenium() function reads in the Xenium output directory and creates a Seurat object with spatial coordinates and cell segmentation boundaries.

Option 1: Load from raw Xenium output

# Path to Xenium output (Change this to your local Xenium data directory)
xenium.obj <- LoadXenium(
  RawData_path,
  fov           = "fov",
  segmentations = "cell"      # ← KEY: loads polygon boundaries
)

# Save
saveRDS(
  xenium.obj,
  file = file.path(processed_path,"xenium_breast_cancer_WTA_raw_170057cells_with_segmentations.rds")
)

Option 2: Load pre-computed object

xenium.obj <- readRDS(
  file.path(processed_path, "xenium_breast_cancer_WTA_raw_170057cells_with_segmentations.rds")
)
# Quick look at the object
xenium.obj

Inspect the Object

An object of class Seurat
27104 features across 170057 samples within 5 assays
Active assay: Xenium (18028 features, 0 variable features)
1 layer present: counts
4 other assays present: BlankCodeword, ControlCodeword, ControlProbe, GenomicControl
1 spatial field of view present: fov

head(xenium.obj,5)
Cell ID orig.ident nCount_Xenium nFeature_Xenium segmentation_method nCount_BlankCodeword nFeature_BlankCodeword nCount_ControlCodeword nFeature_ControlCodeword nCount_ControlProbe nFeature_ControlProbe nCount_GenomicControl nFeature_GenomicControl
aaaajgij-1 SeuratProject 5508 3428 Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) 0 0 1 1 0 0 4 3
aaaandia-1 SeuratProject 6522 3753 Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) 0 0 0 0 0 0 5 5
aaabalki-1 SeuratProject 1622 1385 Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) 1 1 0 0 0 0 5 5
aaabchln-1 SeuratProject 812 745 Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) 0 0 0 0 0 0 1 1
aaaccnnp-1 SeuratProject 2974 2061 Segmented by boundary stain (ATP1A1+CD45+E-Cadherin) 0 0 0 0 0 0 3 3

Check Cell x/y Coordinates

coords <- xenium.obj@images$fov@boundaries$centroids@coords
coords_df <- as.data.frame(coords)
coords_df$cell_id <- Cells(xenium.obj@images$fov@boundaries$centroids)

head(coords_df)
cell_id x y
aaaajgij-1 74.00697 2732.648
aaaandia-1 152.94557 2721.638
aaabalki-1 15.98353 2817.409
aaabchln-1 15.05376 2799.583
aaaccnnp-1 59.01640 2763.473
aaacdfgp-1 62.20883 2758.241

Part 2: Quality Control

QC Metrics

Xenium data QC focuses on:

  • nCount_Xenium — Total transcripts per cell
  • nFeature_Xenium — Number of unique genes detected per cell
# Violin plots of QC metrics
pdf(file.path(processed_path, "01_qc_violin.pdf"), width = 6, height = 4)

p_qc <- VlnPlot(
  xenium.obj,
  features = c("nFeature_Xenium", "nCount_Xenium"),
  ncol = 2,
  pt.size = 0
) &
  theme(axis.text.x = element_blank())

print(p_qc)

dev.off()

image

QC Summary Statistics

qc_stats <- data.frame(
  Metric = c("Median transcripts/cell", "Median genes/cell",
             "Mean transcripts/cell", "Mean genes/cell",
             "Min transcripts/cell", "Max transcripts/cell"),
  Value = c(
    median(xenium.obj$nCount_Xenium),
    median(xenium.obj$nFeature_Xenium),
    round(mean(xenium.obj$nCount_Xenium), 1),
    round(mean(xenium.obj$nFeature_Xenium), 1),
    min(xenium.obj$nCount_Xenium),
    max(xenium.obj$nCount_Xenium)
  )
)
knitr::kable(qc_stats, caption = "QC Summary Statistics")
Metric Value
Median transcripts / cell 2,116
Median genes / cell 1,543
Mean transcripts / cell 2,948.3
Mean genes / cell 1,805.2
Min transcripts / cell 0
Max transcripts / cell 26,078

Interpretation:

This table summarizes the basic QC metrics for the Xenium dataset. The median number of transcripts per cell is 2,116, with 1,543 detected genes per cell. The mean values (2,948 transcripts and 1,805 genes per cell) are higher than the medians, indicating a right-skewed distribution where a subset of cells has elevated transcript counts.

The minimum transcript count is 0, suggesting that a small number of segmented cells have no detected transcripts. In contrast, the maximum of 26,078 transcripts highlights substantial variability across cells, with some exhibiting much higher transcriptional activity.

Distribution of Transcript Counts

pdf(file.path(processed_path, "02_right_skewed_distribution.pdf"), width = 7, height = 5)

median_count <- median(xenium.obj$nCount_Xenium)
mean_count <- mean(xenium.obj$nCount_Xenium)

max_y <- max(
  hist(xenium.obj$nCount_Xenium, breaks = 80, plot = FALSE)$counts
)

p_right_skew <- ggplot(xenium.obj@meta.data, aes(x = nCount_Xenium)) +
  geom_histogram(bins = 80, fill = "gray85", color = "gray35", linewidth = 0.2) +
  geom_vline(xintercept = median_count, color = "#2C7BB6", linewidth = 1.1) +
  geom_vline(xintercept = mean_count, color = "#D7191C", linewidth = 1.1) +
  annotate("label", x = median_count - 500, y = max_y * 0.95,
           label = paste0("Median = ", round(median_count)),
           color = "#2C7BB6", fill = "white") +
  annotate("label", x = mean_count + 1200, y = max_y * 0.82,
           label = paste0("Mean = ", round(mean_count, 1)),
           color = "#D7191C", fill = "white") +
  labs(
    title = "Distribution of Xenium Transcript Counts per Cell",
    subtitle = "Mean is greater than median, indicating a right-skewed distribution",
    x = "Transcripts per cell (nCount_Xenium)",
    y = "Number of cells"
  ) +
  theme_classic(base_size = 13)

print(p_right_skew)
dev.off()

image

📌 Note: This plot shows the distribution of transcript counts per cell. The mean is higher than the median, and the long tail extends toward higher transcript counts, indicating a right-skewed distribution. This suggests that most cells have moderate transcript counts, while a smaller subset has much higher transcript abundance.

Spatial QC

pdf(file.path(processed_path, "03_spatial_qc.pdf"), width = 10, height = 5)

p_spatial <- SpatialFeaturePlot(
  xenium.obj,
  features = c("nCount_Xenium", "nFeature_Xenium"),
  ncol = 2
)

print(p_spatial)
dev.off()

image

Subset a Region

To make visualization and QC faster, we downsample cells from the top-left region of the tissue.

Define and Highlight the Region

# Define region to subset (1500 x 1500)
x_min <- 4750
x_max <- 6250
y_min <- 1250
y_max <- 2750

coords_df$in_region <- coords_df$x >= x_min &
  coords_df$x <= x_max &
  coords_df$y >= y_min &
  coords_df$y <= y_max

# Plot all cells, highlighting selected region in red
pdf(file.path(processed_path, "04_region_highlight.pdf"), width = 6, height = 6)
p_coords_region <- ggplot(coords_df, aes(x = x, y = y)) +
  geom_point(color = "gray70", size = 0.1, alpha = 0.3) +
  geom_point(
    data = subset(coords_df, in_region),
    color = "red",
    size = 0.15,
    alpha = 0.7
  ) +
  coord_fixed() +
  scale_x_continuous(breaks = seq(0, max(coords_df$x), by = 1000)) +
  scale_y_continuous(breaks = seq(0, max(coords_df$y), by = 1000)) +
  labs(
    title = "Selected Spatial Region",
    subtitle = "Highlighted region: x = 4750–6250, y = 1250–2750",
    x = "X coordinate",
    y = "Y coordinate"
  ) +
  theme_classic(base_size = 13) +
  theme(
    panel.grid.major = element_line(color = "gray85", linewidth = 0.3),
    panel.grid.minor = element_blank()
  )
print(p_coords_region)
dev.off()

image

Subset the Seurat Object

# Subset cells in this region
region_cells <- coords_df$cell_id[coords_df$in_region]

xenium.region <- subset(
  xenium.obj,
  cells = region_cells
)

cat("Number of cells in selected region:", ncol(xenium.region), "\n")

Number of cells in selected region: 11458

Save the Subset Object

# Save the subset region Seurat object
# (Since this step can be time-consuming, we'll skip it during the workshop.)
saveRDS(
  xenium.region,
  file = file.path(processed_path, "xenium_region_x4750_6250_y1250_2750.rds")
)

Visualize Spatial QC on the Subset

pdf(file.path(processed_path, "05_subset_object_spatial_qc.pdf"), width = 10, height = 5)

p_spatial_region <- ImageFeaturePlot(
    xenium.region,
  features = c("nCount_Xenium", "nFeature_Xenium")
)

print(p_spatial_region)

dev.off()

image

Filtering

Filter cells based on QC metrics to remove low-quality cells, debris, and empty segmentations.

Threshold Meaning
nCount_Xenium > 10 Minimum total transcript counts per cell
nFeature_Xenium > 5 Minimum number of detected genes per cell
# Filter cells based on QC metrics
# nCount_Xenium > 10   : minimum total transcript counts per cell
# nFeature_Xenium > 5  : minimum number of detected genes per cell
# Removes low-quality cells, debris, and empty segmentations
cat("Cells before filtering:", ncol(xenium.region), "\n")

xenium.region <- subset(xenium.region,
                                  subset = nCount_Xenium > 10 &
                                           nFeature_Xenium > 5)

cat("Cells after filtering:", ncol(xenium.region ), "\n")

Cells before filtering: 11458

Cells after filtering: 11458


Part 3: Normalization and Clustering

SCTransform and PCA

Xenium Atera is a whole-transcriptome assay (18,028 features). Here we use SCTransform to select the top variable features (default is 3,000) for downstream PCA.

# SCTransform normalization
# Xenium Atera is a whole-transcriptome assay (18,028 features), here we use
# SCTransform to select the top variable features (default is 3,000) for downstream PCA.

xenium.region <- SCTransform(xenium.region, assay = "Xenium")
# PCA 
xenium.region <- RunPCA(xenium.region, npcs = 30)

# Elbow plot to determine optimal number of PCs
pdf(file.path(processed_path, "06_ElbowPlot_xenium_region.pdf"),
    width = 5, height = 5)
ElbowPlot(xenium.region, ndims = 30)
dev.off()

image

UMAP and Clustering

# UMAP + Clustering
xenium.region <- RunUMAP(xenium.region, dims = 1:30)
xenium.region <- FindNeighbors(xenium.region, reduction = "pca", dims = 1:30)
xenium.region <- FindClusters(xenium.region, resolution = 0.2)

# Check clusters
cat("Number of clusters:", length(levels(Idents(xenium.region))), "\n")
table(Idents(xenium.region))

Result: Number of clusters: 10

Cluster Cell Count
0 4,728
1 1,214
2 1,079
3 934
4 802
5 791
6 726
7 583
8 559
9 42

Visualize Clusters

UMAP and Spatial View

# Define shared cluster colors

n_clusters =length(levels(Idents(xenium.region)))


cluster_cols <- setNames(
  c("#33FF00","#FFFF33","#F100F1","#085AFF","#E41A1C",
    "#FF7F00", "#00FFFF","#990099","#EEAAEE","#F6E0D2",
    "#614128", "#E7C76F","#9467BD","#AECF00","#5DCEAF")[1:n_clusters],
  levels(Idents(xenium.region))
)



# UMAP colored by cluster
p1 <- DimPlot(xenium.region, reduction = "umap", label = TRUE, repel = TRUE,
              cols = cluster_cols) +
  ggtitle("UMAP - Clusters") + NoLegend()

# Spatial plot colored by cluster
p2 <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1) +
  ggtitle("Spatial - Clusters")

p_clusters <- p1 + p2
p_clusters

pdf(file.path(processed_path, "07_clusters.pdf"),
    width = 14, height = 6)
print(p_clusters)
dev.off()

image

Faceted Spatial View

pdf(file.path(processed_path, "08_spatial_clusters_faceted.pdf"),
    width = 10, height = 10)
plots <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1,
                  split.by = "seurat_clusters")
print(plots)
dev.off()

image


Part 4: Annotation and Spatial Visualization

Find Cluster Markers

# Find markers for all clusters
xenium.markers <- FindAllMarkers(
  xenium.region,
  only.pos  = TRUE,
  min.pct   = 0.25,
  logfc.threshold = 0.25
)


# Save marker table
write.csv(xenium.markers,
          file.path(processed_path, "cluster_markers_all.csv"),
          row.names = FALSE)

# Top markers per cluster
top_markers <- xenium.markers %>%
  group_by(cluster) %>%
  dplyr::filter(avg_log2FC > 1) %>%
  slice_head(n = 10) %>%
  ungroup()


head(top_markers)
p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
0 1.73 0.702 0.303 0 0 SDK2
0 1.01 0.919 0.750 0 0 CA12
0 1.48 0.864 0.703 0 0 TFF3
0 1.47 0.867 0.728 0 0 TFF1
0 1.00 0.956 0.825 0 0 STC2
9.03e-316 1.12 0.848 0.658 1.62e-311 0 ERRFI1

Marker Heatmap

# Heatmap
p <- DoHeatmap(
  xenium.region,
  features = top_markers$gene,
  group.colors = cluster_cols
) +
  theme(axis.text.y = element_text(size = 8))

pdf(file.path(processed_path, "09_heatmap_top10_markers.pdf"),
    width = 12, height = 10)
print(p)
dev.off()

image

Cell Type Annotation

Based on marker gene expression, clusters are assigned to known breast tissue cell types.


xenium.region <- RenameIdents(xenium.region,
                              "0" = "Luminal epithelial 1",
                              "1" = "Luminal epithelial 2",
                              "2" = "T NK Myeloid",
                              "3" = "Luminal epithelial 3",
                              "4" = "Vascular",
                              "5" = "Fibroblast",
                              "6" = "Basal",
                              "7" = "Proliferating",
                              "8" = "Luminal epithelial 4",
                              "9" = "Plasma B"
)
Idents(xenium.region) = factor(Idents(xenium.region),
                               levels = c(
                                 "Luminal epithelial 1",
                                 "Luminal epithelial 2",
                                 "Luminal epithelial 3",
                                 "Luminal epithelial 4",
                                 "Basal",
                                 "Proliferating",
                                 "Fibroblast",
                                 "Vascular",
                                 "T NK Myeloid",
                                 "Plasma B"
                               ) )
xenium.region$cell_type <- Idents(xenium.region)
table(Idents(xenium.region))

Cell Type Count
Luminal epithelial 1 4728
Luminal epithelial 2 1214
Luminal epithelial 3 934
Luminal epithelial 4 559
Basal 726
Proliferating 583
Fibroblast 791
Vascular 802
T NK Myeloid 1079
Plasma B 42

Annotated Spatial Map

# Update cluster_cols
names(cluster_cols) <- c(
  "Luminal epithelial 1",
  "Luminal epithelial 2",
  "T NK Myeloid",
  "Luminal epithelial 3",
  "Vascular",
  "Fibroblast",
  "Basal",
  "Proliferating",
  "Luminal epithelial 4",
  "Plasma B"
)

p <- ImageDimPlot(xenium.region, cols = cluster_cols, size = 1) +
  ggtitle("Spatial - Cell Type")


pdf(file.path(processed_path, "10_CellType.pdf"),
    width = 6, height = 6)
print(p)
dev.off()

image

Check cell type composition

pie_df <- as.data.frame(table(xenium.region$cell_type)) %>%
  rename(cell_type = Var1, n = Freq) %>%
  mutate(
    cell_type = factor(cell_type, levels = levels(xenium.region$cell_type)),
    pct       = n / sum(n) * 100,
    label     = paste0(cell_type, "\n", n, " (", round(pct, 1), "%)")
  )


pie_df <- pie_df %>% arrange(cell_type)
p_donut <- ggplot(pie_df, aes(x = 2, y = n, fill = cell_type)) +
  geom_bar(stat = "identity", width = 1, color = "white", linewidth = 0.5) +
  coord_polar(theta = "y", start = 0) +
  scale_fill_manual(values = cluster_cols, name = "Cell type") +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            position = position_stack(vjust = 0.5),
            size = 3.5, color = "black", fontface = "bold") +
  xlim(0.5, 2.5) +
  labs(title = paste0("Cell type composition (n = ", sum(pie_df$n), ")")) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", hjust = 0.5),
    legend.position = "right"
  )


pdf(file.path(processed_path, "celltype_piechart.pdf"),
    width = 8, height = 6)
print(p_donut)
dev.off()

image

Marker Gene Expression

Visualize expression of known breast cancer cell type markers in spatial context.

Gene Cell Type Marker For
AGR3 Luminal epithelial cells
VWF Vascular endothelial cells
TRAC T cells (canonical T-cell marker)
# AGR3 is a marker of luminal epithelial cells
# VWF labels vascular endothelial cells 
# TRAC is a canonical T-cell marker

pdf(file.path(processed_path, "11_spatial_feature.pdf"),
    width = 12, height = 10)
ImageFeaturePlot(xenium.region,
                 fov      = "fov",
                 features = c("AGR3", "VWF", "TRAC"), 
                 size     = 0.75)
dev.off()

image

# polygon_plot_with_transcripts 
# Draws cell-segmentation polygons colored by cluster_cols, with selected
# transcripts (AGR3, VWF, TRAC) overlaid only inside their target cell types.

source("Plot_function.R")

plot_xenium_polygons_with_transcripts(
  obj = xenium.region,
  color_fill = cluster_cols,
  idents_col = "cell_type",
  gene_plot = c("AGR3", "VWF", "TRAC"),
  gene_plot_col = c("AGR3" = "#0033FF", "VWF" = "#FF0000", "TRAC" = "#040404"),
  cell_style = "outline",
  pdf_prefix = "12_spatial_polygon_AGR3_VWF_TRAC",
  output_dir = processed_path
)

image

Ligand–Receptor Axis in Angiogenesis

Receptor Main Ligands
FLT1 (VEGFR1) VEGFA, VEGFB, PlGF
KDR (VEGFR2) VEGFA (primary), VEGFC (mature/cleaved), VEGFD (mature/cleaved)
NRP1 VEGFA (especially VEGFA165 isoform)
  • VEGFA is the dominant ligand for all three (FLT1, KDR, NRP1).
  • FLT1 (VEGFR1) binds VEGFA with very high affinity but often acts as a decoy receptor, limiting signaling.
  • KDR (VEGFR2) is the main signaling receptor responsible for angiogenesis.
  • NRP1 is a co-receptor, not a signaling receptor by itself; it enhances VEGFA signaling through KDR.

pdf(file.path(processed_path, "VEGFA_receptors_VlnPlot_by_cell_type.pdf"),
    width = 15, height = 6)

p <- VlnPlot(
  xenium.region,
  features = c("VEGFA", "FLT1", "KDR", "NRP1"),
  group.by = "cell_type",
  cols = cluster_cols,
  pt.size = 0,
  ncol = length(genes_use)
) &
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
    legend.position = "none"
  )
print(p)

dev.off()

image


Spatial proximity of luminal subtypes to the TME

04272026

Session Info

sessionInfo()

Clone this wiki locally