169 changes: 93 additions & 76 deletions fusion_docs/troubleshooting/fusion-snapshots.md
@@ -28,13 +28,14 @@ To resolve this issue:
- Lower memory requested by tasks
- Process smaller data chunks
- Set `process.resourceLimits` to enforce limits:

```groovy
// AWS Batch example
process.resourceLimits = [cpus: 32, memory: '60.GB']

// Google Batch example (more conservative for 30s window)
process.resourceLimits = [cpus: 16, memory: '20.GB']
```

1. Increase network bandwidth:

@@ -47,18 +48,19 @@ To resolve this issue:
- Avoid ARM64 instances if checkpoints are failing. To pin an `x86_64` machine type on Google Batch, see the sketch below.

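If the workload runs on Google Batch, one way to pin an `x86_64` machine is the `machineType` directive — a minimal sketch with an illustrative machine type (on AWS Batch, instance families are selected in the compute environment instead):

```groovy
// Pin an x86_64 machine type for all processes (Google Batch executor).
// 'n2-standard-16' is an illustrative choice, not a recommendation.
process.machineType = 'n2-standard-16'
```
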
1. Configure retry strategy:

```groovy
process {
    maxRetries = 2
    errorStrategy = {
        // Exit status 175 indicates a Fusion Snapshots checkpoint that failed or timed out.
        if (task.exitStatus == 175) {
            return 'retry'
        } else {
            return 'terminate'
        }
    }
}
```

See [AWS Batch instance selection](../guide/snapshots/aws#selecting-an-ec2-instance) or [Google Batch best practices](../guide/snapshots/gcp) for recommended configurations.

@@ -80,10 +82,12 @@ This issue can occur due to:
To resolve this issue:

1. Check if previous checkpoint completed:

- Review logs for "Dumping finished successfully".
- If the "Dumping finished successfully" message is missing, the previous checkpoint timed out and the task failed with exit code `175`.

1. Verify checkpoint data exists:

- Check that the `.fusion/dump/` work directory contains checkpoint files.
- Ensure that the S3/GCS bucket is accessible.
- If the bucket is missing, open a support ticket. See [Getting help](#getting-help) for more information.
@@ -109,11 +113,13 @@ This issue can occur due to:
To resolve this issue:

1. For AWS Batch (120-second window):

- Use instances with a 5:1 or better memory-to-bandwidth ratio (see the back-of-envelope check after this list).
- Use `x86_64` instances for incremental snapshot support (`c6id`, `m6id`, `r6id` families).
- Check the instance architecture with `uname -m`.

1. For Google Batch (30-second window):

- Use `x86_64` instances (mandatory for larger workloads).
- Use more conservative memory limits.
- Consider smaller instance types with better ratios.
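As a back-of-envelope check (assumption: the full task memory must be written out within the reclamation window, with the network as the bottleneck), estimate the dump time as memory divided by bandwidth:

```groovy
// Rough dump-time estimate; the numbers are illustrative, not measured.
def memGiB    = 60          // task memory to checkpoint
def bwGiBps   = 12.5 / 8    // a 12.5 Gbps NIC moves ~1.56 GiB/s
def windowSec = 120         // AWS Batch reclamation window (30 for Google Batch)
def dumpSec   = memGiB / bwGiBps
println "Estimated dump time: ${Math.round(dumpSec)} s of ${windowSec} s" // ≈ 38 s — fits
```
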
@@ -137,21 +143,24 @@ This issue can occur due to:
To resolve this issue:

1. Split large tasks:

- Break into smaller, checkpointable units.
- Process data in chunks (see the sketch below).

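A minimal sketch of chunked processing, assuming newline-delimited input, a hypothetical S3 path, and a hypothetical `PROCESS_CHUNK` process — `splitText` with `file: true` emits each chunk as a file, so each task checkpoints a smaller memory footprint:

```groovy
// Hypothetical process that handles one chunk at a time.
process PROCESS_CHUNK {
    input:
    path chunk

    script:
    "wc -l ${chunk}"
}

workflow {
    def chunks = Channel
        .fromPath('s3://my-bucket/big-input.txt')  // hypothetical input
        .splitText(by: 1_000_000, file: true)      // ~1M lines per chunk file
    PROCESS_CHUNK(chunks)
}
```
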
1. Switch to `x86_64` instances:

- Essential for Google Batch.
- Recommended for AWS Batch tasks > 40 GiB.

1. Adjust memory limits:

```groovy
// For AWS Batch
process.resourceLimits = [cpus: 32, memory: '60.GB']

// For Google Batch (more conservative)
process.resourceLimits = [cpus: 16, memory: '20.GB']
```

## SSL/TLS connection errors after restore

@@ -160,6 +169,7 @@ Applications fail after restore with connection errors, especially HTTPS connect
This issue occurs when applications use HTTPS connections, as CRIU cannot preserve encrypted TCP connections (SSL/TLS).

To resolve this issue, configure TCP close mode to drop connections during checkpoint:

```groovy
process.containerOptions = '-e FUSION_SNAPSHOTS_TCP_MODE=close'
```
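With `close` mode, open TCP connections are simply dropped at checkpoint time, so after restore the application has to re-establish its SSL/TLS sessions; this generally assumes the application or its HTTP client library retries failed connections.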
@@ -180,64 +190,71 @@ To diagnose checkpoint problems:

- Check `.command.log` in the task work directory for Fusion Snapshots messages (prefixed with timestamps).

:::tip
Enable `debug` logging for more details.

```groovy
process.containerOptions = '-e FUSION_SNAPSHOT_LOG_LEVEL=debug'
```
:::

1. Inspect your checkpoint data:

1. Open the `.fusion/dump/` folder:

```console
.fusion/dump/
├── 1/                # First dump
│   ├── pre_*.log     # Pre-dump log (if incremental)
│   └── <CRIU files>
├── 2/                # Second dump
│   ├── pre_*.log
│   └── <CRIU files>
├── 3/                # Third dump (full)
│   ├── dump_*.log    # Full dump log
│   ├── restore_*.log # Restore log (if restored)
│   └── <CRIU files>
└── dump_metadata     # Metadata tracking all dumps
```

1. For incremental dumps (PRE type), check for success markers at the end of the `pre_*.log` file:

```console
(66.525687) page-pipe: Killing page pipe
(66.563939) irmap: Running irmap pre-dump
(66.610871) Writing stats
(66.658902) Pre-dumping finished successfully
```

1. For full dumps (FULL type), check for success markers at the end of the `dump_*.log` file:

```console
(25.867099) Unseizing 90 into 2
(27.160829) Writing stats
(27.197458) Dumping finished successfully
```

1. If the log ends abruptly without a success message, check the last timestamp:

```console
(121.37535) Dumping path for 329 fd via self 353 [/path/to/file.tmp]
(121.65146) 90 fdinfo 330: pos: 0x4380000 flags: 100000/0
# Log truncated - instance was reclaimed before dump completed
```

- AWS Batch: Timestamps near 120 seconds indicate the instance was terminated during the dump.
- Google Batch: Timestamps near 30 seconds indicate the instance was terminated during the dump.

Cause: Task memory is too large, or bandwidth is too low, for the reclamation window.

1. For restore operations, check for a success marker at the end of the `restore_*.log` file:

```console
(145.81974) Running pre-resume scripts
(145.81994) Restore finished successfully. Tasks resumed.
(145.82001) Writing stats
```

1. Verify your configuration:

@@ -250,8 +267,8 @@ To diagnose checkpoint problems:

1. Test with different instance types. If uncertain:

- Run the same task on instance types with better disk IOPS and bandwidth guarantees, and verify whether Fusion Snapshots works there (one option for varying the machine type per retry is sketched below).
- Decrease memory usage to a manageable amount.
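A minimal sketch of cycling machine types across retries, assuming Google Batch and purely illustrative `n2`/`c2` machine types (most Nextflow directives, including `machineType`, accept a closure evaluated per task attempt):

```groovy
process {
    maxRetries = 2
    errorStrategy = 'retry'
    // Hypothetical machine types: try a different one on the second attempt.
    machineType = { task.attempt == 1 ? 'n2-standard-16' : 'c2-standard-16' }
}
```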

:::tip
For detailed information about error codes and logging, see [Error reference](./error-codes-exit-messages).