MPO + Unswapping on a laptop CPU for peaked_circuit_P9_Hqap_56x1917

### Name

mpo_unswapping_cpu_laptop_56_1917

### Circuit

peaked_circuit_P9_Hqap_56x1917

### Value

100

### Method

MPO + Unswapping (CPU)

### Method proof

### **Peaked MPO Solver: CPU-Oriented Implementation**
* **Code:** [alexgalda-m/peaked-mpo-solver](https://github.com/alexgalda-m/peaked-mpo-solver) *(Apache-2.0)*
* **Reference:** Kremer & Dupuis 2026 (`arXiv:2604.21908`)

#### **Overview & Enhancements**
This is a CPU-oriented implementation of the **midpoint MPO + greedy-unswapping** strategy from Kremer & Dupuis (as applied in IBM's GPU submission #106). While the high-level approach is the same, this repository extends it with several optional features optimized for the CPU regime:

* **Symmetric (Both-Sided) Unswapping:** Evaluates swap candidates from the left, the right, and from both sides simultaneously, keeping the variant that maximally reduces the bond dimension.
* **Additional Optimizations:**
  * Cached SWAP-layer MPO construction
  * Full parity-swap probe reuse
  * Route-aware unswap candidate selection
  * Faster Sabre-based local rerouting
  * Fail-fast stall guard
* **Production Configuration:** The verified production path sets `unswap-select-mode` to `"bond"` and the pass order to `"both, left, right"`.

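The selection rule described in the list can be sketched as follows. This is a minimal, hypothetical illustration, not code from the repository. It assumes that each unswap pass (`"both"`, `"left"`, `"right"`) yields a candidate summarized by its resulting maximum bond dimension, that `"bond"` mode keeps the smallest one, and that ties go to the earlier pass in the configured order. The name `select_unswap_variant` is invented for illustration. For example, `select_unswap_variant({"both": 64, "left": 48, "right": 48})` returns `"left"`.

```python
"""Hypothetical sketch of 'bond'-mode variant selection for symmetric unswapping.

Assumption (not from the repository): every unswap pass produces one candidate,
scored by the maximum MPO bond dimension it leaves behind.
"""

from typing import Mapping, Sequence, Tuple

# Pass order from the production configuration; used here only for tie-breaking.
DEFAULT_PASS_ORDER: Tuple[str, ...] = ("both", "left", "right")


def select_unswap_variant(
    max_bond_dims: Mapping[str, int],
    pass_order: Sequence[str] = DEFAULT_PASS_ORDER,
) -> str:
    """Return the pass whose candidate has the smallest resulting max bond dimension.

    Passes absent from ``max_bond_dims`` are skipped. Ties are resolved in favour
    of the pass that appears first in ``pass_order`` (an assumption).
    """
    candidates = [p for p in pass_order if p in max_bond_dims]
    if not candidates:
        raise ValueError("no unswap candidates were evaluated")
    # min() returns the first minimal element, so earlier passes win ties.
    return min(candidates, key=lambda p: max_bond_dims[p])
```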
#### **Repository Artifacts**
See `BENCHMARKS.md` in the repository for the full per-machine data table. The run folders under `runs/` contain the following artifacts for each measurement:
* `summary.json`
* `stats.csv`
* `samples.tsv`
* `plot.png`
* `samples.png`

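A run folder could be checked against the criteria listed under **Run Validation** with a few lines of Python. This is a hypothetical sketch: it assumes, without confirmation from the repository, that `summary.json` stores `last_work_consumed`, `termination_reason`, and `matches_expected_bitstring` as top-level keys holding an integer, a string, and a boolean. The helper names `run_is_valid` and `check_run_folder` are illustrative.

```python
"""Hypothetical validation of a run folder's summary.json (key layout assumed)."""

import json
from pathlib import Path
from typing import Any, Mapping, Union

TOTAL_BLOCKS = 1885  # consolidated 2q-unitary blocks for this circuit


def run_is_valid(summary: Mapping[str, Any], total_blocks: int = TOTAL_BLOCKS) -> bool:
    """True if the run consumed every block, completed, and matched the expected bitstring."""
    return (
        summary.get("last_work_consumed") == total_blocks
        and summary.get("termination_reason") == "completed"
        and summary.get("matches_expected_bitstring") is True
    )


def check_run_folder(folder: Union[str, Path]) -> bool:
    """Load <folder>/summary.json and apply run_is_valid to its contents."""
    path = Path(folder) / "summary.json"
    with path.open(encoding="utf-8") as fh:
        return run_is_valid(json.load(fh))
```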
---

#### **Benchmarks**
*Verified-clean runs executed at `--cutoff 0.0006`.*

| Apple Silicon SKU | Model Number | Execution Time | Peak counts | Runner-up (#2) counts |
| :--- | :--- | :--- | :--- | :--- |
| **M5 Pro** | T6050 | 734.2 s | 92/1000 | 10/1000 |
| **M4** | T8132 | 747.2 s | 96/1000 | 9/1000 |
| **M2 Pro** | T6020 | 973.0 s | 46/1000 | 28/1000 |
| **M1 Max** | T6000 | 1339.5 s | 36/1000 | 6/1000 |

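The counts in the table can be turned into rough probability estimates. The sketch below is illustrative only and assumes the 1000 samples per run are independent draws, so a count k is Binomial(1000, p) and the estimate k/1000 has standard error sqrt(p̂(1 - p̂)/n). For the M5 Pro run this gives p̂ ≈ 0.092 with a standard error of about 0.009, and the peak outnumbers the runner-up by 9.2×.

```python
"""Illustrative statistics for the benchmark table (independent samples assumed)."""

import math
from typing import Tuple

# (peak counts, runner-up counts) per machine, copied from the table above.
RUNS = {
    "M5 Pro": (92, 10),
    "M4": (96, 9),
    "M2 Pro": (46, 28),
    "M1 Max": (36, 6),
}
N_SHOTS = 1000  # samples per run


def peak_stats(peak: int, runner_up: int, n: int = N_SHOTS) -> Tuple[float, float, float]:
    """Return (estimated peak probability, its binomial standard error, peak/runner-up ratio)."""
    p_hat = peak / n
    stderr = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, stderr, peak / runner_up


if __name__ == "__main__":
    for name, (peak, second) in RUNS.items():
        p, se, ratio = peak_stats(peak, second)
        print(f"{name}: p_peak = {p:.3f} +/- {se:.3f}, peak/#2 = {ratio:.1f}x")
```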
**Run Validation:**
All four benchmark runs finished with the following values (all 1,885 consolidated blocks consumed):
* `last_work_consumed` = 1885
* `termination_reason` = completed
* `matches_expected_bitstring` = true

---

#### **Comparison to the Original GPU Baseline (#106)**
The original Kremer & Dupuis result on this exact circuit ([submission #106](https://github.com/quantum-advantage-tracker/quantum-advantage-tracker.github.io/issues/106), verified) used the same MPO + unswapping method on a single datacenter GPU. The headline improvement is the **compute class**: the same simulation runs here on a consumer laptop CPU, with no GPU.

| Implementation | Hardware | Runtime | Peak counts |
| :--- | :--- | :--- | :--- |
| **Kremer & Dupuis (#106)** | 1× Nvidia A100 80 GB GPU | 4059 s | ~100/1000 |
| **This work (M5 Pro)** | Apple M5 Pro laptop CPU | 734.2 s | 92/1000 |

Wall-clock time is also ~5.5× shorter (4059 s vs. 734.2 s).

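The ~5.5× figure is the ratio of the two runtimes (4059 / 734.2 ≈ 5.53). Applying the same arithmetic to every laptop in the benchmark table gives roughly 5.43× (M4), 4.17× (M2 Pro), and 3.03× (M1 Max). The snippet below is a small illustrative calculation that uses only numbers from the two tables.

```python
"""Wall-clock speedups relative to the A100 baseline (#106), using the tables above."""

BASELINE_A100_S = 4059.0  # Kremer & Dupuis (#106), 1x A100 80 GB

LAPTOP_RUNTIMES_S = {
    "M5 Pro": 734.2,
    "M4": 747.2,
    "M2 Pro": 973.0,
    "M1 Max": 1339.5,
}


def speedup(runtime_s: float, baseline_s: float = BASELINE_A100_S) -> float:
    """Ratio baseline / runtime; values above 1 mean faster than the baseline."""
    return baseline_s / runtime_s


for chip, runtime in LAPTOP_RUNTIMES_S.items():
    print(f"{chip}: {speedup(runtime):.2f}x vs. A100")
```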
---

#### 📝 **Note on Gate Counts**
The initial QASM circuit consists of 1,917 `rzz` and 3,890 `u` gates. During preprocessing, Qiskit's `Collect2qBlocks` and `ConsolidateBlocks` passes fuse these losslessly into **1,885 generic 2q-unitary blocks** before MPO compression begins. 

Progress is reported in these consolidated-block units (`0..1885`), directly matching the *"Total 2q Unitaries Consumed"* convention established in the Kremer & Dupuis reference notebook.

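Fusion reduces the block count because any run of two-qubit gates on the same qubit pair, with only single-qubit gates between them on those qubits, collapses into one 4×4 unitary. The sketch below counts blocks under a simplified greedy rule. It is a hypothetical stand-in, not Qiskit's `Collect2qBlocks` algorithm, whose block boundaries can differ, and `count_2q_blocks` is an invented name. For instance, `[("u", 0), ("rzz", 0, 1), ("u", 1), ("rzz", 0, 1), ("rzz", 1, 2), ("rzz", 0, 1)]` has four `rzz` gates but yields 3 blocks, because the first two act on the same pair with only a single-qubit gate between them.

```python
"""Simplified, hypothetical two-qubit block counter (not Qiskit's Collect2qBlocks)."""

from typing import Dict, Sequence, Tuple, Union

# A gate is ("u", q) for a single-qubit gate or ("rzz", a, b) for a two-qubit gate.
Gate = Union[Tuple[str, int], Tuple[str, int, int]]


def count_2q_blocks(gates: Sequence[Gate]) -> int:
    """Count two-qubit blocks under a greedy same-pair fusion rule."""
    latest_block: Dict[int, int] = {}  # qubit -> id of the latest 2q block touching it
    n_blocks = 0
    for gate in gates:
        qubits = gate[1:]
        if len(qubits) == 1:
            # Single-qubit gates can be folded into an adjacent block,
            # so they never change the block count.
            continue
        a, b = qubits
        blk = latest_block.get(a)
        if blk is not None and latest_block.get(b) == blk:
            # Both qubits' latest block is the same (a, b) block: extend it.
            continue
        # Otherwise this gate opens a new block on (a, b).
        latest_block[a] = latest_block[b] = n_blocks
        n_blocks += 1
    return n_blocks
```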
### Authors

Alexey Galda

### Institutions

Moderna

### Quantum runtime (seconds)

_No response_

### Classical runtime (seconds)

734

### Compute resources (quantum)

_No response_

### Compute resources (classical)

Apple M5 Pro (T6050) laptop, single Python process, Apple Accelerate BLAS

### Notes

12 min on a consumer Apple Silicon laptop CPU
