#set page(
  paper: "a4",
  margin: (x: 2cm, y: 2cm),
  numbering: "I",
)
#set text(
  font: ("Linux Libertine", "Source Han Serif SC"),
  lang: "en",
  size: 11pt
)
#show math.equation: set text(font: "Latin Modern Math")

// Nice explanation box style
#let note-box(title, color, body) = block(
  fill: color.lighten(90%),
  stroke: (left: 4pt + color),
  inset: 12pt,
  radius: 4pt,
  width: 100%,
  [
    #text(fill: color, weight: "bold", size: 12pt)[#title]
    #v(0.5em)
    #body
  ]
)

#set heading(numbering: none)

= Note

== 1. #link(<NP_PP-complete_return>)[#text(fill:blue)[$("NP")^("PP")$-complete (complexity hierarchy)]] <NP_PP-complete>

This class describes computational problems that are at least as hard as every problem in the polynomial hierarchy, which implies extreme computational hardness.

#note-box("Intuition: nested decision and summation", blue)[
  To understand this class, break it into two layers:
  - *NP (nondeterministic polynomial time)*: corresponds to the *MAX operation* in MMAP. This is like searching a huge space for an optimal solution (e.g., TSP).
  - *PP (probabilistic polynomial time)*: corresponds to the *SUM operation* in MMAP. It requires summing over all configurations to obtain marginal probabilities, which is believed to be even harder than finding a single optimum.

  *Meaning of $("NP")^("PP")$*:
  This denotes an oracle machine: an NP-style search in which evaluating each candidate requires answering a PP-hard summation (counting) query. The sketch below this box shows the same max-over-sum nesting on a toy model.

  *Conclusion*: this is believed to be harder than an NP-complete or PP-complete problem alone. Since exact solutions are intractable at scale, in QEC and large-scale probabilistic inference we *must* rely on approximate algorithms (e.g., variational inference or dual decomposition).
]
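
As a concrete illustration of this nesting, here is a minimal brute-force sketch in Python (the toy potential and variable names are invented for this example): the outer loop plays the role of the NP-style MAX over decision variables, and each candidate is scored by a PP-style SUM over the marginalized variables.

```python
# Brute-force MMAP on a tiny toy model: maximize over x, sum over h.
# The potential below is made up purely for illustration.
import itertools
import math

def potential(x, h):
    # Unnormalized potential over two MAX bits (x) and two SUM bits (h).
    return math.exp(0.8 * x[0] * x[1] + 0.5 * x[0] * h[0] - 0.3 * x[1] * h[1] + 0.2 * h[0] * h[1])

best_x, best_score = None, -math.inf
for x in itertools.product([0, 1], repeat=2):        # NP layer: search over the MAX variables
    # PP layer: sum over the marginalized variables for this candidate.
    score = sum(potential(x, h) for h in itertools.product([0, 1], repeat=2))
    if score > best_score:
        best_x, best_score = x, score

print("MMAP assignment:", best_x, "log-score:", math.log(best_score))
```

Even on this toy model the cost is exponential in both the number of MAX variables and the number of SUM variables, which is exactly why approximate methods are needed at scale.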

== 2. #link(<Lagrange_Multipliers_return>)[#text(fill:blue)[Lagrange Multipliers]] <Lagrange_Multipliers>

Lagrange multipliers are a core technique in constrained optimization. In dual decomposition, they act as "coordination variables."

#note-box("Mechanism: reach consensus via prices", orange)[
  When we decompose a complex global problem into independent subproblems (e.g., subgraphs A and B), these subproblems can disagree on their shared variables.

  - *Hard constraint*: require $x_A = x_B$. Enforcing this directly is what makes the coupled problem hard.
  - *Relaxation*: drop the hard constraint and add a Lagrange-multiplier term $delta (x_A - x_B)$ to the objective as a penalty.

  *Role of $delta$*:
  Think of $delta$ as the *price of inconsistency*.
  - If subproblem A claims a higher value for the shared variable than B does, the algorithm adjusts $delta$ to "fine" A and "subsidize" B.
  - By iteratively updating $delta$ (usually with subgradient steps), we push the subproblems toward global consistency while each still optimizes locally; the sketch after this box runs one such price loop on a toy problem.
]
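
The following Python sketch (with made-up local score tables) runs this price mechanism on the smallest possible case: one shared binary variable, two subproblems, and a subgradient update on $delta$.

```python
# Minimal dual decomposition with one shared binary variable.
# The local score tables f_A and f_B are invented for illustration.
f_A = {0: 1.0, 1: 2.0}   # subproblem A prefers x = 1
f_B = {0: 1.8, 1: 0.5}   # subproblem B prefers x = 0

delta = 0.0
for t in range(1, 101):
    # Each subproblem optimizes independently, with the price delta attached to its copy of x.
    x_A = max((0, 1), key=lambda x: f_A[x] + delta * x)
    x_B = max((0, 1), key=lambda x: f_B[x] - delta * x)
    if x_A == x_B:
        break                            # consensus reached
    # Subgradient step: lower delta to "fine" the side over-claiming x = 1 and subsidize the other.
    delta -= (1.0 / t) * (x_A - x_B)

bound = max(f_A[x] + delta * x for x in (0, 1)) + max(f_B[x] - delta * x for x in (0, 1))
print("delta:", delta, "agreed x:", x_A, "dual bound L(delta):", bound)
```

On this instance consensus is reached within a few iterations and the final bound matches the true coupled optimum; on larger problems the loop may only tighten the bound.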

== 3. #link(<Variational_Upper_Bound_return>)[#text(fill:blue)[Variational Upper Bound]] <Variational_Upper_Bound>

When the objective is intractable (e.g., a partition function or marginal likelihood), we build a tractable function that always upper-bounds the true objective.

#note-box("Geometric intuition: lowering the envelope", green)[
  Suppose the true optimum is $Phi^*$ (the true MMAP log-probability). Computing it directly requires high-dimensional sums or integrals.

  *Variational strategy:*
  1. *Construct the dual function $L(delta)$* via dual decomposition. By weak duality, $L(delta) >= Phi^*$ for every $delta$.
  2. *Minimize the upper bound*: since $L(delta)$ always lies above $Phi^*$, we search for the $delta$ that lowers it the most.
  3. *Approximation*: as we lower this "ceiling," it approaches the true $Phi^*$.

  At convergence the value may still be approximate, but the *upper bound* gives a theoretical guarantee on solution quality through the duality gap $L(delta) - Phi^*$; the snippet after this box checks this inequality numerically on a toy problem.
]
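
To make the weak-duality picture concrete, the snippet below (reusing the same style of made-up two-subproblem scores) evaluates $L(delta)$ on a grid of $delta$ values, verifies that every value stays above the exact optimum $Phi^*$, and reports the smallest gap found.

```python
# Numerical check of weak duality on a toy decomposed objective.
# f_A and f_B are invented local score tables; Phi* is computed by brute force.
f_A = {0: 1.0, 1: 2.0}
f_B = {0: 1.8, 1: 0.5}

phi_star = max(f_A[x] + f_B[x] for x in (0, 1))      # exact optimum of the coupled problem

def dual_bound(delta):
    # Decoupled upper bound: each copy of the shared variable is optimized separately.
    return (max(f_A[x] + delta * x for x in (0, 1))
            + max(f_B[x] - delta * x for x in (0, 1)))

deltas = [i * 0.05 - 3.0 for i in range(121)]        # sweep delta from -3 to 3
bounds = [dual_bound(d) for d in deltas]

assert all(b >= phi_star - 1e-9 for b in bounds)     # weak duality: the ceiling never dips below Phi*
best = min(bounds)
print("Phi* =", phi_star, " min L(delta) =", best, " duality gap =", best - phi_star)
```

Minimizing over $delta$ is exactly the "lowering the ceiling" step; the reported gap is the quality guarantee mentioned above.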

== 4. #link(<Dual_Decomposition_return>)[#text(fill:blue)[Dual Decomposition]] <Dual_Decomposition>

This is the core algorithmic framework for inference in complex graphical models. Its philosophy is "divide and coordinate."

#note-box("Core logic: split and negotiate", purple)[
  For a complex global problem (e.g., MMAP on a 2D grid), a direct solution is extremely hard because everything is tightly coupled. Dual decomposition proceeds as follows (a generic skeleton appears after this box):

  1. *Decompose*: cut some variable interactions and split the big graph into simple, tractable subgraphs (trees or chains), duplicating the variables they share.
  2. *Solve locally*: each subgraph performs inference independently. Because its structure is simple, this is fast (polynomial time).
  3. *Coordinate*: the split is artificial, so subgraphs may disagree on shared variables. We introduce *dual variables* (see section 2) to penalize disagreement and drive consensus.

  This is like distributing a big project to multiple teams; the manager (the master algorithm) adjusts incentives until the outputs align.
]
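
Below is a generic sketch of this loop in Python. It is illustrative rather than a library API: the `Subproblem` class, its brute-force `solve`, and the toy scores at the bottom are all invented, but the steps map directly onto "decompose, solve locally, coordinate."

```python
# "Divide and coordinate" skeleton: two local solvers plus a price-update master loop.
import itertools

class Subproblem:
    def __init__(self, variables, score_fn):
        self.variables = variables        # names of this subproblem's binary variables
        self.score_fn = score_fn          # maps an assignment dict to a local score

    def solve(self, prices):
        # Brute-force local solver; fine for tiny toy subproblems (step 2: solve locally).
        best, best_val = None, float("-inf")
        for values in itertools.product([0, 1], repeat=len(self.variables)):
            a = dict(zip(self.variables, values))
            val = self.score_fn(a) + sum(p * a[v] for v, p in prices.items())
            if val > best_val:
                best, best_val = a, val
        return best, best_val

def dual_decomposition(sub_a, sub_b, shared, iters=200):
    prices = {v: 0.0 for v in shared}                          # one coordination price per shared variable
    for t in range(1, iters + 1):
        a, _ = sub_a.solve({v: +prices[v] for v in shared})
        b, _ = sub_b.solve({v: -prices[v] for v in shared})
        gap = {v: a[v] - b[v] for v in shared}                 # step 3: measure disagreement
        if all(g == 0 for g in gap.values()):
            return a, prices                                   # consensus: a feasible global assignment
        for v in shared:
            prices[v] -= (1.0 / t) * gap[v]                    # subgradient price update
    return a, prices                                           # may still disagree; the bound stays valid

# Toy usage: two subproblems sharing the variable "x" but preferring different values for it.
A = Subproblem(["x", "u"], lambda s: 2.0 * s["x"] + 0.5 * s["u"])
B = Subproblem(["x", "w"], lambda s: 1.8 * (1 - s["x"]) + 0.3 * s["w"])
print(dual_decomposition(A, B, ["x"]))
```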

== 5. #link(<Grid_Decomposition_return>)[#text(fill:blue)[Grid decomposition details (Row/Col Decomposition)]] <Grid_Decomposition>

For surface codes or 2D Ising models, how do we decompose an $N times N$ grid into easy structures?

#note-box("Operational details: edge-based split", red)[
  A 2D grid has *nodes* and *edges*; the difficulty comes from its *loops*. Our goal is to remove the loops while keeping tree-structured pieces.

  *Steps* (see the sketch after this box):
  1. *Duplicate nodes*: for each node $x_(i,j)$, create a copy $x_(i,j)^("row")$ in the horizontal set and a copy $x_(i,j)^("col")$ in the vertical set.
  2. *Assign edges (key step)*:
    - *Row strips*: keep only the *horizontal edges*. The grid becomes $N$ independent horizontal chains (1D, no loops).
    - *Column strips*: keep only the *vertical edges*. The grid becomes $N$ independent vertical chains.
  3. *Result*: instead of one loopy 2D grid, we now have $2N$ simple 1D chains.

  This decomposition is ideal for parallel computation because each chain's inference (transfer matrix or forward-backward) runs independently.
]
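
A small Python sketch of this split (with randomly generated couplings standing in for the real model) builds the $2N$ chains and solves each one exactly with a simple max-sum dynamic program, the 1D analogue of the transfer-matrix or forward-backward pass mentioned above.

```python
# Split an N x N grid of binary spins into N row chains and N column chains.
# J_h[i][j] couples (i, j)-(i, j+1); J_v[i][j] couples (i, j)-(i+1, j). Values are random stand-ins.
import random

N = 4
random.seed(0)
J_h = [[random.uniform(-1, 1) for _ in range(N - 1)] for _ in range(N)]
J_v = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N - 1)]

# Row strip i keeps only row i's horizontal couplings; column strip j keeps only column j's vertical ones.
row_chains = [J_h[i] for i in range(N)]
col_chains = [[J_v[i][j] for i in range(N - 1)] for j in range(N)]

def chain_max(couplings):
    # Max-sum dynamic programming along a 1D chain of +/-1 spins:
    # best[s] = best score of the prefix given that the current spin takes value s.
    best = {-1: 0.0, +1: 0.0}
    for J in couplings:
        best = {s: max(best[t] + J * t * s for t in (-1, +1)) for s in (-1, +1)}
    return max(best.values())

# 2N independent, exactly solvable chain subproblems instead of one loopy 2D grid.
print("row optima:", [round(chain_max(c), 3) for c in row_chains])
print("col optima:", [round(chain_max(c), 3) for c in col_chains])
```

Each chain can be solved in parallel; coordinating the row and column copies is what section 6 addresses.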

== 6. #link(<Consensus_Constraint_return>)[#text(fill:blue)[Consistency constraint (Why Consistency?)]] <Consensus_Constraint>

In section 5 we created "row copies" and "col copies." Why must we force them to agree?

#note-box("Logic: returning to physical reality", teal)[
  *Why consistency?*
  In the real physical system (the original problem), qubit $(i,j)$ is a single object.
  - It cannot be in "error" ($x=1$) from the row view while being "no error" ($x=0$) from the column view.
  - If the copies disagree, the solution violates physical reality and is invalid.

  *Role of Lagrangian relaxation* (a per-qubit price update is sketched after this box):
  - Ideally we would impose the hard constraint $x^("row") = x^("col")$, but this is hard to enforce directly.
  - *Relaxation*: allow temporary disagreement, but penalize it with Lagrange multipliers $delta$.
  - At convergence, if the penalties do their job, the copies agree, and $L(delta)$ equals (or closely approximates) the original optimum.
]
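
A final sketch ties sections 5 and 6 together: given one assignment proposed by the row chains and one proposed by the column chains (both invented here), it measures the per-qubit disagreement and applies a single subgradient update to a grid of per-qubit prices $delta_(i,j)$.

```python
# Per-qubit consistency prices between the row view and the column view of a small grid.
# x_row and x_col are hypothetical decoded assignments from the 2N chain subproblems.
N = 3
x_row = [[1, 0, 0], [0, 0, 1], [1, 1, 0]]    # assignment proposed by the row chains
x_col = [[1, 0, 1], [0, 0, 1], [0, 1, 0]]    # assignment proposed by the column chains
delta = [[0.0] * N for _ in range(N)]        # one multiplier per physical qubit (i, j)

step = 0.5
for i in range(N):
    for j in range(N):
        d = x_row[i][j] - x_col[i][j]        # disagreement on qubit (i, j)
        delta[i][j] -= step * d              # fine the view claiming x = 1, subsidize the other

# The updated prices feed back into the chains: row chain i adds +delta[i][j] * x_(i,j) to its
# objective, and column chain j adds -delta[i][j] * x_(i,j), nudging the copies toward agreement.
mismatches = sum(x_row[i][j] != x_col[i][j] for i in range(N) for j in range(N))
print("disagreeing qubits:", mismatches, "updated prices:", delta)
```

At the next outer iteration the reweighted chains are re-solved; once no qubits disagree, the row and column views describe one consistent physical error pattern.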