Commit 216413e

update
1 parent c5959f1 commit 216413e

File tree

8 files changed: +517 −447 lines changed


doc/pub/week15/html/week15-bs.html

Lines changed: 59 additions & 45 deletions
Large diffs are not rendered by default.

doc/pub/week15/html/week15-reveal.html

Lines changed: 55 additions & 45 deletions
@@ -382,28 +382,38 @@ <h2 id="what-will-we-need-in-the-case-of-a-quantum-computer">What will we need i
382382
<div class="alert alert-block alert-block alert-text-normal">
383383
<b></b>
384384
<p>
385-
<p>We will have to translate the classical data point \( \vec{x} \)
386-
into a quantum datapoint \( \vert \Phi{(\vec{x})} \rangle \). This can
387-
be achieved by a circuit \( \mathcal{U}_{\Phi(\vec{x})} \vert 0\rangle \).
385+
<p>We will have to translate the classical data point \( \boldsymbol{x} \)
386+
into a quantum datapoint \( \vert \Phi{(\boldsymbol{x})} \rangle \). This can
387+
be achieved by a circuit \( \mathcal{U}_{\Phi(\boldsymbol{x})} \vert 0\rangle \).
388388
</p>
389389

390390
<p>Here \( \Phi() \) could be any classical function applied
391-
on the classical data \( \vec{x} \).
391+
on the classical data \( \boldsymbol{x} \).
392392
</p>
393393
</div>
394394

395395
<div class="alert alert-block alert-block alert-text-normal">
396396
<b></b>
397397
<p>
398-
<p>We need a parameterized quantum circuit \( W(\theta) \) that
398+
<p>We need a parameterized quantum circuit \( W(\Theta) \) that
399399
processes the data in a way that in the end we
400400
can apply a measurement that returns a classical value \( -1 \) or
401-
\( 1 \) for each classical input \( \vec{x} \) that indentifies the label
401+
\( 1 \) for each classical input \( \boldsymbol{x} \) that identifies the label
402402
of the classical data.
403403
</p>
404404
</div>
405405
</section>
406406

407+
<section>
408+
<h2 id="parameterized-quantum-circuits">Parameterized quantum circuits </h2>
409+
410+
<br/><br/>
411+
<center>
412+
<p><img src="figures/pqc.png" width="900" align="bottom"></p>
413+
</center>
414+
<br/><br/>
415+
</section>
416+
407417
<section>
408418
<h2 id="the-most-general-ansatz">The most general ansatz </h2>
409419

@@ -412,7 +422,7 @@ <h2 id="the-most-general-ansatz">The most general ansatz </h2>
412422
</p>
413423
<p>&nbsp;<br>
414424
$$
415-
W(\theta) \mathcal{U}_{\Phi}(\vec{x}) \vert 0 \rangle.
425+
W(\Theta) \mathcal{U}_{\Phi}(\boldsymbol{x}) \vert 0 \rangle.
416426
$$
417427
<p>&nbsp;<br>
418428

@@ -425,7 +435,7 @@ <h2 id="quantum-svm">Quantum SVM </h2>
425435
<p>In the case of a quantum SVM we will only use the quantum feature maps</p>
426436
<p>&nbsp;<br>
427437
$$
428-
\mathcal{U}_{\Phi(\vec{x})},
438+
\mathcal{U}_{\Phi(\boldsymbol{x})},
429439
$$
430440
<p>&nbsp;<br>
431441

@@ -444,14 +454,14 @@ <h2 id="defining-the-quantum-kernel">Defining the Quantum Kernel </h2>
444454
</p>
445455
<p>&nbsp;<br>
446456
$$
447-
K(\vec{x}, \vec{z}) = \vert \langle \Phi (\vec{x}) \vert \Phi(\vec{z}) \rangle \vert^2 = \langle 0^n \vert \mathcal{U}_{\Phi(\vec{x})}^{t} \mathcal{U}_{\Phi(\vec{z})} \vert 0^n \rangle,
457+
K(\boldsymbol{x}, \boldsymbol{z}) = \vert \langle \Phi (\boldsymbol{x}) \vert \Phi(\boldsymbol{z}) \rangle \vert^2 = \vert \langle 0^n \vert \mathcal{U}_{\Phi(\boldsymbol{x})}^{\dagger} \mathcal{U}_{\Phi(\boldsymbol{z})} \vert 0^n \rangle \vert^2,
448458
$$
449459
<p>&nbsp;<br>
450460

451461
<p>but now with the quantum feature maps</p>
452462
<p>&nbsp;<br>
453463
$$
454-
\mathcal{U}_{\Phi(\vec{x})}.
464+
\mathcal{U}_{\Phi(\boldsymbol{x})}.
455465
$$
456466
<p>&nbsp;<br>
457467

@@ -839,8 +849,8 @@ <h2 id="quantum-neural-network">Quantum neural network </h2>
839849
kernel machines with a particular kernel determined by the circuit.
840850
In fact, one can often find a kernel SVM that matches or outperforms
841851
the variational model. In practice, one can combine these: use a
842-
trainable quantum embedding \( U(\boldsymbol{x};\theta) \) with tunable
843-
parameters \( \theta \), and optimize \( \theta \) to maximize the SVM
852+
trainable quantum embedding \( U(\boldsymbol{x};\Theta) \) with tunable
853+
parameters \( \Theta \), and optimize \( \Theta \) to maximize the SVM
844854
classification accuracy. This is called a quantum kernel learning
845855
approach.
846856
</p>
@@ -1848,8 +1858,8 @@ <h2 id="variational-quantum-circuits">Variational Quantum Circuits </h2>
18481858
optimizer. In this framework, a Variational Quantum Circuit (VQC)
18491859
typically has three parts: (i) a state preparation or feature map
18501860
that encodes classical input \( \mathbf{x} \) into a quantum state; (ii) a
1851-
parameterized circuit \( W(\boldsymbol\theta) \) (often called the ansatz)
1852-
that depends on trainable parameters \( \boldsymbol\theta \); and (iii) a
1861+
parameterized circuit \( W(\boldsymbol\Theta) \) (often called the ansatz)
1862+
that depends on trainable parameters \( \boldsymbol\Theta \); and (iii) a
18531863
measurement that extracts a classical output from the final quantum
18541864
state.
18551865
</p>
@@ -1868,20 +1878,20 @@ <h2 id="setting-up-a-vqc">Setting up a VQC </h2>
18681878

18691879
<p>where \( U(\mathbf{x}) \) is a unitary (possibly composed of rotations)
18701880
that depends on the data. We then apply the variational circuit
1871-
\( W(\boldsymbol\theta) \), often built as a product of layers
1872-
\( V_j(\theta_j) \), so that the final state is
1881+
\( W(\boldsymbol\Theta) \), often built as a product of layers
1882+
\( V_j(\Theta_j) \), so that the final state is
18731883
</p>
18741884

18751885
<p>&nbsp;<br>
18761886
$$
1877-
\vert \Psi(\mathbf{x};\boldsymbol\theta)\rangle = W(\boldsymbol\theta),U(\mathbf{x}),|0\rangle^{\otimes n}.
1887+
\vert \Psi(\mathbf{x};\boldsymbol\Theta)\rangle = W(\boldsymbol\Theta)\,U(\mathbf{x})\,|0\rangle^{\otimes n}.
18781888
$$
18791889
<p>&nbsp;<br>
18801890

18811891
<p>For instance, one common ansatz is the hardware-efficient circuit:
18821892
layers of parameterized single-qubit rotations and entangling gates
18831893
(like CNOTs) repeated several times. The structure of
1884-
\( W(\boldsymbol\theta) \) can dramatically affect the circuit&#8217;s
1894+
\( W(\boldsymbol\Theta) \) can dramatically affect the circuit&#8217;s
18851895
expressivity and trainability.
18861896
</p>
18871897
</section>
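As a sanity check on the construction \( W(\boldsymbol\Theta)\,U(\mathbf{x})\,|0\rangle^{\otimes n} \), the following NumPy sketch builds a hypothetical two-qubit hardware-efficient ansatz (parameterized \( R_y \) rotations followed by a CNOT, repeated per layer); the specific gate choices are assumptions for illustration:

```python
import numpy as np

def ry(t):
    # R_y(t) = exp(-i t Y / 2)
    return np.array([[np.cos(t/2), -np.sin(t/2)],
                     [np.sin(t/2),  np.cos(t/2)]], dtype=complex)

CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]], dtype=complex)

def layer(theta):
    # One hardware-efficient block: R_y on each qubit, then an entangling CNOT
    return CNOT @ np.kron(ry(theta[0]), ry(theta[1]))

def vqc_state(x, thetas):
    # |Psi(x; Theta)> = W(Theta) U(x) |00>, with encoding U(x) = R_y(x1) ⊗ R_y(x2)
    psi = np.zeros(4, dtype=complex); psi[0] = 1.0
    psi = np.kron(ry(x[0]), ry(x[1])) @ psi      # feature map U(x)
    for th in thetas:                             # layered ansatz W(Theta)
        psi = layer(th) @ psi
    return psi

psi = vqc_state([0.4, 1.1], [[0.1, 0.2], [0.3, 0.4]])
print(np.vdot(psi, psi).real)   # unitarity preserves the norm: 1.0
```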
@@ -1895,21 +1905,21 @@ <h2 id="outputs">Outputs </h2>
18951905
</p>
18961906
<p>&nbsp;<br>
18971907
$$
1898-
f_k(\mathbf{x};\boldsymbol\theta) ;=; \langle \Psi(\mathbf{x};\boldsymbol\theta) | \hat B_k | \Psi(\mathbf{x};\boldsymbol\theta)\rangle.
1908+
f_k(\mathbf{x};\boldsymbol\Theta) \;=\; \langle \Psi(\mathbf{x};\boldsymbol\Theta) | \hat B_k | \Psi(\mathbf{x};\boldsymbol\Theta)\rangle.
18991909
$$
19001910
<p>&nbsp;<br>
19011911

19021912
<p>Equivalently, with</p>
19031913
<p>&nbsp;<br>
19041914
$$
1905-
\vert \Psi(\mathbf{x};\boldsymbol\theta)\rangle = W(\boldsymbol\theta)U(\mathbf{x})|0\rangle,
1915+
\vert \Psi(\mathbf{x};\boldsymbol\Theta)\rangle = W(\boldsymbol\Theta)U(\mathbf{x})|0\rangle,
19061916
$$
19071917
<p>&nbsp;<br>
19081918

19091919
<p>one has</p>
19101920
<p>&nbsp;<br>
19111921
$$
1912-
f_k(\mathbf{x};\boldsymbol\theta) = \langle 0|U(\mathbf{x})^\dagger W(\boldsymbol\theta)^\dagger ,\hat B_k, W(\boldsymbol\theta) U(\mathbf{x}),|0\rangle.
1922+
f_k(\mathbf{x};\boldsymbol\Theta) = \langle 0|U(\mathbf{x})^\dagger W(\boldsymbol\Theta)^\dagger \,\hat B_k\, W(\boldsymbol\Theta) U(\mathbf{x})\,|0\rangle.
19131923
$$
19141924
<p>&nbsp;<br>
19151925

@@ -1920,9 +1930,9 @@ <h2 id="outputs">Outputs </h2>
19201930
<h2 id="short-summary">Short summary </h2>
19211931

19221932
<p>In summary, a variational quantum model
1923-
\( f(\mathbf{x};\boldsymbol\theta) \) maps inputs to outputs via the
1933+
\( f(\mathbf{x};\boldsymbol\Theta) \) maps inputs to outputs via the
19241934
hybrid quantum-classical procedure. During training, the classical
1925-
optimizer adjusts \( \boldsymbol\theta \) (e.g. by gradient descent) to
1935+
optimizer adjusts \( \boldsymbol\Theta \) (e.g. by gradient descent) to
19261936
minimize a cost function (like mean-squared error) defined on a
19271937
dataset. Because the mapping is inherently quantum, these models can,
19281938
in principle, harness the high-dimensional Hilbert space for richer
@@ -1944,21 +1954,21 @@ <h2 id="mathematical-example">Mathematical example </h2>
19441954
<p>and a variational layer is</p>
19451955
<p>&nbsp;<br>
19461956
$$
1947-
V(\boldsymbol\theta)=R_y(\theta_1)\otimes R_y(\theta_2),\mathrm{CNOT}(0,1),
1957+
V(\boldsymbol\Theta)=\mathrm{CNOT}(0,1)\,\bigl(R_y(\Theta_1)\otimes R_y(\Theta_2)\bigr),
19481958
$$
19491959
<p>&nbsp;<br>
19501960

19511961
<p>(apply \( R_y \) on each qubit then entangle). After
1952-
applying \( W(\boldsymbol\theta)=V(\boldsymbol\theta) \) to \( |0,0\rangle \),
1962+
applying \( W(\boldsymbol\Theta)=V(\boldsymbol\Theta) \) to \( |0,0\rangle \),
19531963
we measure \( \hat B=Z\otimes I \) on qubit 0. The output is
19541964
</p>
19551965
<p>&nbsp;<br>
19561966
$$
1957-
f(\mathbf{x};\boldsymbol\theta) = \langle 0,0|,U(\mathbf{x})^\dagger,V(\boldsymbol\theta)^\dagger, (Z\otimes I), V(\boldsymbol\theta),U(\mathbf{x}),|0,0\rangle.
1967+
f(\mathbf{x};\boldsymbol\Theta) = \langle 0,0|\,U(\mathbf{x})^\dagger\,V(\boldsymbol\Theta)^\dagger\,(Z\otimes I)\,V(\boldsymbol\Theta)\,U(\mathbf{x})\,|0,0\rangle.
19581968
$$
19591969
<p>&nbsp;<br>
19601970

1961-
<p>This \( f(x;\theta) \) is then compared to the target in a cost function for optimization.</p>
1971+
<p>This \( f(x;\Theta) \) is then compared to the target in a cost function for optimization.</p>
19621972
</section>
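This two-qubit example can be evaluated classically. A NumPy sketch follows; the encoding \( U(\mathbf{x})=R_y(x_1)\otimes R_y(x_2) \) is an assumption chosen for illustration, since the hunk does not show the feature map:

```python
import numpy as np

def ry(t):
    return np.array([[np.cos(t/2), -np.sin(t/2)],
                     [np.sin(t/2),  np.cos(t/2)]], dtype=complex)

CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]], dtype=complex)
Z, I2 = np.diag([1.0, -1.0]), np.eye(2)

def f(x, theta):
    # f(x; Theta) = <00| U(x)^dag V(Theta)^dag (Z ⊗ I) V(Theta) U(x) |00>
    U = np.kron(ry(x[0]), ry(x[1]))                      # encoding layer (assumed)
    V = CNOT @ np.kron(ry(theta[0]), ry(theta[1]))       # variational layer
    psi = V @ U @ np.array([1, 0, 0, 0], dtype=complex)
    return np.vdot(psi, np.kron(Z, I2) @ psi).real

print(f([0.0, 0.0], [0.0, 0.0]))   # all-zero angles leave |00>, so <Z ⊗ I> = 1
```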
19631973

19641974
<section>
@@ -1971,15 +1981,15 @@ <h2 id="key-elements">Key elements </h2>
19711981
on each qubit, while more complex feature maps may exploit
19721982
entanglement. The circuit output is obtained via expectation values
19731983
of observables (e.g. Pauli-Z), yielding a differentiable function
1974-
\( f(\mathbf{x};\boldsymbol\theta) \) .
1984+
\( f(\mathbf{x};\boldsymbol\Theta) \).
19751985
</p>
19761986
</section>
19771987

19781988
<section>
19791989
<h2 id="test-yourself-exercises">Test yourself exercises </h2>
19801990

19811991
<ol>
1982-
<p><li> Compute the state \( |\Psi(\mathbf{x};\boldsymbol\theta)\rangle \) explicitly for a 1-qubit VQC with \( U(x)=R_x(x) \) and \( W(\theta)=R_y(\theta) \). What is \( \langle Z\rangle \) as a function of \( x,\theta \)?</li>
1992+
<p><li> Compute the state \( |\Psi(\mathbf{x};\boldsymbol\Theta)\rangle \) explicitly for a 1-qubit VQC with \( U(x)=R_x(x) \) and \( W(\Theta)=R_y(\Theta) \). What is \( \langle Z\rangle \) as a function of \( x,\Theta \)?</li>
19831993
<p><li> Draw (or describe) a hardware-efficient ansatz for 3 qubits with 2 layers of rotations and CNOTs. How many parameters does it have?</li>
19841994
</ol>
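For the first exercise above, a numerical check is straightforward. The sketch below evaluates \( \langle Z\rangle \) for the state \( R_y(\Theta)R_x(x)\vert 0\rangle \) and compares it with the closed form \( \cos x\cos\Theta \) that one obtains by direct calculation:

```python
import numpy as np

def rx(a):
    return np.array([[np.cos(a/2), -1j*np.sin(a/2)],
                     [-1j*np.sin(a/2), np.cos(a/2)]])

def ry(t):
    return np.array([[np.cos(t/2), -np.sin(t/2)],
                     [np.sin(t/2),  np.cos(t/2)]], dtype=complex)

def expval_z(x, theta):
    # <Z> for |Psi(x; Theta)> = R_y(theta) R_x(x) |0>
    psi = ry(theta) @ rx(x) @ np.array([1, 0], dtype=complex)
    return abs(psi[0])**2 - abs(psi[1])**2

x, theta = 0.7, 1.3
print(expval_z(x, theta), np.cos(x)*np.cos(theta))   # the two values agree
```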
19851995
<p>
@@ -2029,7 +2039,7 @@ <h2 id="input-encoding">Input Encoding </h2>
20292039
<h2 id="qnn-architecture-and-models">QNN Architecture and Models </h2>
20302040

20312041
<p>A general QNN can be viewed as a parameterized unitary
2032-
\( U(\mathbf{x},\boldsymbol\theta) \) acting on \( n \) qubits, followed by
2042+
\( U(\mathbf{x},\boldsymbol\Theta) \) acting on \( n \) qubits, followed by
20332043
measurements. Fig. 2 (placeholder) might depict a generic QNN with
20342044
several layers of trainable gates. Each layer can entangle qubits,
20352045
building up complexity. The output is then a (classical) vector of
@@ -2045,8 +2055,8 @@ <h2 id="a-simple-feedforward-qnn-structure">A simple feedforward QNN structure <
20452055
<p>
20462056
<ol>
20472057
<p><li> Embedding Layer: Convert \( \mathbf{x} \) to \( |0\rangle^{\otimes n} \) via \( U(\mathbf{x}) \).</li>
2048-
<p><li> Variational Layers: Repeat \( L \) blocks of parameterized gates \( W(\boldsymbol\theta^{(l)}) \) (each block may act on all or subsets of qubits).</li>
2049-
<p><li> Measurement: Measure selected qubits or observables to obtain the output predictions \( f(\mathbf{x};\boldsymbol\theta) \).</li>
2058+
<p><li> Variational Layers: Repeat \( L \) blocks of parameterized gates \( W(\boldsymbol\Theta^{(l)}) \) (each block may act on all or subsets of qubits).</li>
2059+
<p><li> Measurement: Measure selected qubits or observables to obtain the output predictions \( f(\mathbf{x};\boldsymbol\Theta) \).</li>
20502060
</ol>
20512061
</div>
20522062
</section>
@@ -2055,8 +2065,8 @@ <h2 id="a-simple-feedforward-qnn-structure">A simple feedforward QNN structure <
20552065
<h2 id="example">Example </h2>
20562066

20572067
<p>For example, a 2-layer QNN on 2 qubits might apply encoding
2058-
\( R_x(x_1)\otimes R_x(x_2) \), then apply \( W(\theta^{(1)}) \), then again
2059-
encoding (or not), then \( W(\theta^{(2)}) \), and finally measure. In
2068+
\( R_x(x_1)\otimes R_x(x_2) \), then apply \( W(\Theta^{(1)}) \), then again
2069+
encoding (or not), then \( W(\Theta^{(2)}) \), and finally measure. In
20602070
classification tasks, one typically assigns a label based on the sign
20612071
of \( \langle Z\rangle \) or uses multiple measurements for multi-class
20622072
outputs.
@@ -2074,19 +2084,19 @@ <h2 id="example">Example </h2>
20742084
<section>
20752085
<h2 id="training-output-and-cost-loss-function">Training output and Cost/Loss-function </h2>
20762086

2077-
<p>Given a QNN with output \( f(\mathbf{x};\boldsymbol\theta) \) (a real
2087+
<p>Given a QNN with output \( f(\mathbf{x};\boldsymbol\Theta) \) (a real
20782088
number or vector of real values), one must define a loss function to
20792089
train on data. Common choices are the mean squared error (MSE) for
20802090
regression or cross-entropy for classification. For a training set
20812091
\( \{(\mathbf{x}_i,y_i)\} \), the MSE cost/loss-function is
20822092
</p>
20832093
<p>&nbsp;<br>
20842094
$$
2085-
C(\boldsymbol\theta) = \frac{1}{N} \sum_{i=1}^N \bigl(f(\mathbf{x}i;\boldsymbol\theta) - y_i\bigr)^2.
2095+
C(\boldsymbol\Theta) = \frac{1}{N} \sum_{i=1}^N \bigl(f(\mathbf{x}_i;\boldsymbol\Theta) - y_i\bigr)^2.
20862096
$$
20872097
<p>&nbsp;<br>
20882098

2089-
<p>One then computes gradients \( \nabla{\boldsymbol\theta}C \) and updates
2099+
<p>One then computes gradients \( \nabla_{\boldsymbol\Theta}C \) and updates
20902100
parameters via gradient descent or other optimizers.
20912101
</p>
20922102
</section>
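The training loop described here can be sketched end to end with a toy one-qubit model. Everything below is a minimal illustration under stated assumptions: the model \( f(x;\theta)=\langle Z\rangle \) for \( R_y(\theta)R_x(x)\vert 0\rangle \), synthetic data generated by the model itself at a hypothetical target \( \theta^\ast=0.5 \), and a finite-difference gradient for brevity (on hardware one would use the parameter-shift rule):

```python
import numpy as np

def rx(a):
    return np.array([[np.cos(a/2), -1j*np.sin(a/2)],
                     [-1j*np.sin(a/2), np.cos(a/2)]])

def ry(t):
    return np.array([[np.cos(t/2), -np.sin(t/2)],
                     [np.sin(t/2),  np.cos(t/2)]], dtype=complex)

def f(x, theta):
    # Toy model output: <Z> for R_y(theta) R_x(x) |0>
    psi = ry(theta) @ rx(x) @ np.array([1, 0], dtype=complex)
    return abs(psi[0])**2 - abs(psi[1])**2

def cost(theta, xs, ys):
    # MSE: C(theta) = (1/N) sum_i (f(x_i; theta) - y_i)^2
    return np.mean([(f(x, theta) - y)**2 for x, y in zip(xs, ys)])

# Synthetic training set from the model at theta* = 0.5 (hypothetical target)
xs = np.linspace(-1.0, 1.0, 8)
ys = np.array([f(x, 0.5) for x in xs])

theta, lr = 2.0, 0.5
for _ in range(200):
    # Central finite-difference gradient of C(theta)
    g = (cost(theta + 1e-5, xs, ys) - cost(theta - 1e-5, xs, ys)) / 2e-5
    theta -= lr * g

print(theta, cost(theta, xs, ys))   # theta approaches 0.5, cost approaches 0
```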
@@ -2095,7 +2105,7 @@ <h2 id="training-output-and-cost-loss-function">Training output and Cost/Loss-fu
20952105
<h2 id="exampe-variational-classifier">Exampe: Variational Classifier </h2>
20962106

20972107
<p>A binary classifier can output
2098-
\( f(\mathbf{x};\boldsymbol\theta)=\langle Z_0\rangle \) on qubit 0, and
2108+
\( f(\mathbf{x};\boldsymbol\Theta)=\langle Z_0\rangle \) on qubit 0, and
20992109
predict label \( +1 \) if \( f\ge0 \), else \( -1 \).
21002110
</p>
21012111
</section>
@@ -2136,15 +2146,15 @@ <h2 id="training-qnns-and-loss-landscapes">Training QNNs and Loss Landscapes </h
21362146
<section>
21372147
<h2 id="gradient-computation">Gradient Computation </h2>
21382148

2139-
<p>Gradients \( \partial f/\partial\theta_j \) are obtained using the parameter-shift rule. For many gates \( e^{-i\theta P/2} \) (with \( P \) a Pauli), one can compute</p>
2149+
<p>Gradients \( \partial f/\partial\Theta_j \) are obtained using the parameter-shift rule. For many gates \( e^{-i\Theta P/2} \) (with \( P \) a Pauli), one can compute</p>
21402150
<p>&nbsp;<br>
21412151
$$
2142-
\frac{\partial}{\partial\theta}\langle B\rangle
2143-
= \frac{1}{2}\Bigl[\langle B\rangle_{\theta+\pi/2} - \langle B\rangle_{\theta-\pi/2}\Bigr],
2152+
\frac{\partial}{\partial\Theta}\langle B\rangle
2153+
= \frac{1}{2}\Bigl[\langle B\rangle_{\Theta+\pi/2} - \langle B\rangle_{\Theta-\pi/2}\Bigr],
21442154
$$
21452155
<p>&nbsp;<br>
21462156

2147-
<p>where \( \langle B\rangle_{\theta\pm\pi/2} \) are expectation values
2157+
<p>where \( \langle B\rangle_{\Theta\pm\pi/2} \) are expectation values
21482158
evaluated at shifted parameter values. This formula allows exact
21492159
gradients by two circuit evaluations per parameter (independent of
21502160
circuit size). PennyLane automatically applies the parameter-shift rule
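The parameter-shift formula is easy to verify on the simplest possible case, \( \langle B\rangle = \langle 0\vert R_y(\Theta)^\dagger Z R_y(\Theta)\vert 0\rangle = \cos\Theta \), whose exact derivative is \( -\sin\Theta \). A plain NumPy check (no quantum library assumed):

```python
import numpy as np

def ry(t):
    return np.array([[np.cos(t/2), -np.sin(t/2)],
                     [np.sin(t/2),  np.cos(t/2)]])

def expval(theta):
    # <Z> for R_y(theta)|0>; analytically cos(theta)
    psi = ry(theta) @ np.array([1.0, 0.0])
    return psi[0]**2 - psi[1]**2

def parameter_shift(theta):
    # d<B>/dtheta = (1/2)[<B>_{theta+pi/2} - <B>_{theta-pi/2}]
    return 0.5 * (expval(theta + np.pi/2) - expval(theta - np.pi/2))

theta = 0.9
print(parameter_shift(theta), -np.sin(theta))   # exact agreement, not an approximation
```

Unlike finite differences, the two shifted evaluations give the exact gradient for gates of the form \( e^{-i\Theta P/2} \).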
@@ -2183,7 +2193,7 @@ <h2 id="barren-plateaus">Barren Plateaus </h2>
21832193
<section>
21842194
<h2 id="cost-loss-landscape-visualization">Cost/Loss-landscape visualization </h2>
21852195

2186-
<p>One can imagine the cost/loss function \( C(\boldsymbol\theta) \) over the
2196+
<p>One can imagine the cost/loss function \( C(\boldsymbol\Theta) \) over the
21872197
parameter space. Unlike convex classical problems, this landscape may
21882198
have many local minima and saddle points. Barren plateaus correspond
21892199
to regions where \( \nabla C\approx 0 \) almost everywhere. Even if
@@ -2205,8 +2215,8 @@ <h2 id="cost-loss-landscape-visualization">Cost/Loss-landscape visualization </h
22052215
<h2 id="exercises">Exercises </h2>
22062216

22072217
<ol>
2208-
<p><li> Compute a gradient by hand: For a circuit with one qubit and \( f(\theta)=\langle0|R_y(\theta)^\dagger Z R_y(\theta)|0\rangle \), use the parameter-shift rule to compute \( df/d\theta \).</li>
2209-
<p><li> Explore barren plateaus: Numerically evaluate \( \partial f/\partial\theta \) for a simple 5-qubit random circuit as depth increases. Observe the trend of gradient norms. What does this suggest?</li>
2218+
<p><li> Compute a gradient by hand: For a circuit with one qubit and \( f(\Theta)=\langle0|R_y(\Theta)^\dagger Z R_y(\Theta)|0\rangle \), use the parameter-shift rule to compute \( df/d\Theta \).</li>
2219+
<p><li> Explore barren plateaus: Numerically evaluate \( \partial f/\partial\Theta \) for a simple 5-qubit random circuit as depth increases. Observe the trend of gradient norms. What does this suggest?</li>
22102220
<p><li> Optimizer effects: Implement a small QNN (2 qubits) and train with both SGD and Adam optimizers. Compare convergence speed.</li>
22112221
</ol>
22122222
</section>
