@@ -382,28 +382,38 @@ <h2 id="what-will-we-need-in-the-case-of-a-quantum-computer">What will we need i
382382&lt; div class ="alert alert-block alert-text-normal ">
383383< b > </ b >
384384< p >
385- < p > We will have to translate the classical data point \( \vec {x} \)
386- into a quantum datapoint \( \vert \Phi{(\vec {x})} \rangle \). This can
387- be achieved by a circuit \( \mathcal{U}_{\Phi(\vec {x})} \vert 0\rangle \).
385+ < p > We will have to translate the classical data point \( \boldsymbol {x} \)
386+ into a quantum datapoint \( \vert \Phi{(\boldsymbol {x})} \rangle \). This can
387+ be achieved by a circuit \( \mathcal{U}_{\Phi(\boldsymbol {x})} \vert 0\rangle \).
388388</ p >
389389
390390< p > Here \( \Phi() \) could be any classical function applied
391- on the classical data \( \vec {x} \).
391+ on the classical data \( \boldsymbol {x} \).
392392</ p >
393393</ div >
394394
395395&lt; div class ="alert alert-block alert-text-normal ">
396396< b > </ b >
397397< p >
398- < p > We need a parameterized quantum circuit \( W(\theta ) \) that
398+ < p > We need a parameterized quantum circuit \( W(\Theta ) \) that
399399processes the data so that, in the end, we
400400can apply a measurement that returns a classical value \( -1 \) or
401- \( 1 \) for each classical input \( \vec {x} \) that indentifies the label
401+ \( 1 \) for each classical input \( \boldsymbol {x} \) that identifies the label
402402of the classical data.
403403</ p >
404404</ div >
405405</ section >
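The two requirements above, a feature-map circuit \( \mathcal{U}_{\Phi(\boldsymbol{x})} \) and a parameterized circuit \( W(\Theta) \) followed by a \( \pm 1 \) measurement, can be sketched with plain NumPy. This is a toy one-qubit illustration: the choices \( \mathcal{U}_{\Phi(x)} = R_x(x) \) and \( W(\Theta) = R_y(\Theta) \) are assumptions made here for concreteness, not circuits fixed by the text.

```python
import numpy as np

def rx(t):  # single-qubit rotation R_x(t) = exp(-i t X / 2)
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def ry(t):  # single-qubit rotation R_y(t) = exp(-i t Y / 2)
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

Z = np.diag([1.0, -1.0]).astype(complex)    # Pauli-Z observable
ket0 = np.array([1.0, 0.0], dtype=complex)  # |0>

def label(x, theta):
    """Encode x, process with W(theta), read off a +/-1 label from <Z>."""
    psi = ry(theta) @ rx(x) @ ket0          # W(Theta) U_Phi(x) |0>
    exp_z = float(np.real(np.conj(psi) @ (Z @ psi)))
    return 1 if exp_z >= 0 else -1

print(label(0.1, 0.0))  # state stays close to |0>, so the label is +1
```

For this toy model \( \langle Z\rangle = \cos x \cos\Theta \), so the decision boundary sits where \( \cos x \cos\Theta \) changes sign.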
406406
407+ < section >
408+ < h2 id ="parameterized-quantum-circuits "> Parameterized quantum circuits </ h2 >
409+
410+ < br /> < br />
411+ < center >
412+ < p > < img src ="figures/pqc.png " width ="900 " align ="bottom "> </ p >
413+ </ center >
414+ < br /> < br />
415+ </ section >
416+
407417< section >
408418< h2 id ="the-most-general-ansatz "> The most general ansatz </ h2 >
409419
@@ -412,7 +422,7 @@ <h2 id="the-most-general-ansatz">The most general ansatz </h2>
412422</ p >
413423< p > < br >
414424$$
415- W(\theta ) \mathcal{U}_{\Phi}(\vec {x}) \vert 0 \rangle.
425+ W(\Theta ) \mathcal{U}_{\Phi}(\boldsymbol {x}) \vert 0 \rangle.
416426$$
417427< p > < br >
418428
@@ -425,7 +435,7 @@ <h2 id="quantum-svm">Quantum SVM </h2>
425435< p > In the case of a quantum SVM we will only use the quantum feature maps</ p >
426436< p > < br >
427437$$
428- \mathcal{U}_{\Phi(\vec {x})},
438+ \mathcal{U}_{\Phi(\boldsymbol {x})},
429439$$
430440< p > < br >
431441
@@ -444,14 +454,14 @@ <h2 id="defining-the-quantum-kernel">Defining the Quantum Kernel </h2>
444454</ p >
445455< p > < br >
446456$$
447- K(\vec {x}, \vec {z}) = \vert \langle \Phi (\vec {x}) \vert \Phi(\vec {z}) \rangle \vert^2 = \langle 0^n \vert \mathcal{U}_{\Phi(\vec {x})}^{t} \mathcal{U}_{\Phi(\vec {z})} \vert 0^n \rangle,
457+ K(\boldsymbol {x}, \boldsymbol {z}) = \vert \langle \Phi (\boldsymbol {x}) \vert \Phi(\boldsymbol {z}) \rangle \vert^2 = \langle 0^n \vert \mathcal{U}_{\Phi(\boldsymbol {x})}^{\dagger} \mathcal{U}_{\Phi(\boldsymbol {z})} \vert 0^n \rangle,
448458$$
449459< p > < br >
450460
451461< p > but now with the quantum feature maps</ p >
452462< p > < br >
453463$$
454- \mathcal{U}_{\Phi(\vec {x})}.
464+ \mathcal{U}_{\Phi(\boldsymbol {x})}.
455465$$
456466< p > < br >
457467
@@ -839,8 +849,8 @@ <h2 id="quantum-neural-network">Quantum neural network </h2>
839849kernel machines with a particular kernel determined by the circuit .
840850In fact, one can often find a kernel SVM that matches or outperforms
841851the variational model. In practice, one can combine these: use a
842- trainable quantum embedding \( U(\boldsymbol{x};\theta ) \) with tunable
843- parameters \( \theta \), and optimize \( \theta \) to maximize the SVM
852+ trainable quantum embedding \( U(\boldsymbol{x};\Theta ) \) with tunable
853+ parameters \( \Theta \), and optimize \( \Theta \) to maximize the SVM
844854classification accuracy. This is called a quantum kernel learning
845855approach.
846856</ p >
@@ -1848,8 +1858,8 @@ <h2 id="variational-quantum-circuits">Variational Quantum Circuits </h2>
18481858optimizer . In this framework, a Variational Quantum Circuit (VQC)
18491859typically has three parts : (i) a state preparation or feature map
18501860that encodes classical input \( \mathbf{x} \) into a quantum state; (ii) a
1851- parameterized circuit \( W(\boldsymbol\theta ) \) (often called the ansatz)
1852- that depends on trainable parameters \( \boldsymbol\theta \); and (iii) a
1861+ parameterized circuit \( W(\boldsymbol\Theta ) \) (often called the ansatz)
1862+ that depends on trainable parameters \( \boldsymbol\Theta \); and (iii) a
18531863measurement that extracts a classical output from the final quantum
18541864state.
18551865</ p >
@@ -1868,20 +1878,20 @@ <h2 id="setting-up-a-vqc">Setting up a VQC </h2>
18681878
18691879< p > where \( U(\mathbf{x}) \) is a unitary (possibly composed of rotations)
18701880that depends on the data. We then apply the variational circuit
1871- \( W(\boldsymbol\theta ) \), often built as a product of layers
1872- \( V_j(\theta_j ) \), so that the final state is
1881+ \( W(\boldsymbol\Theta ) \), often built as a product of layers
1882+ \( V_j(\Theta_j ) \), so that the final state is
18731883</ p >
18741884
18751885< p > < br >
18761886$$
1877- \vert \Psi(\mathbf{x};\boldsymbol\theta )\rangle = W(\boldsymbol\theta ),U(\mathbf{x}),|0\rangle^{\otimes n}.
1887+ \vert \Psi(\mathbf{x};\boldsymbol\Theta )\rangle = W(\boldsymbol\Theta )\,U(\mathbf{x})\,|0\rangle^{\otimes n}.
18781888$$
18791889< p > < br >
18801890
18811891< p > For instance, one common ansatz is the hardware-efficient circuit:
18821892layers of parameterized single-qubit rotations and entangling gates
18831893(like CNOTs) repeated several times. The structure of
1884- \( W(\boldsymbol\theta ) \) can dramatically affect the circuit’s
1894+ \( W(\boldsymbol\Theta ) \) can dramatically affect the circuit’s
18851895expressivity and trainability.
18861896</ p >
18871897</ section >
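A hardware-efficient \( W(\boldsymbol\Theta) \) of the kind just described can be assembled layer by layer as a matrix product. The sketch below (two qubits, NumPy only; the block structure of one \( R_y \) per qubit followed by a CNOT is an illustrative assumption) checks that the result is unitary:

```python
import numpy as np

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def hardware_efficient(thetas):
    """W(Theta): one R_y per qubit, then a CNOT, repeated per row of thetas."""
    W = np.eye(4, dtype=complex)
    for t0, t1 in thetas:
        W = CNOT @ np.kron(ry(t0), ry(t1)) @ W
    return W

# 3 layers x 2 qubits -> 6 trainable parameters
thetas = np.random.default_rng(0).uniform(0.0, 2 * np.pi, size=(3, 2))
W = hardware_efficient(thetas)
print(np.allclose(W.conj().T @ W, np.eye(4)))  # unitarity check: True
```

With one rotation per qubit per layer, \( L \) layers on \( n \) qubits contribute \( nL \) parameters; deeper circuits are more expressive but, as discussed later, harder to train.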
@@ -1895,21 +1905,21 @@ <h2 id="outputs">Outputs </h2>
18951905</ p >
18961906< p > < br >
18971907$$
1898- f_k(\mathbf{x};\boldsymbol\theta ) ;=; \langle \Psi(\mathbf{x};\boldsymbol\theta ) | \hat B_k | \Psi(\mathbf{x};\boldsymbol\theta )\rangle.
1908+ f_k(\mathbf{x};\boldsymbol\Theta ) \;=\; \langle \Psi(\mathbf{x};\boldsymbol\Theta ) | \hat B_k | \Psi(\mathbf{x};\boldsymbol\Theta )\rangle.
18991909$$
19001910< p > < br >
19011911
19021912< p > Equivalently, with</ p >
19031913< p > < br >
19041914$$
1905- \vert \Psi(\mathbf{x};\boldsymbol\theta )\rangle = W(\boldsymbol\theta )U(\mathbf{x})|0\rangle,
1915+ \vert \Psi(\mathbf{x};\boldsymbol\Theta )\rangle = W(\boldsymbol\Theta )U(\mathbf{x})|0\rangle,
19061916$$
19071917< p > < br >
19081918
19091919< p > one has</ p >
19101920< p > < br >
19111921$$
1912- f_k(\mathbf{x};\boldsymbol\theta ) = \langle 0|U(\mathbf{x})^\dagger W(\boldsymbol\theta )^\dagger ,\hat B_k, W(\boldsymbol\theta ) U(\mathbf{x}),|0\rangle.
1922+ f_k(\mathbf{x};\boldsymbol\Theta ) = \langle 0|U(\mathbf{x})^\dagger W(\boldsymbol\Theta )^\dagger \,\hat B_k\, W(\boldsymbol\Theta ) U(\mathbf{x})\,|0\rangle.
19131923$$
19141924< p > < br >
19151925
@@ -1920,9 +1930,9 @@ <h2 id="outputs">Outputs </h2>
19201930< h2 id ="short-summary "> Short summary </ h2 >
19211931
19221932< p > In summary, a variational quantum model
1923- \( f(\mathbf{x};\boldsymbol\theta ) \) maps inputs to outputs via the
1933+ \( f(\mathbf{x};\boldsymbol\Theta ) \) maps inputs to outputs via the
19241934hybrid quantum-classical procedure. During training, the classical
1925- optimizer adjusts \( \boldsymbol\theta \) (e.g. by gradient descent) to
1935+ optimizer adjusts \( \boldsymbol\Theta \) (e.g. by gradient descent) to
19261936minimize a cost function (like mean-squared error) defined on a
19271937dataset. Because the mapping is inherently quantum, these models can,
19281938in principle, harness the high-dimensional Hilbert space for richer
@@ -1944,21 +1954,21 @@ <h2 id="mathematical-example">Mathematical example </h2>
19441954< p > and a variational layer is</ p >
19451955< p > < br >
19461956$$
1947- V(\boldsymbol\theta )=R_y(\theta_1 )\otimes R_y(\theta_2 ),\mathrm{CNOT}(0,1),
1957+ V(\boldsymbol\Theta )=\mathrm{CNOT}(0,1)\,\bigl(R_y(\Theta_1 )\otimes R_y(\Theta_2 )\bigr),
19481958$$
19491959< p > < br >
19501960
19511961< p > (apply \( R_y \) on each qubit then entangle). After
1952- applying \( W(\boldsymbol\theta )=V(\boldsymbol\theta ) \) to \( |0,0\rangle \),
1962+ applying \( W(\boldsymbol\Theta )=V(\boldsymbol\Theta ) \) to \( |0,0\rangle \),
19531963we measure \( \hat B=Z\otimes I \) on qubit 0. The output is
19541964</ p >
19551965< p > < br >
19561966$$
1957- f(\mathbf{x};\boldsymbol\theta ) = \langle 0,0|,U(\mathbf{x})^\dagger,V(\boldsymbol\theta )^\dagger, (Z\otimes I), V(\boldsymbol\theta ),U(\mathbf{x}),|0,0\rangle.
1967+ f(\mathbf{x};\boldsymbol\Theta ) = \langle 0,0|\,U(\mathbf{x})^\dagger\,V(\boldsymbol\Theta )^\dagger\,(Z\otimes I)\,V(\boldsymbol\Theta )\,U(\mathbf{x})\,|0,0\rangle.
19581968$$
19591969< p > < br >
19601970
1961- < p > This \( f(x;\theta ) \) is then compared to the target in a cost function for optimization.</ p >
1971+ &lt; p > This \( f(\mathbf{x};\boldsymbol\Theta ) \) is then compared to the target in a cost function for optimization.&lt;/ p >
19621972</ section >
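The two-qubit example can be reproduced with small matrices. The encoding \( U(\mathbf{x}) \) is not restated in this excerpt, so the sketch assumes the common angle encoding \( U(\mathbf{x}) = R_x(x_1)\otimes R_x(x_2) \) as a stand-in; the variational layer applies the \( R_y \) rotations and then the CNOT:

```python
import numpy as np

def rx(t):
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
ket00 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)

def f(x, theta):
    """f(x; Theta) = <00| U^dag V^dag (Z x I) V U |00>, Z measured on qubit 0."""
    U = np.kron(rx(x[0]), rx(x[1]))                  # assumed angle encoding
    V = CNOT @ np.kron(ry(theta[0]), ry(theta[1]))   # rotate, then entangle
    psi = V @ U @ ket00
    return float(np.real(np.conj(psi) @ (np.kron(Z, I2) @ psi)))

print(f([0.0, 0.0], [0.0, 0.0]))  # all angles zero leaves |00>: f = 1.0
```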
19631973
19641974< section >
@@ -1971,15 +1981,15 @@ <h2 id="key-elements">Key elements </h2>
19711981on each qubit , while more complex feature maps may exploit
19721982entanglement. The circuit output is obtained via expectation values
19731983of observables (e.g. Pauli-Z), yielding a differentiable function
1974- \( f(\mathbf{x};\boldsymbol\theta ) \) .
1984+ \( f(\mathbf{x};\boldsymbol\Theta ) \) .
19751985</ p >
19761986</ section >
19771987
19781988< section >
19791989< h2 id ="test-yourself-exercises "> Test yourself exercises </ h2 >
19801990
19811991< ol >
1982- < p > < li > Compute the state \( |\Psi(\mathbf{x};\boldsymbol\theta )\rangle \) explicitly for a 1-qubit VQC with \( U(x)=R_x(x) \) and \( W(\theta )=R_y(\theta ) \). What is \( \langle Z\rangle \) as a function of \( x,\theta \)?</ li >
1992+ < p > < li > Compute the state \( |\Psi(\mathbf{x};\boldsymbol\Theta )\rangle \) explicitly for a 1-qubit VQC with \( U(x)=R_x(x) \) and \( W(\Theta )=R_y(\Theta ) \). What is \( \langle Z\rangle \) as a function of \( x,\Theta \)?</ li >
19831993< p > < li > Draw (or describe) a hardware-efficient ansatz for 3 qubits with 2 layers of rotations and CNOTs. How many parameters does it have?</ li >
19841994</ ol >
19851995< p >
@@ -2029,7 +2039,7 @@ <h2 id="input-encoding">Input Encoding </h2>
20292039< h2 id ="qnn-architecture-and-models "> QNN Architecture and Models </ h2 >
20302040
20312041< p > A general QNN can be viewed as a parameterized unitary
2032- \( U(\mathbf{x},\boldsymbol\theta ) \) acting on \( n \) qubits, followed by
2042+ \( U(\mathbf{x},\boldsymbol\Theta ) \) acting on \( n \) qubits, followed by
20332043measurements. Fig. 2 (placeholder) might depict a generic QNN with
20342044several layers of trainable gates. Each layer can entangle qubits,
20352045building up complexity. The output is then a (classical) vector of
@@ -2045,8 +2055,8 @@ <h2 id="a-simple-feedforward-qnn-structure">A simple feedforward QNN structure <
20452055< p >
20462056< ol >
20472057< p > < li > Embedding Layer: Convert \( \mathbf{x} \) to \( |0\rangle^{\otimes n} \) via \( U(\mathbf{x}) \).</ li >
2048- < p > < li > Variational Layers: Repeat \( L \) blocks of parameterized gates \( W(\boldsymbol\theta ^{(l)}) \) (each block may act on all or subsets of qubits).</ li >
2049- < p > < li > Measurement: Measure selected qubits or observables to obtain the output predictions \( f(\mathbf{x};\boldsymbol\theta ) \).</ li >
2058+ < p > < li > Variational Layers: Repeat \( L \) blocks of parameterized gates \( W(\boldsymbol\Theta ^{(l)}) \) (each block may act on all or subsets of qubits).</ li >
2059+ < p > < li > Measurement: Measure selected qubits or observables to obtain the output predictions \( f(\mathbf{x};\boldsymbol\Theta ) \).</ li >
20502060</ ol >
20512061</ div >
20522062</ section >
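The three steps above can be sketched as one function (two qubits, NumPy only; the angle embedding and the \( R_y \)-plus-CNOT block are illustrative assumptions, not a prescribed architecture):

```python
import numpy as np

def rx(t):
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
ket00 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)

def qnn(x, thetas):
    """Embedding -> L variational blocks -> measurement of Z on qubit 0."""
    psi = np.kron(rx(x[0]), rx(x[1])) @ ket00            # 1. embedding layer
    for t0, t1 in thetas:                                # 2. variational layers
        psi = CNOT @ np.kron(ry(t0), ry(t1)) @ psi
    return float(np.real(np.conj(psi) @ (np.kron(Z, I2) @ psi)))  # 3. measurement

out = qnn([0.4, -0.9], [(0.1, 0.5), (1.3, -0.2)])  # L = 2 blocks
print(-1.0 <= out <= 1.0)  # an expectation of Z always lies in [-1, 1]: True
```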
@@ -2055,8 +2065,8 @@ <h2 id="a-simple-feedforward-qnn-structure">A simple feedforward QNN structure <
20552065< h2 id ="example "> Example </ h2 >
20562066
20572067< p > For example, a 2-layer QNN on 2 qubits might apply encoding
2058- \( R_x(x_1)\otimes R_x(x_2) \), then apply \( W(\theta ^{(1)}) \), then again
2059- encoding (or not), then \( W(\theta ^{(2)}) \), and finally measure. In
2068+ \( R_x(x_1)\otimes R_x(x_2) \), then apply \( W(\Theta ^{(1)}) \), then again
2069+ encoding (or not), then \( W(\Theta ^{(2)}) \), and finally measure. In
20602070classification tasks, one typically assigns a label based on the sign
20612071of \( \langle Z\rangle \) or uses multiple measurements for multi-class
20622072outputs.
@@ -2074,19 +2084,19 @@ <h2 id="example">Example </h2>
20742084< section >
20752085< h2 id ="training-output-and-cost-loss-function "> Training output and Cost/Loss-function </ h2 >
20762086
2077- < p > Given a QNN with output \( f(\mathbf{x};\boldsymbol\theta ) \) (a real
2087+ < p > Given a QNN with output \( f(\mathbf{x};\boldsymbol\Theta ) \) (a real
20782088number or vector of real values), one must define a loss function to
20792089train on data. Common choices are the mean squared error (MSE) for
20802090regression or cross-entropy for classification. For a training set
20812091\( \{\mathbf{x}_i,y_i\} \), the MSE cost/loss-function is
20822092</ p >
20832093< p > < br >
20842094$$
2085- C(\boldsymbol\theta ) = \frac{1}{N} \sum_{i=1}^N \bigl(f(\mathbf{x}i;\boldsymbol\theta ) - y_i\bigr)^2.
2095+ C(\boldsymbol\Theta ) = \frac{1}{N} \sum_{i=1}^N \bigl(f(\mathbf{x}_i;\boldsymbol\Theta ) - y_i\bigr)^2.
20862096$$
20872097< p > < br >
20882098
2089- < p > One then computes gradients \( \nabla{\boldsymbol\theta }C \) and updates
2099+ &lt; p > One then computes gradients \( \nabla_{\boldsymbol\Theta }C \) and updates
20902100parameters via gradient descent or other optimizers.
20912101</ p >
20922102</ section >
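A complete, toy training loop for this cost: a one-qubit model \( f(x;\Theta)=\langle Z\rangle \) after \( R_x(x) \) then \( R_y(\Theta) \) (an assumed architecture), synthetic targets generated from a hidden parameter value, and plain gradient descent with gradients from the parameter-shift rule.

```python
import numpy as np

def rx(t):
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

Z = np.diag([1.0, -1.0]).astype(complex)
ket0 = np.array([1.0, 0.0], dtype=complex)

def f(x, theta):
    psi = ry(theta) @ rx(x) @ ket0
    return float(np.real(np.conj(psi) @ (Z @ psi)))

rng = np.random.default_rng(1)
xs = rng.uniform(-1.0, 1.0, 8)
ys = np.array([f(x, 0.9) for x in xs])  # synthetic labels from a "true" Theta

def cost(theta):
    return float(np.mean([(f(x, theta) - y) ** 2 for x, y in zip(xs, ys)]))

theta, lr = 2.5, 0.5
c0 = cost(theta)
for _ in range(100):
    # dC/dTheta: chain rule through the squared error, df/dTheta by parameter shift
    grad = np.mean([2.0 * (f(x, theta) - y)
                    * 0.5 * (f(x, theta + np.pi / 2) - f(x, theta - np.pi / 2))
                    for x, y in zip(xs, ys)])
    theta -= lr * grad
print(cost(theta) < c0)  # training reduced the MSE: True
```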
@@ -2095,7 +2105,7 @@ <h2 id="training-output-and-cost-loss-function">Training output and Cost/Loss-fu
20962106&lt; h2 id ="exampe-variational-classifier "> Example: Variational Classifier &lt;/ h2 >
20962106
20972107< p > A binary classifier can output
2098- \( f(\mathbf{x};\boldsymbol\theta )=\langle Z_0\rangle \) on qubit 0, and
2108+ \( f(\mathbf{x};\boldsymbol\Theta )=\langle Z_0\rangle \) on qubit 0, and
20992109predict label \( +1 \) if \( f\ge0 \), else \( -1 \).
21002110</ p >
21012111</ section >
@@ -2136,15 +2146,15 @@ <h2 id="training-qnns-and-loss-landscapes">Training QNNs and Loss Landscapes </h
21362146< section >
21372147< h2 id ="gradient-computation "> Gradient Computation </ h2 >
21382148
2139- < p > Gradients \( \partial f/\partial\theta_j \) are obtained using the parameter-shift rule. For many gates \( e^{-i\theta P/2} \) (with \( P \) a Pauli), one can compute</ p >
2149+ < p > Gradients \( \partial f/\partial\Theta_j \) are obtained using the parameter-shift rule. For many gates \( e^{-i\Theta P/2} \) (with \( P \) a Pauli), one can compute</ p >
21402150< p > < br >
21412151$$
2142- \frac{\partial}{\partial\theta }\langle B\rangle
2143- = \frac{1}{2}\Bigl[\langle B\rangle_{\theta +\pi/2} - \langle B\rangle_{\theta -\pi/2}\Bigr],
2152+ \frac{\partial}{\partial\Theta }\langle B\rangle
2153+ = \frac{1}{2}\Bigl[\langle B\rangle_{\Theta +\pi/2} - \langle B\rangle_{\Theta -\pi/2}\Bigr],
21442154$$
21452155< p > < br >
21462156
2147- < p > where \( \langle B\rangle_{\theta \pm\pi/2} \) are expectation values
2157+ < p > where \( \langle B\rangle_{\Theta \pm\pi/2} \) are expectation values
21482158evaluated at shifted parameter values. This formula allows exact
21492159gradients by two circuit evaluations per parameter (independent of
21502160circuit size). PennyLane automatically applies parameter-shift rule
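The rule is easy to verify numerically. For a single qubit with \( f(\Theta)=\langle 0|R_y(\Theta)^\dagger Z R_y(\Theta)|0\rangle = \cos\Theta \), the shifted-expectation difference must reproduce the analytic derivative \( -\sin\Theta \) exactly. A NumPy sketch (no quantum SDK assumed):

```python
import numpy as np

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

Z = np.diag([1.0, -1.0]).astype(complex)
ket0 = np.array([1.0, 0.0], dtype=complex)

def expZ(theta):
    """<0| R_y(theta)^dagger Z R_y(theta) |0> = cos(theta)."""
    psi = ry(theta) @ ket0
    return float(np.real(np.conj(psi) @ (Z @ psi)))

theta = 0.8
shift_grad = 0.5 * (expZ(theta + np.pi / 2) - expZ(theta - np.pi / 2))
print(np.isclose(shift_grad, -np.sin(theta)))  # matches d/dtheta cos(theta): True
```

Unlike finite differences, the two evaluations here are exact for any shift-rule-compatible gate, not an approximation that degrades with step size.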
@@ -2183,7 +2193,7 @@ <h2 id="barren-plateaus">Barren Plateaus </h2>
21832193< section >
21842194< h2 id ="cost-loss-landscape-visualization "> Cost/Loss-landscape visualization </ h2 >
21852195
2186- < p > One can imagine the cost/loss function \( C(\boldsymbol\theta ) \) over the
2196+ < p > One can imagine the cost/loss function \( C(\boldsymbol\Theta ) \) over the
21872197parameter space. Unlike convex classical problems, this landscape may
21882198have many local minima and saddle points. Barren plateaus correspond
21892199to regions where \( \nabla C\approx 0 \) almost everywhere. Even if
@@ -2205,8 +2215,8 @@ <h2 id="cost-loss-landscape-visualization">Cost/Loss-landscape visualization </h
22052215< h2 id ="exercises "> Exercises </ h2 >
22062216
22072217< ol >
2208- < p > < li > Compute a gradient by hand: For a circuit with one qubit and \( f(\theta )=\langle0|R_y(\theta )^\dagger Z R_y(\theta )|0\rangle \), use the parameter-shift rule to compute \( df/d\theta \).</ li >
2209- < p > < li > Explore barren plateaus: Numerically evaluate \( \partial f/\partial\theta \) for a simple 5-qubit random circuit as depth increases. Observe the trend of gradient norms. What does this suggest?</ li >
2218+ < p > < li > Compute a gradient by hand: For a circuit with one qubit and \( f(\Theta )=\langle0|R_y(\Theta )^\dagger Z R_y(\Theta )|0\rangle \), use the parameter-shift rule to compute \( df/d\Theta \).</ li >
2219+ < p > < li > Explore barren plateaus: Numerically evaluate \( \partial f/\partial\Theta \) for a simple 5-qubit random circuit as depth increases. Observe the trend of gradient norms. What does this suggest?</ li >
22102220< p > < li > Optimizer effects: Implement a small QNN (2 qubits) and train with both SGD and Adam optimizers. Compare convergence speed.</ li >
22112221</ ol >
22122222</ section >