Commit 19032f8

update week 16
1 parent 70d587e commit 19032f8

File tree

9 files changed: 1132 additions, 406 deletions


doc/pub/week16/html/week16-bs.html: 185 additions, 32 deletions (large diff not rendered)

doc/pub/week16/html/week16-reveal.html: 172 additions, 33 deletions (large diff not rendered)

doc/pub/week16/html/week16-solarized.html: 180 additions, 31 deletions (large diff not rendered)

doc/pub/week16/html/week16.html: 180 additions, 31 deletions (large diff not rendered)

Binary file (0 Bytes), not shown

doc/pub/week16/ipynb/week16.ipynb: 281 additions, 152 deletions (large diff not rendered)

doc/pub/week16/pdf/week16.pdf: binary file, 37.7 KB, not shown

doc/src/week15/figures/vqrbm.png: binary file, 1.68 MB

doc/src/week16/week16.do.txt: 134 additions, 127 deletions

@@ -163,14 +163,15 @@ which leads to
 
 !split
 ===== Expression for the gradients =====
+
 This leads to the following equation
 !bt
 \[
 \nabla_{\bm{\Theta}}\log{p(\bm{X};\bm{\Theta})}=\nabla_{\bm{\Theta}}\left(\sum_{x_i\in \bm{X}}\log{f(x_i;\bm{\Theta})}\right)-\nabla_{\bm{\Theta}}\log{Z(\bm{\Theta})}=0.
 \]
 !et
 
-The first term is called the positive phase and we assume that we have a model for the function $f$ from which we can sample values. Below we will develop an explicit model for this.
+The first term is called the positive phase and we assume that we have a model for the function $f$ from which we can sample values.
 The second term is called the negative phase and is the one which leads to more difficulties.
 
 !split
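A standard way to summarize the two phases (a complementary sketch following the usual RBM literature, not taken from the committed file itself) is as a data term minus a model expectation,

!bt
\[
\nabla_{\bm{\Theta}}\log{p(\bm{X};\bm{\Theta})}=\sum_{x_i\in \bm{X}}\nabla_{\bm{\Theta}}\log{f(x_i;\bm{\Theta})}-\mathbb{E}_{x\sim p(x;\bm{\Theta})}\left[\nabla_{\bm{\Theta}}\log{f(x;\bm{\Theta})}\right],
\]
!et

where the negative-phase expectation is taken with respect to the model distribution $p(x;\bm{\Theta})=f(x;\bm{\Theta})/Z(\bm{\Theta})$; this is the expectation value that the Monte Carlo sampling discussed below estimates.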
@@ -184,35 +185,11 @@ Z(\bm{\Theta})=\sum_{x_i\in \bm{X}}\sum_{h_j\in \bm{H}} f(x_i,h_j;\bm{\Theta}),
 !et
 is in general the most problematic term. In principle both $x$ and $h$ can span large degrees of freedom, if not even infinitely many ones, and computing the partition function itself is often not desirable or even feasible. The above derivative of the partition function can however be written in terms of an expectation value which is in turn evaluated using Monte Carlo sampling and the theory of Markov chains, popularly shortened to MCMC (or just MC$^2$).
 
-!split
-===== Explicit expression for the derivative =====
-We can rewrite
-!bt
-\[
-\nabla_{\bm{\Theta}}\log{Z(\bm{\Theta})}=\frac{\nabla_{\bm{\Theta}}Z(\bm{\Theta})}{Z(\bm{\Theta})},
-\]
-!et
-which reads in more detail
-!bt
-\[
-\nabla_{\bm{\Theta}}\log{Z(\bm{\Theta})}=\frac{\nabla_{\bm{\Theta}} \sum_{x_i\in \bm{X}}f(x_i;\bm{\Theta}) }{Z(\bm{\Theta})}.
-\]
-!et
-
-We can rewrite the function $f$ (we have assumed that is larger or
-equal than zero) as $f=\exp{\log{f}}$. We can then reqrite the last
-equation as
-
-!bt
-\[
-\nabla_{\bm{\Theta}}\log{Z(\bm{\Theta})}=\frac{ \sum_{x_i\in \bm{X}} \nabla_{\bm{\Theta}}\exp{\log{f(x_i;\bm{\Theta})}} }{Z(\bm{\Theta})}.
-\]
-!et
 
 !split
 ===== Final expression =====
 
-Taking the derivative gives us
+Summarizing, we have
 !bt
 \[
 \nabla_{\bm{\Theta}}\log{Z(\bm{\Theta})}=\frac{ \sum_{x_i\in \bm{X}}f(x_i;\bm{\Theta}) \nabla_{\bm{\Theta}}\log{f(x_i;\bm{\Theta})} }{Z(\bm{\Theta})},
@@ -232,10 +209,28 @@ that is
 !et
 
 This quantity is evaluated using Monte Carlo sampling, with Gibbs
-sampling as the standard sampling rule. Before we discuss the
-explicit algorithms, we need to remind ourselves about Markov chains
-and sampling rules like the Metropolis-Hastings algorithm and Gibbs
-sampling.
+sampling as the standard sampling rule.
+
+
+!split
+===== Kullback-Leibler divergence =====
+
+The Kullback–Leibler (KL) divergence, labeled $D_{KL}$, measures how one probability distribution $p$ diverges from a second expected probability distribution $q$,
+that is
+!bt
+\[
+D_{KL}(p \| q) = \int_x p(x) \log \frac{p(x)}{q(x)} dx.
+\]
+!et
+
+The KL divergence $D_{KL}$ achieves its minimum value of zero when $p(x) = q(x)$ everywhere.
+
+Note that the KL divergence is asymmetric. In regions where $p(x)$ is
+close to zero but $q(x)$ is significantly non-zero, the effect of $q$
+is disregarded. This can give misleading results when we simply want to
+measure the similarity between two equally important distributions.
+
+
 
 !split
 ===== Introducing the energy model =====
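For discrete distributions the integral above becomes a sum over states. A minimal Python sketch (the function name and the small epsilon guard are choices made for this illustration, not taken from the course material) that also demonstrates the asymmetry:

!bc pycod
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(p || q) for arrays of probabilities."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms with p(x) = 0 contribute nothing; eps guards against log(0).
    return np.sum(np.where(p > 0, p * np.log(p / (q + eps)), 0.0))

p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.25, 0.25, 0.25, 0.25])

print(kl_divergence(p, q))  # log(2), about 0.693
print(kl_divergence(q, p))  # much larger, since q puts mass where p is zero
!ec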
@@ -291,6 +286,11 @@ and
 \]
 !et
 
+!split
+===== More details on Boltzmann machines =====
+
+For a binary-binary model, these are the equations for the various gradients which are needed in the setup of a neural network.
+For more details see URL:"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week11/ipynb/week11.ipynb"
 
 !split
 ===== Code example =====
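For reference, the binary-binary gradients mentioned in the hunk above take the standard textbook form, written here for an assumed energy $E(\bm{v},\bm{h})=-\sum_i a_iv_i-\sum_j b_jh_j-\sum_{ij}v_iw_{ij}h_j$; the linked week11 notebook is the authoritative source for the course's notation:

!bt
\begin{align*}
\frac{\partial \log{p(\bm{v})}}{\partial w_{ij}} &= \langle v_ih_j\rangle_{\mathrm{data}}-\langle v_ih_j\rangle_{\mathrm{model}},\\
\frac{\partial \log{p(\bm{v})}}{\partial a_{i}} &= \langle v_i\rangle_{\mathrm{data}}-\langle v_i\rangle_{\mathrm{model}},\\
\frac{\partial \log{p(\bm{v})}}{\partial b_{j}} &= \langle h_j\rangle_{\mathrm{data}}-\langle h_j\rangle_{\mathrm{model}}.
\end{align*}
!et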
@@ -565,8 +565,10 @@ parameter in H is encoded as a gate angle or circuit parameter. The
 gradient of a circuit expectation can be obtained by the
 parameter-shift rule or automatic differentiation.
 
+!split
+===== Variational Quantum Boltzmann machines (VQBM) =====
 
-
+See whiteboard notes.
 
 !split
 ===== Implementation with PennyLane =====
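The parameter-shift rule mentioned in this hunk can be checked on a one-parameter circuit; a minimal sketch (the circuit and variable names below are illustrative, not from the lecture code):

!bc pycod
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)  # analytic mode, no shots

@qml.qnode(dev)
def expval_z(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

theta = np.array(0.7, requires_grad=True)

# Parameter-shift rule for a Pauli rotation: shift by +/- pi/2, take half the difference.
shift = np.pi / 2
shift_grad = (expval_z(theta + shift) - expval_z(theta - shift)) / 2

# Automatic differentiation through PennyLane gives the same number.
auto_grad = qml.grad(expval_z)(theta)

print(shift_grad, auto_grad)  # both equal -sin(0.7)
!ec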
@@ -661,110 +663,110 @@ Hamiltonian weights).
 !split
 ===== More code examples =====
 
-!bc pycod
-# Code example for a Quantum Boltzmann Machine (QBM) applied to a binary classification problem using PennyLane. This example uses a simplified dataset (XOR problem) and trains a parameterized quantum circuit to model the joint distribution of features and labels.
+This code defines a target distribution over classical binary data and trains a Variational Quantum Boltzmann Machine (VQBM) to reproduce it.
+At each epoch it takes an optimization step, samples the model, and computes the histogram probabilities of the visible bitstrings.
 
+!bc pycod
 import pennylane as qml
 from pennylane import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.animation import FuncAnimation
+from collections import Counter
+
+# Config
+num_visible = 2
+num_hidden = 2
+num_qubits = num_visible + num_hidden
+epochs = 50
+shots = 1000
+
+dev = qml.device("default.qubit", wires=num_qubits, shots=shots)
+
+# Target data (biased toward '11' and '00')
+target_bitstrings = ['11', '11', '11', '00', '00', '01']
+target_counts = Counter(target_bitstrings)
+target_probs = {
+    format(i, f'0{num_visible}b'): target_counts.get(format(i, f'0{num_visible}b'), 0) / len(target_bitstrings)
+    for i in range(2**num_visible)
+}
+
+
+# Ansatz
+def vqbm_ansatz(params):
+    for i in range(num_qubits):
+        qml.RY(params[i], wires=i)
+    for i in range(num_qubits - 1):
+        qml.CNOT(wires=[i, i + 1])
+    for i in range(num_qubits):
+        qml.RZ(params[i + num_qubits], wires=i)
+
+# Hamiltonian
+def generate_hamiltonian():
+    coeffs = []
+    observables = []
+    for i in range(num_qubits):
+        coeffs.append(np.random.uniform(-1, 1))
+        observables.append(qml.PauliZ(wires=i))
+    for i in range(num_qubits):
+        for j in range(i + 1, num_qubits):
+            coeffs.append(np.random.uniform(-1, 1))
+            observables.append(qml.PauliZ(wires=i) @ qml.PauliZ(wires=j))
+    return qml.Hamiltonian(coeffs, observables)
+
+H = generate_hamiltonian()
 
-# Define the target probabilities for the XOR dataset
-target_probs = np.zeros(8)
-target_indices = [0, 3, 5, 6]  # Binary: 000, 011, 101, 110
-for idx in target_indices:
-    target_probs[idx] = 0.25
 
-# Quantum circuit configuration
-num_qubits = 3  # 2 features + 1 label
-dev = qml.device("default.qubit", wires=num_qubits)
+@qml.qnode(dev)
+def energy_expectation(params):
+    vqbm_ansatz(params)
+    return qml.expval(H)
 
 @qml.qnode(dev)
-def circuit(params):
-    # First rotation layer
-    for i in range(num_qubits):
-        qml.RX(params[0][i], wires=i)
-        qml.RY(params[1][i], wires=i)
-
-    # Entangling gates
-    qml.CNOT(wires=[0, 1])
-    qml.CNOT(wires=[1, 2])
-    qml.CNOT(wires=[0, 2])
-
-    # Second rotation layer
-    for i in range(num_qubits):
-        qml.RX(params[2][i], wires=i)
-        qml.RY(params[3][i], wires=i)
-
-    return qml.probs(wires=range(num_qubits))
-
-# Initialize parameters
-params = [
-    np.random.uniform(0, 2*np.pi, size=num_qubits, requires_grad=True),
-    np.random.uniform(0, 2*np.pi, size=num_qubits, requires_grad=True),
-    np.random.uniform(0, 2*np.pi, size=num_qubits, requires_grad=True),
-    np.random.uniform(0, 2*np.pi, size=num_qubits, requires_grad=True)
-]
-
-# Cost function (KL divergence)
-def cost(params):
-    model_probs = circuit(params)
-    cost = 0.0
-    for idx in target_indices:
-        q = model_probs[idx]
-        cost += 0.25 * (np.log(0.25) - np.log(q + 1e-10))  # Add small epsilon to avoid log(0)
-    return cost
-
-# Optimization
+def sample_circuit(params):
+    vqbm_ansatz(params)
+    return qml.sample(wires=range(num_visible))
+
+# Helper: Convert samples to bitstring histogram
+def get_distribution(samples):
+    bitstrings = ["".join(str(bit) for bit in s) for s in samples]
+    counts = Counter(bitstrings)
+    total = sum(counts.values())
+    return {
+        format(i, f'0{num_visible}b'): counts.get(format(i, f'0{num_visible}b'), 0) / total
+        for i in range(2**num_visible)
+    }
+
+# Training and storing distributions
+params = 0.01 * np.random.randn(2 * num_qubits, requires_grad=True)
 opt = qml.AdamOptimizer(stepsize=0.1)
-max_iterations = 100
-
-for i in range(max_iterations):
-    params, current_cost = opt.step_and_cost(cost, params)
-    if i % 10 == 0:
-        print(f"Iteration {i+1}: Cost = {current_cost}")
-
-# Prediction function
-def predict(features):
-    # Calculate probabilities for both possible labels
-    all_probs = circuit(params)
-    feature_mask = (int(f"{features[0]}{features[1]}", 2) << 1)
-    p0 = all_probs[feature_mask]
-    p1 = all_probs[feature_mask | 1]
-    total = p0 + p1
-    return p0/total if total != 0 else 0.5, p1/total if total != 0 else 0.5
-
-# Test the model
-test_cases = [(0,0), (0,1), (1,0), (1,1)]
-print("\nPredictions:")
-for features in test_cases:
-    prob_0, prob_1 = predict(features)
-    print(f"Features {features}: P(0)={prob_0:.2f}, P(1)={prob_1:.2f}")
-
-"""
-**Key components explained:**
-
-1. **Dataset**: Uses XOR problem with binary features and labels encoded in 3 qubits (2 features, 1 label).
-
-2. **Quantum Circuit**:
-   - Uses rotation gates (RX, RY) and entangling gates (CNOT)
-   - Parameters are optimized to match the target distribution
-   - Outputs probabilities for all possible 8 states
-
-3. **Training**:
-   - Minimizes KL divergence between model and target probabilities
-   - Uses Adam optimizer for better convergence
-
-4. **Prediction**:
-   - Calculates conditional probabilities p(label|features) by marginalizing the joint distribution
-   - Normalizes probabilities for classification
-
-**Note:** This is a simplified example. For real-world applications, you would need to:
-1. Handle continuous features (e.g., using amplitude encoding)
-2. Use more sophisticated ansatz architectures
-3. Implement proper batching for larger datasets
-4. Add regularization to prevent overfitting
-
-The output should show decreasing cost during training and high probabilities for correct labels in predictions. Actual results may vary due to random initialization and optimization challenges.
-"""
+history = []
+
+for epoch in range(epochs):
+    params = opt.step(energy_expectation, params)
+    learned_dist = get_distribution(sample_circuit(params))
+    history.append(learned_dist)
+    if epoch % 10 == 0:
+        print(f"Epoch {epoch} energy: {energy_expectation(params):.4f}")
+
+# Animation setup
+states = [format(i, f'0{num_visible}b') for i in range(2**num_visible)]
+
+fig, ax = plt.subplots()
+bar1 = ax.bar(states, [0]*len(states), color='skyblue', label="VQBM")
+bar2 = ax.bar(states, [target_probs[s] for s in states], color='orange', alpha=0.6, label="Target")
+ax.set_ylim(0, 1)
+ax.set_ylabel("Probability")
+ax.set_title("VQBM Learning Over Epochs")
+ax.legend()
+
+def update(frame):
+    dist = history[frame]
+    for i, state in enumerate(states):
+        bar1[i].set_height(dist[state])
+    ax.set_title(f"Epoch {frame}")
+
+ani = FuncAnimation(fig, update, frames=len(history), repeat=False)
+plt.show()
 
 !ec
 
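As a possible convergence check on the block above (not part of the committed code), one could compare the final sampled distribution with the target using the KL divergence introduced earlier, assuming the variables target_probs and history defined in that block:

!bc pycod
import numpy as np

# Assumes target_probs (dict of bitstring -> probability) and history
# (list of sampled distributions per epoch) from the code above.
final_dist = history[-1]
eps = 1e-12
kl = sum(p * np.log(p / (final_dist[s] + eps))
         for s, p in target_probs.items() if p > 0)
print(f"KL(target || model) after training: {kl:.4f}")
!ec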
@@ -782,3 +784,8 @@ o Define a cost function, such as the negative log-likelihood or a distance betw
 Use an optimizer (e.g. gradient descent, Adam) with PennyLane’s gradient calculations to update parameters.
 
 
+
+
+
+
+
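The cost-function and optimizer steps listed in this last hunk's context can be put together in a few lines; a minimal sketch (the two-qubit circuit, the Bell-like target and all names are illustrative choices, not the course's implementation):

!bc pycod
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def model_probs(weights):
    # Small variational circuit whose output probabilities we fit to a target.
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])

target = np.array([0.5, 0.0, 0.0, 0.5])  # illustrative target distribution

def cost(weights):
    # Negative log-likelihood style distance between target and model probabilities.
    q = model_probs(weights)
    return -np.sum(target * np.log(q + 1e-10))

weights = np.array([0.1, 0.2], requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.1)
for step in range(100):
    weights, current_cost = opt.step_and_cost(cost, weights)
print(weights, current_cost)
!ec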
0 commit comments