6363 ('Input gate', 2, None, 'input-gate'),
6464 ('Forget and input', 2, None, 'forget-and-input'),
6565 ('Output gate', 2, None, 'output-gate'),
66- ('Example: Solving Differential equations',
67- 2,
68- None,
69- 'example-solving-differential-equations'),
70- ('Lorenz attractor', 2, None, 'lorenz-attractor'),
71- ('Generating data', 2, None, 'generating-data'),
72- ('Training and testing', 2, None, 'training-and-testing'),
73- ('Computationally expensive',
74- 2,
75- None,
76- 'computationally-expensive'),
77- ('Choice of training data', 2, None, 'choice-of-training-data'),
78- ('Cost/Loss function', 2, None, 'cost-loss-function'),
79- ('Modifying the cost/loss function, adding more info',
80- 2,
81- None,
82- 'modifying-the-cost-loss-function-adding-more-info'),
83- ('Changing the function to optimize',
84- 2,
85- None,
86- 'changing-the-function-to-optimize'),
87- ('Adding more information to the loss function',
88- 2,
89- None,
90- 'adding-more-information-to-the-loss-function'),
9166 ('Autoencoders: Overarching view',
9267 2,
9368 None,
206181 <!-- navigation toc: --> < li > < a href ="#input-gate " style ="font-size: 80%; "> Input gate</ a > </ li >
207182 <!-- navigation toc: --> < li > < a href ="#forget-and-input " style ="font-size: 80%; "> Forget and input</ a > </ li >
208183 <!-- navigation toc: --> < li > < a href ="#output-gate " style ="font-size: 80%; "> Output gate</ a > </ li >
209- <!-- navigation toc: --> < li > < a href ="#example-solving-differential-equations " style ="font-size: 80%; "> Example: Solving Differential equations</ a > </ li >
210- <!-- navigation toc: --> < li > < a href ="#lorenz-attractor " style ="font-size: 80%; "> Lorenz attractor</ a > </ li >
211- <!-- navigation toc: --> < li > < a href ="#generating-data " style ="font-size: 80%; "> Generating data</ a > </ li >
212- <!-- navigation toc: --> < li > < a href ="#training-and-testing " style ="font-size: 80%; "> Training and testing</ a > </ li >
213- <!-- navigation toc: --> < li > < a href ="#computationally-expensive " style ="font-size: 80%; "> Computationally expensive</ a > </ li >
214- <!-- navigation toc: --> < li > < a href ="#choice-of-training-data " style ="font-size: 80%; "> Choice of training data</ a > </ li >
215- <!-- navigation toc: --> < li > < a href ="#cost-loss-function " style ="font-size: 80%; "> Cost/Loss function</ a > </ li >
216- <!-- navigation toc: --> < li > < a href ="#modifying-the-cost-loss-function-adding-more-info " style ="font-size: 80%; "> Modifying the cost/loss function, adding more info</ a > </ li >
217- <!-- navigation toc: --> < li > < a href ="#changing-the-function-to-optimize " style ="font-size: 80%; "> Changing the function to optimize</ a > </ li >
218- <!-- navigation toc: --> < li > < a href ="#adding-more-information-to-the-loss-function " style ="font-size: 80%; "> Adding more information to the loss function</ a > </ li >
219184 <!-- navigation toc: --> < li > < a href ="#autoencoders-overarching-view " style ="font-size: 80%; "> Autoencoders: Overarching view</ a > </ li >
220185 <!-- navigation toc: --> < li > < a href ="#powerful-detectors " style ="font-size: 80%; "> Powerful detectors</ a > </ li >
221186 <!-- navigation toc: --> < li > < a href ="#first-introduction-of-aes " style ="font-size: 80%; "> First introduction of AEs</ a > </ li >
@@ -294,7 +259,6 @@ <h2 id="plans-for-the-week-march-10-14" class="anchor">Plans for the week March
294259<!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
295260< ol >
296261< li > RNNs and discussion of Long Short-Term Memory</ li >
297- < li > Example of application of RNNs to differential equations</ li >
298262< li > Start discussion of Autoencoders (AEs)</ li >
299263< li > Links between Principal Component Analysis (PCA) and AE
300264<!-- o Discussion of specific examples relevant for project 1, <a href="https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/Projects/2023/ProjectExamples/RNNs.pdf" target="_self">see project from last year by Daniel and Keran</a> --> </ li >
@@ -312,7 +276,7 @@ <h2 id="reading-recommendations-rnns-and-lstms" class="anchor">Reading recommend
312276< ol >
313277< li > For RNNs see Goodfellow et al chapter 10, see < a href ="https://www.deeplearningbook.org/contents/rnn.html " target ="_self "> < tt > https://www.deeplearningbook.org/contents/rnn.html</ tt > </ a > </ li >
314278< li > Reading suggestions for implementation of RNNs in PyTorch: Raschka et al's text, chapter 15</ li >
315- < li > RNN video at URL": https://youtu.be/PCgrgHgy26c?feature=shared"</ li >
279+ < li > RNN video at < a href =" https://youtu.be/PCgrgHgy26c?feature=shared " target =" _self " > < tt > https://youtu.be/PCgrgHgy26c?feature=shared </ tt > </ a > </ li >
316280< li > New xLSTM, see Beck et al < a href ="https://arxiv.org/abs/2405.04517 " target ="_self "> < tt > https://arxiv.org/abs/2405.04517</ tt > </ a > . Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.</ li >
317281</ ol >
318282</ div >
@@ -484,177 +448,6 @@ <h2 id="output-gate" class="anchor">Output gate </h2>
484448
485449< p > where \( \mathbf{W_o,U_o} \) are the weights of the output gate and \( \mathbf{b_o} \) is the bias of the output gate.</ p >
486450
487- <!-- !split -->
488- < h2 id ="example-solving-differential-equations " class ="anchor "> Example: Solving Differential equations </ h2 >
489-
490- < p > The dynamics of a stable spiral evolve in such a way that the system's
491- trajectory converges to a fixed point while spiraling inward. The
492- oscillations around the fixed point are gradually damped until the
493- system settles into a steady state. Suppose we have a
494- two-dimensional system of coupled differential equations of the form
495- </ p >
496- $$
497- \begin{align*}
498- \frac{dx}{dt} &= ax + by \notag,\\
499- \frac{dy}{dt} &= cx + dy.
500- \end{align*}
501- $$
502-
503- < p > The choice of \( a,b,c,d \in \mathbb{R} \) completely determines the
504- behavior of the solution, and for some of these values, albeit not
505- all, the system is said to be a stable spiral. This condition is
506- satisfied when the eigenvalues of the matrix formed by the
507- coefficients are complex conjugates with a negative real part.
508- </ p >
509-
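< p > To make this condition concrete, the following minimal NumPy sketch (the coefficient values are illustrative choices for this example only, not taken from the text) checks whether a given coefficient matrix yields a stable spiral, that is, complex-conjugate eigenvalues with negative real part.</ p >

< pre >
import numpy as np

# Illustrative coefficients (chosen for this sketch only) that give
# complex-conjugate eigenvalues with negative real part, i.e. a stable spiral.
a, b, c, d = -0.2, -1.0, 1.0, -0.2

A = np.array([[a, b],
              [c, d]])
eigvals = np.linalg.eigvals(A)
print("eigenvalues:", eigvals)

is_spiral = np.iscomplex(eigvals).all()     # complex-conjugate pair?
is_stable = bool(eigvals.real.max() < 0)    # both real parts negative?
print("stable spiral:", is_spiral and is_stable)
</ pre >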
510- <!-- !split -->
511- < h2 id ="lorenz-attractor " class ="anchor "> Lorenz attractor </ h2 >
512-
513- < p > A Lorenz attractor presents some added complexity. It exhibits what is called chaotic
514- behavior, meaning that its evolution is extremely sensitive to the initial conditions.
515- </ p >
516-
517- < p > The expression for the Lorenz attractor evolution consists of a set of three coupled nonlinear differential equations given by</ p >
518-
519- $$
520- \begin{align*}
521- \frac{dx}{dt} &= \sigma (y-x), \notag\\
522- \frac{dy}{dt} &= x(\rho -z) - y, \notag\\
523- \frac{dz}{dt} &= xy- \beta z.
524- \end{align*}
525- $$
526-
527- < p > For this problem, \( (x,y,z) \) are the variables that determine the state
528- of the system in space, while \( \sigma, \rho \) and \( \beta \) are,
529- similarly to the constants \( a,b,c,d \) of the stable spiral, parameters
530- that largely determine how the system evolves.
531- </ p >
532-
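< p > As a small sketch (plain NumPy; the function name and call signature are our own choices), the right-hand side of the Lorenz system can be coded directly from the equations above, in the form expected by standard ODE integrators.</ p >

< pre >
import numpy as np

def lorenz_rhs(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for a state vector (x, y, z)."""
    x, y, z = state
    return np.array([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z])
</ pre >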
533- <!-- !split -->
534- < h2 id ="generating-data " class ="anchor "> Generating data </ h2 >
535-
536- < p > Both of the above-mentioned systems are governed by differential
537- equations, and as such, they can be solved numerically through some
538- integration scheme such as forward-Euler or fourth-order
539- Runge-Kutta.
540- </ p >
541-
542- < p > We use the common choice of parameters \( \sigma =10 \), \( \rho =28 \),
543- \( \beta =8/3 \). This choice generates the complex and aesthetically appealing
544- trajectories that have been extensively investigated and benchmarked
545- in the numerical-simulation literature.
546- </ p >
547-
548- < p > For the stable spiral, we employ \( a = 0.2 \), \( b = -1.0 \), \( c = 1.0 \), \( d = 0.2 \).
549- This gives a good number of oscillations before reaching a steady state.
550- </ p >
551-
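< p > A sketch of the data-generation step for the Lorenz system, here using SciPy's < tt > solve_ivp</ tt > (an adaptive Runge-Kutta integrator) as a stand-in for the schemes mentioned above; the time span and the initial condition are illustrative choices, while the 800 stored points match the trajectory length described below.</ p >

< pre >
import numpy as np
from scipy.integrate import solve_ivp

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0

def lorenz_rhs(t, state):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t_span = (0.0, 40.0)                  # integration interval
t_eval = np.linspace(*t_span, 800)    # points at which the solution is stored

sol = solve_ivp(lorenz_rhs, t_span, y0=[1.0, 1.0, 1.0], t_eval=t_eval)
trajectory = sol.y.T                  # shape (800, 3): columns are x, y, z
</ pre >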
552- <!-- !split -->
553- < h2 id ="training-and-testing " class ="anchor "> Training and testing </ h2 >
554-
555- < p > Training and testing procedures in recurrent neural networks follow
556- what is usual for regular FNNs, but the sequential character of the
557- data requires some special care.
558- < b > Training and testing batches must not be randomly shuffled</ b > , since shuffling
559- would decorrelate the time-series points and leak future
560- information into the present or past points seen by the model.
561- </ p >
562-
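< p > A minimal PyTorch sketch of this point (the window length, the batch size and the helper names are our own choices; < tt > trajectory</ tt > is the array generated in the previous sketch): input windows and next-step targets are built and batched in their original temporal order, with shuffling explicitly switched off.</ p >

< pre >
import torch
from torch.utils.data import TensorDataset, DataLoader

def make_windows(series, window=20):
    """Build input windows of shape (N, window, d) and next-step targets (N, d)."""
    X = torch.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

series = torch.tensor(trajectory, dtype=torch.float32)   # (800, 3) Lorenz data
X, y = make_windows(series)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=False)  # keep time order
</ pre >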
563- <!-- !split -->
564- < h2 id ="computationally-expensive " class ="anchor "> Computationally expensive </ h2 >
565-
566- < p > The training algorithm can become computationally
567- costly, especially if the losses are evaluated over all previous time
568- steps. While other architectures such as LSTMs can be used to
569- mitigate this, it is also possible to introduce another hyperparameter
570- that controls how far the network is unfolded
571- in the training process, that is, how much the network remembers
572- from previous points in time. Similarly, the number of steps the network predicts
573- into the future per iteration greatly influences the assessment of the
574- loss function.
575- </ p >
576-
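< p > One common way to realize such an unfolding hyperparameter is truncated backpropagation through time: the sequence is processed in chunks of \( k \) steps and the hidden state is detached between chunks, so gradients never flow further back than \( k \) steps. The PyTorch sketch below is illustrative only; the layer sizes, the value of \( k \) and the random stand-in data are our own assumptions.</ p >

< pre >
import torch
from torch import nn

rnn = nn.RNN(input_size=3, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 3)
params = list(rnn.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

seq = torch.randn(1, 800, 3)   # stand-in for one (batch, time, features) trajectory
k = 25                         # unfolding hyperparameter: gradients reach back k steps
h = None
for start in range(0, seq.size(1) - k, k):
    inputs = seq[:, start:start + k, :]
    targets = seq[:, start + 1:start + k + 1, :]   # next-step targets
    out, h = rnn(inputs, h)
    loss = loss_fn(readout(out), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    h = h.detach()             # cut the gradient history at the chunk boundary
</ pre >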
577- <!-- !split -->
578- < h2 id ="choice-of-training-data " class ="anchor "> Choice of training data </ h2 >
579-
580- < p > The training and testing batches were separated into whole
581- trajectories. This means that instead of training and testing on
582- different fractions of the same trajectory, all trajectories that were
583- tested had completely new initial conditions. In this sense, from a
584- total of 10 initial conditions (independent trajectories), 9 were used
585- for training and 1 for testing. Each trajectory consisted of 800
586- points in each space coordinate.
587- </ p >
588-
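< p > A sketch of how such a trajectory-level split can be set up (SciPy/NumPy; the sampling range for the initial conditions and the random seed are our own assumptions, while the 10 trajectories of 800 points each and the 9/1 split follow the description above).</ p >

< pre >
import numpy as np
from scipy.integrate import solve_ivp

def lorenz_rhs(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

rng = np.random.default_rng(seed=0)
t_eval = np.linspace(0.0, 40.0, 800)          # 800 points per trajectory

trajectories = []
for _ in range(10):                           # 10 independent initial conditions
    y0 = rng.uniform(-10.0, 10.0, size=3)
    sol = solve_ivp(lorenz_rhs, (t_eval[0], t_eval[-1]), y0, t_eval=t_eval)
    trajectories.append(sol.y.T)              # each trajectory has shape (800, 3)

train_trajectories = trajectories[:9]         # nine whole trajectories for training
test_trajectories = trajectories[9:]          # one unseen trajectory for testing
</ pre >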
589- <!-- !split -->
590- < h2 id ="cost-loss-function " class ="anchor "> Cost/Loss function </ h2 >
591-
592- < p > The problem at hand is a time-series forecasting problem, so we are
593- free to choose the loss function from the wide range of
594- regression losses. Using the mean-squared error of the predicted
595- versus actual trajectories of the dynamical systems is a natural choice.
596- </ p >
597-
598- < p > It is a convex
599- function of the predictions, so given sufficient training time and appropriate
600- learning rates, the optimization typically converges to a good minimum irrespective
601- of the weights' random initialization.
602- </ p >
603-
604- $$
605- \begin{align}
606- \mathcal{L}_{MSE} = \frac{1}{N}\sum_{i}^N (y(\mathbf{x}_i) - \hat{y}(\mathbf{x}_i, \mathbf{\theta}))^2
607- \label{_auto1}
608- \end{align}
609- $$
610-
611- < p > where \( \mathbf{\theta} \) represents the set of all parameters of the network, \( \mathbf{x}_i \) are the input values, \( y \) the target values and \( \hat{y} \) the network predictions.</ p >
612-
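< p > In a PyTorch implementation this loss corresponds, up to the normalization convention, to < tt > torch.nn.MSELoss</ tt > ; a minimal sketch with stand-in tensors for the predictions and targets is shown here.</ p >

< pre >
import torch
from torch import nn

loss_fn = nn.MSELoss()          # mean-squared error, averaged over all elements

y_true = torch.randn(32, 3)     # stand-in target trajectory points
y_pred = torch.randn(32, 3)     # stand-in network predictions
loss = loss_fn(y_pred, y_true)  # scalar value of the MSE loss
</ pre >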
613- <!-- !split -->
614- < h2 id ="modifying-the-cost-loss-function-adding-more-info " class ="anchor "> Modifying the cost/loss function, adding more info </ h2 >
615-
616- < p > A cost/loss function that is based only on the observed and
617- predicted data is normally referred to as a purely data-driven approach.
618- </ p >
619-
620- < p > While this is a
621- well-established way of assessing regressions, it does not make use of
622- other knowledge we might have about the problem we are trying to
623- solve. At the same time, it is well known that neural
624- network models are data-hungry: they need large amounts of data to be
625- able to generalize their predictions outside the training set. One way to
626- mitigate this is to use physics-informed neural networks
627- (PINNs) when possible.
628- </ p >
629-
630- <!-- !split -->
631- < h2 id ="changing-the-function-to-optimize " class ="anchor "> Changing the function to optimize </ h2 >
632-
633- < p > To improve the performance of the model beyond the training set,
634- PINNs add physics-informed penalties to the loss function. In
635- essence, this means that we penalize
636- predictions that do not respect the physical laws we believe our real data
637- should obey. This procedure often has the advantage of trimming the
638- parameter space without adding bias to the model, provided the constraints
639- imposed are correct, but the choice of the physical laws can be a
640- delicate one.
641- </ p >
642-
643- <!-- !split -->
644- < h2 id ="adding-more-information-to-the-loss-function " class ="anchor "> Adding more information to the loss function </ h2 >
645-
646- < p > A general way of expressing this added penalty to the loss function is shown here:</ p >
647- $$
648- \begin{align*}
649- \mathcal{L} = w_{MSE}\mathcal{L}_{MSE} + w_{PI}\mathcal{L}_{PI}.
650- \end{align*}
651- $$
652-
653- < p > Here, the weights \( w_{MSE} \) and \( w_{PI} \) explicitly control how much
654- each of the two parts contributes to the total
655- loss function.
656- </ p >
657-
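< p > As an illustration of how such a combined loss could look in practice, the sketch below uses a finite-difference residual of the Lorenz equations as the physics-informed term. The weights \( w_{MSE} \) and \( w_{PI} \), the time step and the form of the residual are illustrative assumptions, not a prescription from the text.</ p >

< pre >
import torch
from torch import nn

mse = nn.MSELoss()
w_mse, w_pi = 1.0, 0.1                  # relative weights of the two loss terms
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
dt = 0.05                               # assumed spacing between trajectory points

def physics_residual(pred):
    """Mean squared mismatch between the finite-difference time derivative of a
    predicted trajectory (shape (T, 3)) and the Lorenz right-hand side."""
    x, y, z = pred[:-1, 0], pred[:-1, 1], pred[:-1, 2]
    rhs = torch.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], dim=1)
    dpred_dt = (pred[1:] - pred[:-1]) / dt
    return ((dpred_dt - rhs) ** 2).mean()

def total_loss(pred, target):
    return w_mse * mse(pred, target) + w_pi * physics_residual(pred)
</ pre >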
658451<!-- !split -->
659452< h2 id ="autoencoders-overarching-view " class ="anchor "> Autoencoders: Overarching view </ h2 >
660453