6 changes: 1 addition & 5 deletions source/advanced/parallelisation.rst
@@ -4,8 +4,7 @@
Parallelising Calculations
==========================

RAT has in-built parallelisation for speeding up calculations. It will either parallelise over points and this is easily selectable from the controls class. In addition, it is possible to use the external Paramonte sampler
to run Bayesian analysis in parallel using 'mpiexcec'
RAT has in-built parallelisation for speeding up calculations. It can parallelise over either points or contrasts, and this is easily selectable from the controls class.

************************
Internal Parallelisation
@@ -47,6 +46,3 @@ The parallelisation scheme is chosen from the controls class:
Generally speaking, unless you have an inordinate number of points in your datafiles, the greatest speed increase always results from parallelising over contrasts. In fact, if the number of points in your data
is relatively small, parallelising over points can even slow things down because of the extra calculation overhead! It is a good idea to verify which is fastest for a given problem at the start of an analysis.
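
As an illustration, a minimal sketch of selecting the scheme through the Python API (assuming, as in the snippets elsewhere in these docs, that the package is importable as ``RAT``; the ``parallel`` field name and its ``'contrasts'`` value are assumptions here, not confirmed by this section):

.. code-block:: Python

   import RAT

   # Sketch: choose the parallelisation scheme via the controls class.
   # The 'parallel' field and its allowed values are assumed names.
   controls = RAT.Controls(procedure='DE', parallel='contrasts')
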

******************************************
Parallel Bayes with MIPEXCEC and Paramonte
******************************************
102 changes: 55 additions & 47 deletions source/algorithms/DE.rst
@@ -5,78 +5,86 @@ Differential Evolution
======================

Differential evolution (DE) is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It is an
example of a 'genetic algorithm', whereas the principles of Darwinian Evolution are used to 'evolve' the correct solution from an initial set of guesses.
example of a `'genetic algorithm' <https://en.wikipedia.org/wiki/Genetic_algorithm>`_, where the principles of Darwinian Evolution are used to
'evolve' the correct solution from an initial set of guesses.

DE is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means DE does not
require the optimization problem to be differentiable, as is required by classic optimization methods.

DE optimizes a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to its simple formulae,
DE optimizes a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to simple formulae,
and then keeping whichever candidate solution has the best score or fitness on the optimization problem at hand. In this way, the optimization problem is treated as a black box
that merely provides a measure of quality given a candidate solution and the gradient is therefore not needed.

As with all the RAT algorithms, DE is selected using the 'procedure' attribute of the controls block:-
New candidate solutions are created by taking two existing solutions and 'mutating' them: a 'base vector' (usually random, see :ref:`Strategies<deStrategies>` below) is added to a
'difference vector' calculated from the difference between the two solutions in a randomly-chosen parameter, and 'crossover' is then performed, where random parameters are
copied between the parent solutions and the mutant (with a probability determined by ``crossoverProbability``, below).
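
The sketch below illustrates this recipe for a DE/rand/1-style candidate. It is a hand-rolled illustration, not RAT's actual implementation; the function name and array layout are invented for the example:

.. code-block:: Python

   import numpy as np

   def de_candidate(population, i, f_weight=0.8, crossover_probability=0.9):
       """Create one candidate for population member ``i`` (DE/rand/1 style).

       ``population`` is an (N, d) array of current solutions. The mutant is
       a random base vector plus a weighted difference of two other random
       members; crossover then mixes the mutant with the parent.
       """
       n, d = population.shape
       # Pick three distinct members, none of them the parent itself.
       base, r1, r2 = np.random.choice(
           [j for j in range(n) if j != i], size=3, replace=False)
       mutant = population[base] + f_weight * (population[r1] - population[r2])
       # Crossover: take each parameter from the mutant with the given
       # probability, guaranteeing at least one mutant parameter survives.
       cross = np.random.rand(d) < crossover_probability
       cross[np.random.randint(d)] = True
       return np.where(cross, mutant, population[i])
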

.. tab-set-code::
.. code-block:: Matlab
The RAT implementation is based on a MATLAB implementation by Storn, Price, Neumaier and van Zandt, modified to be compilable to C++ using MATLAB Coder.

controls = controlsClass();
controls.procedure = 'DE'
For more technical information, see the `Wikipedia page for differential evolution <https://en.wikipedia.org/wiki/Differential_evolution>`_. There is also
an entire textbook on differential evolution by Price, Storn and Lampinen (2005). [#price2005]_

.. code-block:: Python
controls = RAT.Controls(procedure='DE')
Algorithm control parameters
----------------------------
The following parameters in the :ref:`Controls object<controlsInfo>` are specific to differential evolution:

This reveals the DE specific parameters in controls:-
- ``populationSize``: The number of candidate solutions that exist at any given time.
The original MATLAB code (Storn et al.) suggests that this size is not particularly critical, and that
a good initial guess is the number of fit parameters multiplied by 10.

.. tab-set::
:class: tab-label-hidden
:sync-group: code
- ``numGenerations``: The maximum number of iterations to run.

.. tab-item:: Matlab
:sync: Matlab
- ``crossoverProbability``: The probability of exchange of parameters between individuals at each generation (value between 0 and 1).
The original MATLAB code (Storn et al.) suggests that this should be high for most practical applications; it should be higher for correlated parameters,
and lower for uncorrelated parameters.

.. output:: Matlab
- ``fWeight``: A weighting value controlling the step size of mutations. The algorithm is quite sensitive to the choice of step size.

controls = controlsClass();
controls.procedure = 'DE';
disp(controls)
Price, Storn and Lampinen (2005) [#price2005]_ suggest 1 as an "empirically derived upper limit, in the sense that no function [they have seen] has required [fWeight] > 1".
Zaharie (2002) [#zaharie2002]_ calculates that population variance decreases (and so the population eventually becomes homogeneous) if

.. tab-item:: Python
:sync: Python

.. output:: Python
.. math:: F < \sqrt{\frac{1 - P_{cr}/2}{N_p}}

controls = RAT.Controls(procedure='DE')
print(controls)

For all the algorithms in the RAT implementation of DE (see below), the parameters have the following meanings:-
where :math:`P_{cr}` is the crossover probability (``crossoverProbability``) and :math:`N_p` is the population size (``populationSize``). This expression
gives a rule of thumb for the smallest 'good' value of ``fWeight``.
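For example, with ``crossoverProbability`` = 0.9 and ``populationSize`` = 50, the bound is :math:`\sqrt{(1 - 0.45)/50} \approx 0.10`, so keeping ``fWeight`` comfortably above 0.1 avoids the population collapsing prematurely.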

* **populationSize**: For DE a number of sets of parameters (population) evolve using random mutation, or exchange of parameters
(analogous to genes) between the members of the population.
- ``strategy``: The algorithm used to generate new candidates (see :ref:`below<deStrategies>`).

* **numGenerations**: How many iterations of DE to run.
- ``targetValue``: The value of chi-squared to aim for. The algorithm will terminate if this is reached.

* **crossoverProbability**: The probability of exchange of parameters between individuals at each generation (value between [0 - 1]).
- ``updateFreq``: The number of iterations between printing progress updates to the terminal.

* **fWeight**: A weighting value controlling the step size of mutations.
- ``updatePlotFreq``: The number of iterations between updates to live plots.

* **strategy**: The algorithm used (see below).

* **Target Value**: The value of chi-squared to aim for (algorithm will terminate if this is reached).
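
Putting these together, a minimal configuration sketch (assuming, as in the snippets elsewhere in these docs, that the package is importable as ``RAT`` and that ``Controls`` accepts these fields as keyword arguments; the values are illustrative, not recommendations):

.. code-block:: Python

   import RAT

   # Sketch: DE-specific control settings, using the parameter names
   # listed above. Values are illustrative, not recommendations.
   controls = RAT.Controls(
       procedure='DE',
       populationSize=50,         # ~10x the number of fit parameters
       numGenerations=500,        # maximum number of iterations
       crossoverProbability=0.9,  # high values suit correlated parameters
       fWeight=0.8,               # step size; [0.5, 1] is a good initial range
       targetValue=1.0,           # terminate if chi-squared reaches this
   )
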
.. _deStrategies:

Strategies
----------
The ``strategy`` control parameter selects between variants of the mutation algorithm. New candidates are created
using a 'base vector' to generate 'mutants' from existing points, and the strategy defines how this base vector is calculated.
Each strategy has a shorthand notation used in the literature, which is given in parentheses. The options are:

DE is also somewhat sensitive to the choice of the step size fWeight. A good initial guess is to
choose fWeight from interval [0.5, 1], e.g. 0.8. The crossover probability (between 0 -1) helps to maintain
the diversity of the population but should be close to 1 for most practical cases. Only separable problems do better with CR close to 0.
If the parameters are correlated, high values of F_CR work better. The reverse is true for no correlation.
#. **Random (DE/rand/1)**: The base vector is random.
#. **Local-to-best (DE/local-to-best/1)**: The base vector is a combination of one randomly-selected local solution and the best solution of the previous iteration.
This approach tries to strike a balance between robustness and fast convergence.
#. **Best with jitter (DE/best/1 with jitter)**: The base vector is the best solution of the previous iteration, with a small random perturbation applied.
#. **Random with per-vector dither (DE/rand/1 with per-vector-dither)**: The base vector is random, with a different random scaling
factor applied to each mutant.
#. **Random with per-generation dither (DE/rand/1 with per-generation-dither)**: The base vector is random, with a random scaling factor applied to each mutant.
This scaling factor is the same for every mutant, and randomised every generation.
#. **Random either-or algorithm (DE/rand/1 either-or-algorithm)**: The base vector is randomly chosen from either a pure random
mutation, or a pure recombination of parent parameter values.
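
For example, a sketch of selecting the local-to-best strategy (assuming ``strategy`` takes the number of the option in the list above; check the :ref:`Controls documentation<controlsInfo>` for the exact form):

.. code-block:: Python

   # Sketch: select local-to-best mutation (numeric code assumed).
   controls = RAT.Controls(procedure='DE', strategy=2)
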

The number of population members I_NP is also not very critical. A good initial guess is 10*I_D.

The 'strategy' selects between variations in the actual selection algorithm. The options are:-

#. DE/rand/1
#. DE/local-to-best/1
#. DE/best/1 with jitter
#. DE/rand/1 with per-vector-dither
#. DE/rand/1 with per-generation-dither
#. DE/rand/1 either-or-algorithm
.. [#price2005]
Price, Kenneth V.; Storn, Rainer M.; Lampinen, Jouni A. (2005),
"Differential Evolution: A Practical Approach to Global Optimisation".
ISBN: 978-3-540-20950-8,
URL: https://link.springer.com/book/10.1007/3-540-31306-0
.. [#zaharie2002]
Zaharie, Daniela (2002),
"Critical values for the control parameters of differential evolution algorithms".
In: Proc. of MENDEL 2002, 8th Int. Conf. on Soft Computing,
URL: https://staff.fmi.uvt.ro/~daniela.zaharie/mendel02.pdf
75 changes: 72 additions & 3 deletions source/algorithms/DREAM.rst
@@ -4,19 +4,88 @@
Bayesian Analysis - DREAM
=========================

The main Bayesian Sampler bundled with RAT is an implementation of the DREAM (DiffeRential Evolution Adaptive Metropolis)
algorithm first described by `Vrugt`_.
The main Bayesian sampler in RAT is a MATLAB implementation of the DREAM (DiffeRential Evolution Adaptive Metropolis)
algorithm of Vrugt (2016) [#vrugt2016]_, modified to be compilable to C++ via MATLAB Coder.

As the name suggests, DREAM combines :ref:`differential evolution<DE>` with a variant on the
`Metropolis-Hastings optimisation algorithm <https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm>`_.

The basic idea of the Metropolis-Hastings algorithm is that it randomly "walks" around the parameter space by taking
a "step" to a proposed nearby point and either accepting the step or rejecting it (i.e. staying at the current point)
according to the following process:

#. Let :math:`x` be the current point, :math:`x'` be the proposed new step, and :math:`f` the function that gives the likelihood at any point (in RAT, computed from chi-squared).
#. Calculate the ratio

.. math:: \alpha = f(x')/f(x).

#. Generate a random number :math:`u` between 0 and 1.
#. If :math:`u \leq \alpha`, accept the candidate. Otherwise, reject it.

This means that steps to points with better likelihood are always accepted, and the algorithm is also able to step
to points of worse fit (but the worse the fit, the less likely the step is to be accepted). The sequence of steps is an example of a Markov chain. After a large number of steps
we can infer various things about the shape of our probability distribution for each parameter. The algorithm can be extended in various ways: for example, by shrinking
the size of the steps in proportion to the number of steps that have been accepted, on the assumption that the algorithm has found a minimum
after many accepted steps.
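
A minimal sketch of one such accept/reject step (a generic illustration, not RAT's implementation; ``likelihood`` and ``propose`` are assumed callables):

.. code-block:: Python

   import numpy as np

   def metropolis_step(x, likelihood, propose):
       """One Metropolis-Hastings step (sketch).

       ``likelihood(x)`` is assumed to return the (unnormalised) posterior
       density at ``x``; ``propose(x)`` returns a candidate point near ``x``.
       """
       x_new = propose(x)
       alpha = likelihood(x_new) / likelihood(x)  # acceptance ratio
       if np.random.rand() <= alpha:
           return x_new  # accept: move to the candidate
       return x          # reject: stay at the current point
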

DREAM modifies this idea and combines it with differential evolution. It creates multiple Markov chains ("walks") in parallel,
and the new proposed points are generated by using a differential evolution process between all the chains:

#. Let :math:`v_i` be the vector representing chain :math:`i`;
:math:`v_i = (x^i_1, x^i_2, x^i_3,...)`
where :math:`x^i_j` is the :math:`j`'th point on chain :math:`i`.
#. Let there be :math:`N` chains, and say we are on iteration :math:`t`.
Use differential evolution over all points on all chains to propose a new point for each chain:
:math:`x^1_t, x^2_t, ..., x^N_t`.
#. For each point, either accept or reject using the same criteria as in Metropolis-Hastings. If
accepted, add it to its respective chain.

This means that over time, DREAM creates :math:`N` parallel Markov chains that explore the parameter space,
and each chain can 'learn' from better-performing chains via the differential evolution process. After some iterations,
we start 'cutting off' the older end of the chain so that the new proposals are generated from a sample set of better points.
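
The differential-evolution proposal step can be sketched as follows (a schematic illustration only; the function name and array layout are invented, and RAT's actual proposal includes the further refinements described below):

.. code-block:: Python

   import numpy as np

   def dream_proposal(current, i, gamma=0.8):
       """Sketch of a differential-evolution proposal for chain ``i``.

       ``current`` is an (N, d) array holding the latest point of each of
       the N chains. The jump is the scaled difference between two other
       randomly chosen chains, plus a small random perturbation.
       """
       n_chains, d = current.shape
       r1, r2 = np.random.choice(
           [j for j in range(n_chains) if j != i], size=2, replace=False)
       jump = gamma * (current[r1] - current[r2])
       return current[i] + jump + 1e-6 * np.random.randn(d)
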

It also performs several auxiliary improvements such as subspace sampling (i.e. generating proposals from only a subset of the parameters
at a time), and a correction algorithm for chains that are outliers. It has been found to outperform other adaptive
MCMC approaches, and in some cases even outperforms classical optimisation algorithms at parameter estimation. [#laloy2012]_


Algorithm control parameters
----------------------------
The following parameters in the :ref:`Controls object<controlsInfo>` are specific to DREAM:

.. _Vrugt: https://www.sciencedirect.com/science/article/abs/pii/S1364815215300396
- ``nSamples``: The number of samples in the initial population for each chain.

- ``nChains``: The number of Markov chains to use in the algorithm.

- ``jumpProbability``: A probability range for the size of jumps when performing subspace sampling. Larger values mean
more random variation in jump sizes for new proposed values.

- ``pUnitGamma``: The 'probability of unit jump rate'. New proposals are scaled down by a factor :math:`\gamma = 2.38/\sqrt{2\delta d}`,
where :math:`d` is the number of parameters being updated and :math:`\delta` is the number of chain pairs used to build the proposal. This keeps jumps to new
points relatively close to the rest of the chains (as we are likely close to a minimum of the space).
``pUnitGamma`` is the probability that this scaling will be ignored for an iteration
and instead :math:`\gamma = 1` for that iteration (hence the name: "p unit gamma", the probability that gamma
will be equal to unity for an iteration). This allows chains to 'jump' out of their local area and more easily explore other
parts of the parameter space.

- ``boundHandling``: How steps across the boundary of the space should be handled. Can be:

- ``"off"``: allow steps to pass the boundary.
- ``"reflect"``: if a step passes a boundary, reflect it back across the boundary.
- ``"bound"``: if a step passes a boundary, set it equal to the nearest point on the boundary.
- ``"fold"``: treat the boundary as periodic and 'wrap the step around' to the other side of the space.

- ``adaptPCR``: Whether crossover probabilities for differential evolution should be adapted by the algorithm as it runs.
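
Putting these together, a minimal configuration sketch (assuming the package is importable as ``RAT`` and that ``Controls`` accepts these fields as keyword arguments; the ``'dream'`` procedure string and all values are illustrative assumptions, not recommendations):

.. code-block:: Python

   import RAT

   # Sketch: DREAM-specific control settings, using the parameter names
   # listed above. Values are illustrative, not recommendations.
   controls = RAT.Controls(
       procedure='dream',
       nSamples=20000,           # initial population samples per chain
       nChains=10,               # number of parallel Markov chains
       jumpProbability=0.5,      # variation in subspace-sampling jump sizes
       pUnitGamma=0.2,           # chance of a 'unit gamma' mode-hopping step
       boundHandling='reflect',  # reflect steps back across boundaries
       adaptPCR=True,            # adapt crossover probabilities while running
   )
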


.. [#vrugt2016]
Vrugt, Jasper (2016),
"Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation".
DOI: 10.1016/j.envsoft.2015.08.013,
URL: https://www.sciencedirect.com/science/article/abs/pii/S1364815215300396

.. [#laloy2012]
Laloy, Eric; Vrugt, Jasper (2012),
"High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing".
DOI: 10.1029/2011WR010608,
URL: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2011WR010608
6 changes: 2 additions & 4 deletions source/algorithms/index.rst
@@ -5,8 +5,7 @@ Algorithms
==========

RAT has 4 built-in algorithms for data fitting and analysis. The Nelder-Mead Simplex and Differential Evolution
are traditional minimisers, whereas the other two (DREAM and Nested Sampling) are Bayes samplers. The use
of an external library (`Paramonte`_.) for Bayesian sampling is also supported.
are traditional minimisers, whereas the other two (DREAM and Nested Sampling) are Bayes samplers.


.. toctree::
@@ -16,6 +15,5 @@ of an external library (`Paramonte`_.) for Bayesian sampling is also supported.
DE
DREAM
nestedSampling
paramonte

.. _Paramonte: https://www.cdslab.org/paramonte/
.. .. _Paramonte: https://www.cdslab.org/paramonte/