RascalSoftware · alexhroom · Jan 14, 2025 · Dec 13, 2024 · Dec 13, 2024 · Dec 13, 2024
diff --git a/RATapi/examples/bayes_benchmark/bayes_benchmark.ipynb b/RATapi/examples/bayes_benchmark/bayes_benchmark.ipynb
@@ -0,0 +1,367 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Bayesian parameter estimation for low-dimensional examples\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The Bayesian procedures available in RAT ([Nested Sampler (NS)](https://en.wikipedia.org/wiki/Nested_sampling_algorithm) and [DREAM](https://doi.org/10.1016/j.envsoft.2015.08.013)) estimate the parameters of our reflectivity model - that is, they find the maximum value of \n",
+    "$$P((X_1, X_2, ...) = (x_1, x_2, ...) | I)$$\n",
+    "where $P$ is a probability measure, $X_1, X_2, ...$ are our parameters, $x_1, x_2, ...$ are proposed values of these parameters, and $I$ is our background information on the model (e.g. our data), over a range of values of $x_1, x_2, ...$ between given minimum and maximum values. It can be shown that under some weak assumptions about our data, this probability is proportional to $\\exp(-\\chi^2/2)$, where $\\chi^2$ is the chi-squared measure of fit given by the least-squares solution. [1]\n",
+    "\n",
+    "If we want to calculate $\\chi^2$ directly for a sample of $N$ values between some given minimum and maximum values for each parameter, we would have to perform $N^P$ calculations, where $P$ is the number of parameters. Of course, for large numbers of parameters, this is infeasible, hence why algorithms such as NS and DREAM have been developed to do so more efficiently. However, for a small number of parameters, it is feasible for us to perform this direct calculation and compare it to the results of NS and DREAM. Here we will do so for an example of 2 and 3 parameters.\n",
+    "\n",
+    "*[1] D. S. Sivia, J. R. P. Webster,\n",
+    "    \"The Bayesian approach to reflectivity data\",\n",
+    "    Physica B: Condensed Matter,\n",
+    "    Volume 248, June 1998, pages 327-337  \n",
+    "    DOI: 10.1016/S0921-4526(98)00259-2  \n",
+    "    URL: https://bayes.wustl.edu/sivia/98_20feb03.pdf*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Two-parameter example\n",
+    "We will start with a two-dimensional example on a simple project. This project represents a bare D2O substrate, and we will estimate the true values of the substrate roughness and background signal for this project."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "import RATapi as RAT\n",
+    "from RATapi.models import Parameter, Background, Resolution, Data, Contrast"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project = RAT.Project(\n",
+    "        name=\"Bare D2O Substrate\",\n",
+    "        calculation=\"normal\",\n",
+    "        model=\"standard layers\",\n",
+    "        geometry=\"air/substrate\",\n",
+    "        absorption=\"False\",\n",
+    "        parameters=[Parameter(name=\"Substrate Roughness\", min=3.0, value=4.844363132849221, max=8.0, fit=True)],\n",
+    "        background_parameters=[\n",
+    "            Parameter(name=\"Background parameter 1\", min=5e-08, value=3.069003361230152e-06, max=7e-06, fit=True)\n",
+    "        ],\n",
+    "        scalefactors=[Parameter(name=\"Scalefactor 1\", min=0.07, value=0.10141560336360426, max=0.13, fit=False)],\n",
+    "        bulk_in=[Parameter(name=\"Air\", min=0.0, value=0.0, max=0.0, fit=False)],\n",
+    "        bulk_out=[Parameter(name=\"D2O\", min=6.3e-06, value=6.35e-06, max=6.4e-06, fit=False)],\n",
+    "        resolution_parameters=[Parameter(name=\"Resolution parameter 1\", min=0.01, value=0.03, max=0.05, fit=False)],\n",
+    "        backgrounds=[Background(name=\"Background 1\", type=\"constant\", value_1=\"Background parameter 1\")],\n",
+    "        resolutions=[Resolution(name=\"Resolution 1\", type=\"constant\", value_1=\"Resolution parameter 1\")],\n",
+    "        data=[\n",
+    "            Data(name=\"Simulation\", data=np.empty([0, 3]), simulation_range=[0.005, 0.7]),\n",
+    "            Data(\n",
+    "                name=\"f82395c\",\n",
+    "                data=np.array(\n",
+    "                    [\n",
+    "                        [4.8866e-02, 1.2343e-04, 1.3213e-06],\n",
+    "                        [5.1309e-02, 1.0063e-04, 1.0803e-06],\n",
+    "                        [5.3874e-02, 8.2165e-05, 8.8779e-07],\n",
+    "                        [5.6568e-02, 6.4993e-05, 7.2018e-07],\n",
+    "                        [5.9396e-02, 5.3958e-05, 6.0015e-07],\n",
+    "                        [6.2366e-02, 4.3590e-05, 5.0129e-07],\n",
+    "                        [6.5485e-02, 3.5780e-05, 4.1957e-07],\n",
+    "                        [6.8759e-02, 2.9130e-05, 3.5171e-07],\n",
+    "                        [7.2197e-02, 2.3481e-05, 3.0586e-07],\n",
+    "                        [7.5807e-02, 1.8906e-05, 2.6344e-07],\n",
+    "                        [7.9597e-02, 1.4642e-05, 2.2314e-07],\n",
+    "                        [8.3577e-02, 1.1589e-05, 1.8938e-07],\n",
+    "                        [8.7756e-02, 9.5418e-06, 1.6220e-07],\n",
+    "                        [9.2143e-02, 7.5694e-06, 1.3809e-07],\n",
+    "                        [9.6751e-02, 6.3831e-06, 1.2097e-07],\n",
+    "                        [1.0159e-01, 5.0708e-06, 1.0333e-07],\n",
+    "                        [1.0667e-01, 4.1041e-06, 8.9548e-08],\n",
+    "                        [1.1200e-01, 3.4253e-06, 7.9830e-08],\n",
+    "                        [1.1760e-01, 2.8116e-06, 7.1554e-08],\n",
+    "                        [1.2348e-01, 2.3767e-06, 6.3738e-08],\n",
+    "                        [1.2966e-01, 1.9241e-06, 5.6586e-08],\n",
+    "                        [1.3614e-01, 1.5642e-06, 5.2778e-08],\n",
+    "                        [1.4294e-01, 1.2922e-06, 4.9730e-08],\n",
+    "                        [1.5009e-01, 1.1694e-06, 5.1175e-08],\n",
+    "                        [1.5760e-01, 9.7837e-07, 5.0755e-08],\n",
+    "                        [1.6548e-01, 8.9138e-07, 5.3542e-08],\n",
+    "                        [1.7375e-01, 7.9420e-07, 5.4857e-08],\n",
+    "                        [1.8244e-01, 7.9131e-07, 5.8067e-08],\n",
+    "                        [1.9156e-01, 6.5358e-07, 5.7717e-08],\n",
+    "                        [2.0114e-01, 6.2970e-07, 5.7951e-08],\n",
+    "                        [2.1119e-01, 5.0130e-07, 5.5262e-08],\n",
+    "                        [2.2175e-01, 5.0218e-07, 5.6461e-08],\n",
+    "                        [2.3284e-01, 3.9299e-07, 5.0685e-08],\n",
+    "                        [2.4448e-01, 3.5324e-07, 5.0194e-08],\n",
+    "                        [2.5671e-01, 4.4475e-07, 5.6485e-08],\n",
+    "                        [2.6954e-01, 5.1338e-07, 6.2247e-08],\n",
+    "                        [2.8302e-01, 3.4918e-07, 4.9745e-08],\n",
+    "                        [2.9717e-01, 4.3037e-07, 5.5488e-08],\n",
+    "                        [3.1203e-01, 4.0099e-07, 5.3591e-08],\n",
+    "                        [3.2763e-01, 3.8397e-07, 5.1303e-08],\n",
+    "                        [3.4401e-01, 3.0995e-07, 4.5965e-08],\n",
+    "                        [3.6121e-01, 3.9357e-07, 5.0135e-08],\n",
+    "                        [3.7927e-01, 3.0997e-07, 4.3680e-08],\n",
+    "                        [3.9824e-01, 2.9656e-07, 4.2432e-08],\n",
+    "                        [4.1815e-01, 2.1909e-07, 3.6117e-08],\n",
+    "                        [4.3906e-01, 2.3153e-07, 3.6307e-08],\n",
+    "                        [4.6101e-01, 3.3428e-07, 4.3874e-08],\n",
+    "                        [4.8406e-01, 2.3441e-07, 3.7488e-08],\n",
+    "                        [5.0826e-01, 1.5496e-07, 3.0585e-08],\n",
+    "                        [5.3368e-01, 2.4708e-07, 3.9376e-08],\n",
+    "                        [5.6036e-01, 2.2157e-07, 3.8258e-08],\n",
+    "                        [5.8838e-01, 2.2798e-07, 4.6976e-08],\n",
+    "                        [6.1169e-01, 6.0272e-07, 2.3239e-07],\n",
+    "                    ]\n",
+    "                ),\n",
+    "                data_range=[0.048866, 0.61169],\n",
+    "                simulation_range=[0.048866, 0.61169],\n",
+    "            ),\n",
+    "        ],\n",
+    "        contrasts=[\n",
+    "            Contrast(\n",
+    "                name=\"Chain-d, acmw\",\n",
+    "                data=\"f82395c\",\n",
+    "                background=\"Background 1\",\n",
+    "                background_action=\"add\",\n",
+    "                bulk_in=\"Air\",\n",
+    "                bulk_out=\"D2O\",\n",
+    "                scalefactor=\"Scalefactor 1\",\n",
+    "                resolution=\"Resolution 1\",\n",
+    "                resample=False,\n",
+    "                model=[],\n",
+    "            )\n",
+    "        ],\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Firstly, we will run calculations using nested sampling and DREAM."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ns_controls = RAT.Controls(procedure=\"ns\", nsTolerance=1, nLive=500, display=\"final\")\n",
+    "_, ns_results = RAT.run(project, ns_controls)\n",
+    "\n",
+    "dream_controls = RAT.Controls(procedure=\"dream\", display=\"final\")\n",
+    "_, dream_results = RAT.run(project, dream_controls)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we will perform our direct calculation. The standard `'calculate'` procedure in RAT runs an Abelès calculation for the reflectivity of our model, and calculates the $\\chi^2$ statistic for how well this reflectivity fits the given data. We will take a sample of 30 values between the minimum and maximum value of our roughness and background parameters, and calculate $\\exp(-\\chi^2 / 2)$ on this roughness-background grid. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rough_param = project.parameters[0]\n",
+    "roughness = np.linspace(rough_param.min, rough_param.max, 30)\n",
+    "\n",
+    "back_param = project.background_parameters[0]\n",
+    "background = np.linspace(back_param.min, back_param.max, 30)\n",
+    "\n",
+    "controls = RAT.Controls(procedure=\"calculate\", calcSldDuringFit=True, display=\"off\")\n",
+    "\n",
+    "# function to calculate exp(-chi_squared / 2) for a given pair of roughness/background values\n",
+    "def calculate_posterior(roughness_index: int, background_index: int) -> float:\n",
+    "    \"\"\"Calculate the posterior for an item in the roughness and background vectors.\n",
+    "\n",
+    "    Parameters\n",
+    "    ----------\n",
+    "    roughness_index : int\n",
+    "        The index of the roughness vector to use as the roughness parameter value.\n",
+    "    background_index : int\n",
+    "        The index of the background vector to use as the background parameter value.\n",
+    "\n",
+    "    Returns\n",
+    "    -------\n",
+    "    float\n",
+    "        The value of exp(-chi^2 / 2) for the given roughness and background values.\n",
+    "    \"\"\"\n",
+    "    project.parameters[0].value = roughness[roughness_index]\n",
+    "    project.background_parameters[0].value = background[background_index]\n",
+    "\n",
+    "    _, results = RAT.run(project, controls)\n",
+    "    chi_squared = results.calculationResults.sumChi\n",
+    "\n",
+    "    return np.exp(-chi_squared / 2)\n",
+    "\n",
+    "# we vectorise the calculation to make it faster by running it over a matrix of indices (x, y)\n",
+    "vectorized_calc_posterior = np.vectorize(calculate_posterior)\n",
+    "probability_array = vectorized_calc_posterior(*np.indices((30, 30), dtype=int))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can get the parameter values that best fit our model for each method:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# get the vector indices that produced the lowest chi-squared\n",
+    "best_indices = np.unravel_index(np.argmax(probability_array, axis=None), probability_array.shape)\n",
+    "print(\"Best values according to direct calculation:\\n\",\n",
+    "      \"Roughness: \", roughness[best_indices[0]], \"\\n\",\n",
+    "      \"Background: \", background[best_indices[1]])\n",
+    "\n",
+    "print(\"Best values according to Nested Sampler:\\n\",\n",
+    "      \"Roughness: \", ns_results.fitParams[0], \"\\n\",\n",
+    "      ## FIXME: once fitParams outputs properly!\n",
+    ")#      \"Background: \", ns_results.fitParams[1])\n",
+    "\n",
+    "print(\"Best values according to DREAM:\\n\",\n",
+    "      \"Roughness: \", dream_results.fitParams[0], \"\\n\",\n",
+    "      ## FIXME: once fitParams outputs properly!\n",
+    "      )#      \"Background: \", dream_results.fitParams[1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "And finally, we will plot the posteriors created via each method to compare, as well as contour plots\n",
+    "for each method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "from matplotlib import colormaps\n",
+    "import RATapi.utils.plotting as RATplot\n",
+    "\n",
+    "fig, axes = plt.subplots(3, 2, figsize=(6, 9))\n",
+    "\n",
+    "# plot NS and DREAM for each parameter\n",
+    "for i in [0, 1]:\n",
+    "    RATplot.plot_one_hist(ns_results, i, axes=axes[0][i])\n",
+    "    RATplot.plot_one_hist(dream_results, i, axes=axes[1][i])\n",
+    "    # we want all 3 plots to have the same x-range\n",
+    "    # so we will use the nested sampler x-range as our base\n",
+    "    axes[1][i].set_xlim(*axes[0][i].get_xlim())\n",
+    "    axes[1][i].set_title(\"\")\n",
+    "\n",
+    "# marginalise the probability array to get distributions for each parameter\n",
+    "roughness_distribution = np.sum(probability_array, axis=1)\n",
+    "background_distribution = np.sum(probability_array, axis=0)\n",
+    "\n",
+    "axes[2][0].hist(\n",
+    "    roughness,\n",
+    "    bins=25,\n",
+    "    range=axes[0][0].get_xlim(),\n",
+    "    weights=roughness_distribution,\n",
+    "    density=True,\n",
+    "    edgecolor=\"black\",\n",
+    "    linewidth=1.2,\n",
+    "    color=\"white\",\n",
+    "    )\n",
+    "\n",
+    "axes[2][1].hist(\n",
+    "    background,\n",
+    "    bins=25,\n",
+    "    range=axes[0][1].get_xlim(),\n",
+    "    weights=background_distribution,\n",
+    "    density=True,\n",
+    "    edgecolor=\"black\",\n",
+    "    linewidth=1.2,\n",
+    "    color=\"white\",\n",
+    "    )\n",
+    "\n",
+    "axes[0][0].set_ylabel(\"nested sampler\")\n",
+    "axes[1][0].set_ylabel(\"DREAM\")\n",
+    "axes[2][0].set_ylabel(\"direct calculation\")\n",
+    "fig.tight_layout()\n",
+    "\n",
+    "fig.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "contour_fig, axes = plt.subplots(1, 3, figsize=(9, 3))\n",
+    "\n",
+    "# plot NS and DREAM for each parameter\n",
+    "RATplot.plot_contour(ns_results, 0, 1, axes=axes[0])\n",
+    "RATplot.plot_contour(dream_results, 0, 1, axes=axes[1])\n",
+    "\n",
+    "axes[2].pcolormesh(roughness, background, probability_array.max() - probability_array.T, cmap=colormaps[\"Greys\"].reversed())\n",
+    "axes[2].contour(roughness, background, probability_array.max() - probability_array.T, colors=\"black\")\n",
+    "\n",
+    "axes[1].set_xlim(*axes[0].get_xlim())\n",
+    "axes[1].set_ylim(*axes[0].get_ylim())\n",
+    "axes[2].set_xlim(*axes[0].get_xlim())\n",
+    "axes[2].set_ylim(*axes[0].get_ylim())\n",
+    "\n",
+    "axes[0].set_title(\"NS\")\n",
+    "axes[1].set_title(\"DREAM\")\n",
+    "axes[2].set_title(\"direct\")\n",
+    "axes[1].set_ylabel(\"\")\n",
+    "axes[2].set_ylabel(\"\")\n",
+    "axes[2].set_xlabel(\"Substrate Roughness\")\n",
+    "fig.tight_layout()\n",
+    "fig.show()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}