166 changes: 166 additions & 0 deletions doc/docs/Backend_Hooks.md
@@ -0,0 +1,166 @@
---
# Backend Hooks
---

`<meep/backend_hooks.hpp>` declares a small extension point that lets an external library plug into meep's hot paths at load time, without forking or patching upstream sources. The intended use is sibling backends (CUDA, ROCm, vectorized-CPU) that ship as separate shared libraries pinned to a specific meep version.

This document describes the contract. The header itself is the canonical source.

## Shape

A single process-global table of function pointers:

```cpp
namespace meep {
struct backend_hooks {
  void (*init)(fields *f);
  void (*cleanup)(fields *f);
  bool (*step)(fields *f);
  void (*sync_to_host)(fields *f);
  void (*sync_from_host)(fields *f);
  realnum (*read_point)(const fields *f, const fields_chunk *fc,
                        component c, int cmp, ptrdiff_t idx);
  bool (*needs_host_sync)(const fields *f);
};
extern backend_hooks meep_backend;
}
```

All entries default to null. **Null means "fall through to the in-tree CPU implementation."** Every call site in upstream meep is a null-pointer test, so a meep build with no backend loaded behaves bit-identically to a build without these hooks at all.
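
The call-site pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than the upstream code: `fields`, `hook_table`, `hooks`, `notify_init`, and `demo_init` are stand-in names invented for the example, and only the null-pointer test mirrors the contract.

```cpp
// Stand-in for meep::fields; it exists only so the sketch compiles alone.
struct fields {
  int backend_inits = 0; // counts how often a backend hook ran
};

// Stand-in for meep::backend_hooks, reduced to a single entry.
struct hook_table {
  void (*init)(fields *f);
};

// Zero-initialized like the real table: every hook starts out null.
hook_table hooks = {};

// Call-site pattern: test the pointer and call it only when a backend
// has installed something. Returns true when a hook ran.
bool notify_init(fields *f) {
  if (hooks.init) {
    hooks.init(f);
    return true;
  }
  return false; // no backend installed, nothing happens
}

// Hypothetical backend hook used to exercise the call site.
void demo_init(fields *f) { f->backend_inits++; }
```

With `hooks.init` left null, `notify_init` does nothing; after `hooks.init = demo_init;` the same call site dispatches to the backend.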

A backend is installed by writing function pointers into `meep::meep_backend` at library load time, typically from a function marked `__attribute__((constructor))` (a GCC/Clang extension).

## Per-sim opaque state

Backends store per-simulation state in `fields::backend_state` (a `void *`) and per-chunk state in `fields_chunk::backend_state`. Upstream meep never inspects these slots; backends are responsible for allocating them in `init` and freeing them in `cleanup`.
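
A minimal sketch of how a backend might use the two slots, assuming stand-in `fields` and `fields_chunk` types that carry only the members named in this document. `sim_state` and `chunk_state` are hypothetical backend-private records, not part of meep.

```cpp
#include <vector>

// Stand-ins for the two meep types that carry a backend_state slot.
struct fields_chunk {
  void *backend_state = nullptr;
};
struct fields {
  int num_chunks = 0;
  fields_chunk **chunks = nullptr;
  void *backend_state = nullptr;
};

// Hypothetical backend-private records.
struct chunk_state {
  std::vector<float> shadow; // placeholder for device-side storage
};
struct sim_state {
  long steps_taken = 0;
};

// init hook: allocate state and stash it in the opaque slots.
void on_init(fields *f) {
  f->backend_state = new sim_state();
  for (int i = 0; i < f->num_chunks; i++)
    f->chunks[i]->backend_state = new chunk_state();
}

// cleanup hook: free everything init allocated and clear the slots.
void on_cleanup(fields *f) {
  for (int i = 0; i < f->num_chunks; i++) {
    delete static_cast<chunk_state *>(f->chunks[i]->backend_state);
    f->chunks[i]->backend_state = nullptr;
  }
  delete static_cast<sim_state *>(f->backend_state);
  f->backend_state = nullptr;
}
```

Clearing each slot after freeing it is a defensive choice in this sketch; the contract only requires that `cleanup` release what `init` allocated.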

## Lifecycle

```
fields::fields(...)                 # constructs structure + chunks + connections
  └─ meep_backend.init(this)        # backend allocates per-sim/chunk state
                                    # backend stashes pointers in backend_state slots
... user code calls f.step(), f.flux_in_box(), f.add_dft(), etc. ...
fields::~fields()
  └─ meep_backend.cleanup(this)     # called BEFORE chunks are deleted, so the
                                    # backend can read fields_chunk::backend_state
  └─ chunk teardown
```

`init` is also called from the `fields` copy constructor, so a backend that supports `fields(const fields&)` must be prepared to attach to a freshly-copied object.

## Hook contracts

### `step`

```cpp
bool step(fields *f);
```

Take one full FDTD timestep on behalf of the caller. Returning `true` means the backend handled the step and the in-tree CPU step path is skipped. Returning `false` (or leaving the hook null) falls through to the CPU step path -- useful for backends that handle most configurations but not all.

The step hook is **bypassed entirely** while `fields::backend_suspended` is `true`. The CW solver sets this flag for the duration of its BiCGSTAB-L iterations, since it operates on the host arrays directly.
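
The dispatch rule can be sketched like this. It is an illustration under assumptions, not the in-tree `fields::step`: the `fields` stand-in keeps only `backend_suspended` plus two counters, and `step_once`, `claim_step`, and `decline_step` are hypothetical names.

```cpp
// Stand-in for meep::fields with just the members the sketch needs.
struct fields {
  bool backend_suspended = false;
  int cpu_steps = 0;     // counts fall-throughs to the CPU path
  int backend_steps = 0; // counts steps the backend claimed
};

struct hook_table {
  bool (*step)(fields *f);
};
hook_table hooks = {};

// The hook runs only when it is set and the backend is not suspended;
// a false return falls through to the CPU path.
void step_once(fields *f) {
  if (hooks.step && !f->backend_suspended && hooks.step(f)) return;
  f->cpu_steps++; // placeholder for the in-tree CPU step
}

// Hypothetical backend that claims every step.
bool claim_step(fields *f) {
  f->backend_steps++;
  return true;
}

// Hypothetical backend that declines every step.
bool decline_step(fields *) { return false; }
```

A null hook, a suspended backend, and a hook returning `false` all end up on the same CPU path, which is why a partial backend can decline a configuration it does not support.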

### `sync_to_host` / `sync_from_host`

```cpp
void sync_to_host(fields *f);
void sync_from_host(fields *f);
```

Sync the canonical host arrays (`fc->f[c][cmp]`) with the backend's shadow storage.

- `sync_to_host`: copy the latest field values from shadow storage into the host arrays so CPU code can read them. Called at every CPU readout site (DFT, monitors, integration, dump, CW solver).
- `sync_from_host`: push host-side modifications back into shadow storage. Called at the end of `fields::load` and at the end of `fields::solve_cw`.

Backends may treat either as a no-op when not active. In-tree readout sites reach `sync_to_host` through the convenience helper `sync_host_if_needed(f)`, which checks `needs_host_sync` first; `sync_from_host` is invoked directly behind a null-pointer test.

### `needs_host_sync`

```cpp
bool needs_host_sync(const fields *f);
```

Returns `true` iff the host arrays are stale relative to the backend's current state, meaning `sync_to_host` must run before they are read. Returns `false` when no backend is active or when the host arrays are already in sync.
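
One way a backend could implement this is a dirty flag that `step` sets and `sync_to_host` clears. The sketch below assumes that design. `sim_state`, `state_of`, and `sync_if_stale` are hypothetical names, and `sync_if_stale` only approximates the check-then-sync behaviour attributed to `sync_host_if_needed`; the real helper lives in the header.

```cpp
// Stand-in for meep::fields; only the opaque slot matters here.
struct fields {
  void *backend_state = nullptr;
};

// Hypothetical per-sim record kept by the backend.
struct sim_state {
  bool host_stale = false; // true once the device copy is newer than the host
  int downloads = 0;       // counts device -> host copies, for illustration
};

sim_state *state_of(const fields *f) {
  return static_cast<sim_state *>(f->backend_state);
}

// step hook: the device advanced, so the host arrays are now stale.
bool on_step(fields *f) {
  state_of(f)->host_stale = true;
  return true;
}

// needs_host_sync hook: report the dirty flag (false if never attached).
bool on_needs_host_sync(const fields *f) {
  const sim_state *s = state_of(f);
  return s != nullptr && s->host_stale;
}

// sync_to_host hook: placeholder for the real device -> host copy.
void on_sync_to_host(fields *f) {
  sim_state *s = state_of(f);
  s->downloads++;
  s->host_stale = false;
}

// Assumed shape of a check-then-sync helper.
void sync_if_stale(fields *f) {
  if (on_needs_host_sync(f)) on_sync_to_host(f);
}
```

Calling `sync_if_stale` twice after one step performs a single copy, which is the property that keeps repeated readout sites cheap.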

### `read_point`

```cpp
realnum read_point(const fields *f, const fields_chunk *fc,
                   component c, int cmp, ptrdiff_t idx);
```

Fast single-cell read. Used on the LDOS / point-monitor hot path to fetch one field value without triggering a full `sync_to_host`. Backends that don't implement this should leave it null; callers fall back to a sync followed by a direct array read.
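
A sketch of the caller-side fallback, assuming the host arrays have already been synced when the hook is null. `read_field_sketch` and `fake_device_read` are hypothetical; the types are stand-ins, and `realnum` is assumed to be `double` here.

```cpp
#include <cstddef>

typedef double realnum; // assumption for this sketch
enum component { Ex, Ey, Ez, NUM_COMPONENTS };

// Stand-ins for the meep types that appear in the hook signature.
struct fields {};
struct fields_chunk {
  realnum *f[NUM_COMPONENTS][2]; // host arrays: [component][real/imag]
};

struct hook_table {
  realnum (*read_point)(const fields *, const fields_chunk *, component, int,
                        std::ptrdiff_t);
};
hook_table hooks = {};

// Use the backend's point read when installed; otherwise read the host
// array directly (the caller must have synced the host arrays first).
realnum read_field_sketch(const fields *f, const fields_chunk *fc, component c,
                          int cmp, std::ptrdiff_t idx) {
  if (hooks.read_point) return hooks.read_point(f, fc, c, cmp, idx);
  return fc->f[c][cmp][idx];
}

// Hypothetical backend read that returns a recognizable constant.
realnum fake_device_read(const fields *, const fields_chunk *, component, int,
                         std::ptrdiff_t) {
  return 42.0;
}
```

The point of the hook is that a handful of values can be fetched from the backend without copying every chunk back to the host.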

### `init` / `cleanup`

```cpp
void init(fields *f);
void cleanup(fields *f);
```

Per-sim setup and teardown. Typical body: allocate device buffers and per-chunk shadow storage, stash pointers in `f->backend_state` and each `f->chunks[i]->backend_state`. The matching `cleanup` releases everything.

## Minimal example

A "transparent" backend that installs placeholder hooks and defers all work to the CPU. Suitable as a starting skeleton.

```cpp
#include <meep.hpp>
#include <meep/backend_hooks.hpp>

namespace my_backend {

static void on_init(meep::fields *) { /* allocate device state, stash in f->backend_state */ }
static void on_cleanup(meep::fields *) { /* free device state */ }

static bool on_step(meep::fields *) {
  // run the FDTD step on the device
  // return true to skip the CPU path; return false to defer to CPU
  return false;
}

static void on_sync_to_host(meep::fields *) { /* device -> host arrays */ }
static void on_sync_from_host(meep::fields *) { /* host arrays -> device */ }
static bool on_needs_host_sync(const meep::fields *) { return false; }

__attribute__((constructor))
static void install() {
  meep::meep_backend.init = on_init;
  meep::meep_backend.cleanup = on_cleanup;
  meep::meep_backend.step = on_step;
  meep::meep_backend.sync_to_host = on_sync_to_host;
  meep::meep_backend.sync_from_host = on_sync_from_host;
  meep::meep_backend.needs_host_sync = on_needs_host_sync;
}

} // namespace my_backend
```

Build it as a shared library, then load it before running meep:

```sh
LD_PRELOAD=libmy_backend.so python -c "import meep; ..."
```

See `tests/backend_hooks.cpp` in the meep source for a complete working example used as a CI guard.

## MPI

The hook table is process-global. Each MPI rank has its own `meep_backend` and calls hooks against its own `fields` object. The backend is responsible for any cross-rank coordination (NCCL, MPI, `sum_to_all`, etc.) — meep never synchronizes hook invocations across ranks.

A few implicit contracts that backends running under MPI must respect:

1. **`step` return value must be collective.** Every rank's `meep_backend.step(this)` must return the same `bool`. If one rank returns `true` (skip CPU path) and another returns `false`, the latter will execute `step_boundaries` -- which calls `MPI_Sendrecv` -- while the former will not, deadlocking the run. A backend that handles some configurations but not others must agree on that decision across ranks before returning, or just always return `true` once installed.

2. **`backend_suspended` must be set/cleared collectively.** Same reason: if some ranks are suspended and others aren't, the next `step()` call diverges. In practice this is fine for `solve_cw` because the CW solver itself is collective.

3. **`init` and `cleanup` run in collective contexts.** `fields::fields()` and `fields::~fields()` are collective in meep, so any MPI/NCCL collective the backend wants to do during setup or teardown (e.g., `MPI_Comm_split`, `ncclCommInitRank`) is safe to perform there.

4. **`read_point` is local-only.** The in-tree call sites only invoke it when `chunks[i]->is_mine()` is true, so the backend only has to serve points it owns. Cross-rank queries fall back to the existing `chunks[i]->get_field()` path, which returns 0 on remote ranks and gets reduced via `sum_to_all` outside the loop.

`sync_to_host` and `sync_from_host` are per-rank: each rank syncs its own chunks. They don't need to be collective unless the backend specifically wants them to be.
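
The collective `step` decision from item 1 amounts to a logical AND over all ranks. In a real backend that would typically be an all-reduce (for example `MPI_Allreduce` with `MPI_LAND`). The sketch below simulates the ranks with a vector so it runs without MPI; `agree_on_backend_step` is a hypothetical helper name.

```cpp
#include <vector>

// Each entry is one rank's local answer to "can I handle this step on the
// backend?". The result is the single value every rank should return from
// its step hook: true only if all ranks said yes.
bool agree_on_backend_step(const std::vector<bool> &per_rank_can_handle) {
  for (bool ok : per_rank_can_handle)
    if (!ok) return false; // one dissenting rank sends everyone to the CPU
  return true;
}
```

If any rank cannot handle the configuration, every rank returns `false`, so all of them enter `step_boundaries` together and none is left waiting in `MPI_Sendrecv`.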

## ABI notes

There is no formal ABI versioning on the hook table. Backends should be pinned to a specific meep version (typically by submodule or distro package). Adding a new function pointer at the end of `backend_hooks` is forward-compatible with already-built backends because the global is zero-initialized; reordering or removing fields breaks ABI.
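
A backend can turn part of this rule into a compile-time check. The sketch below is an illustration with two hypothetical, shortened layouts (`hooks_built_against` and `hooks_newer`); it is not the real `backend_hooks` definition.

```cpp
#include <cstddef>

struct fields; // opaque; only pointers to it appear below

// Hypothetical layout the backend was compiled against.
struct hooks_built_against {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
};

// Hypothetical newer layout with one entry appended at the end.
struct hooks_newer {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
  void (*extra)(fields *); // appended: older backends never write it
};

// Appending keeps every existing entry at the same offset.
static_assert(offsetof(hooks_newer, init) == offsetof(hooks_built_against, init),
              "init moved");
static_assert(offsetof(hooks_newer, cleanup) == offsetof(hooks_built_against, cleanup),
              "cleanup moved");
static_assert(offsetof(hooks_newer, step) == offsetof(hooks_built_against, step),
              "step moved");

// True when the newer table only grew at the end.
constexpr bool appended_only() {
  return offsetof(hooks_newer, step) == offsetof(hooks_built_against, step) &&
         sizeof(hooks_newer) > sizeof(hooks_built_against);
}
```

Because the newer table is zero-initialized, the appended `extra` entry stays null for a backend that only knows the older layout, so meep falls through to the CPU path for it.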
5 changes: 3 additions & 2 deletions src/Makefile.am
@@ -1,15 +1,16 @@
lib_LTLIBRARIES = libmeep.la
include_HEADERS = meep.hpp
-pkginclude_HEADERS = meep/mympi.hpp meep/vec.hpp meep/meep-config.h meepgeom.hpp material_data.hpp adjust_verbosity.hpp
+pkginclude_HEADERS = meep/mympi.hpp meep/vec.hpp meep/meep-config.h meep/backend_hooks.hpp meepgeom.hpp material_data.hpp adjust_verbosity.hpp

AM_CPPFLAGS = -I$(top_srcdir)/src

BUILT_SOURCES = sphere-quad.h step_generic_stride1.cpp meep/meep-config.h

-HDRS = meep.hpp meep_internals.hpp meep/mympi.hpp meep/vec.hpp \
+HDRS = meep.hpp meep_internals.hpp meep/mympi.hpp meep/vec.hpp meep/backend_hooks.hpp \
bicgstab.hpp meepgeom.hpp material_data.hpp adjust_verbosity.hpp

libmeep_la_SOURCES = array_slice.cpp anisotropic_averaging.cpp \
+backend_hooks.cpp \
bands.cpp boundaries.cpp bicgstab.cpp casimir.cpp \
cw_fields.cpp dft.cpp dft_ldos.cpp energy_and_flux.cpp \
fields.cpp fields_dump.cpp fix_boundary_sources.cpp loop_in_chunks.cpp h5fields.cpp h5file.cpp \
23 changes: 23 additions & 0 deletions src/backend_hooks.cpp
@@ -0,0 +1,23 @@
/* Copyright (C) 2005-2026 Massachusetts Institute of Technology
%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2, or (at your option)
% any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
*/

#include "meep/backend_hooks.hpp"

namespace meep {

/* Process-global table. Zero-initialized: every hook starts as a null
* function pointer, which the call sites read as "no backend installed,
* fall through to the CPU path". */
backend_hooks meep_backend = {};

} /* namespace meep */
15 changes: 15 additions & 0 deletions src/cw_fields.cpp
@@ -16,6 +16,7 @@
*/

#include "meep_internals.hpp"
#include "meep/backend_hooks.hpp"
#include "bicgstab.hpp"

using namespace std;
@@ -146,6 +147,14 @@ bool fields::solve_cw(double tol, int maxiters, complex<double> frequency, int L
int tsave = t; // save time (gets incremented by iterations)
int iters;

/* The CW solver runs its BiCGSTAB-L iterations against the host-side
 * `f[c][cmp]` arrays directly. If a backend is steering this sim, pull its
 * shadow state back to host now and suspend the step hook so `step()` calls
 * inside the solver loop run on the host. We unsuspend and push the
 * converged solution back to the backend at the end. */
sync_host_if_needed(this);
backend_suspended = true;

set_solve_cw_omega(2 * pi * frequency);

step(); // step once to make sure everything is allocated
@@ -248,6 +257,12 @@ bool fields::solve_cw(double tol, int maxiters, complex<double> frequency, int L
unset_solve_cw_omega();
update_dfts();

/* Unsuspend and push the converged host-side solution back into the
* backend's shadow storage before normal time-stepping resumes.
* No-op without a backend. */
backend_suspended = false;
if (meep_backend.sync_from_host) meep_backend.sync_from_host(this);

return !ierr;
}

4 changes: 4 additions & 0 deletions src/dft.cpp
@@ -22,6 +22,7 @@
#include <algorithm>
#include <assert.h>
#include "meep.hpp"
#include "meep/backend_hooks.hpp"
#include "meep_internals.hpp"

using namespace std;
@@ -312,6 +313,9 @@ void dft_chunk::update_dft(double time) {
(Collective operation.) */
double fields::dft_norm() {
am_now_working_on(Other);
/* Backends keep DFT accumulators on the device too — pull them back
* before reading. No-op when no backend is loaded. */
sync_host_if_needed(this);
double sum = 0.0;
for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine()) sum += chunks[i]->dft_norm2(gv);
30 changes: 20 additions & 10 deletions src/dft_ldos.cpp
@@ -16,6 +16,7 @@
*/

#include "meep.hpp"
#include "meep/backend_hooks.hpp"
#include "meep_internals.hpp"

using namespace std;
@@ -104,44 +105,53 @@ void dft_ldos::update(fields &f) {
// ...don't worry about the tiny inefficiency of recomputing this repeatedly
Jsum = 0.0;

+/* If no fast point-read is available, fall back to a single up-front
+* sync so the direct array reads (via read_field_at) see fresh data. */
+if (!meep_backend.read_point) sync_host_if_needed(&f);
+
for (int ic = 0; ic < f.num_chunks; ic++)
if (f.chunks[ic]->is_mine()) {
-for (const src_vol &sv : f.chunks[ic]->get_sources(D_stuff)) {
+fields_chunk *fc = f.chunks[ic];
+for (const src_vol &sv : fc->get_sources(D_stuff)) {
component c = direction_component(Ex, component_direction(sv.c));
-realnum *fr = f.chunks[ic]->f[c][0];
-realnum *fi = f.chunks[ic]->f[c][1];
+realnum *fr = fc->f[c][0];
+realnum *fi = fc->f[c][1];
if (fr && fi) // complex E
for (size_t j = 0; j < sv.num_points(); j++) {
const ptrdiff_t idx = sv.index_at(j);
const complex<double> &A = sv.amplitude_at(j);
-EJ += complex<double>(fr[idx], fi[idx]) * conj(A);
+EJ += complex<double>(read_field_at(&f, fc, c, 0, idx),
+read_field_at(&f, fc, c, 1, idx)) *
+conj(A);
Jsum += abs(A);
}
else if (fr) { // E is purely real
for (size_t j = 0; j < sv.num_points(); j++) {
const ptrdiff_t idx = sv.index_at(j);
const complex<double> &A = sv.amplitude_at(j);
-EJ += double(fr[idx]) * conj(A);
+EJ += double(read_field_at(&f, fc, c, 0, idx)) * conj(A);
Jsum += abs(A);
}
}
}
-for (const src_vol &sv : f.chunks[ic]->get_sources(B_stuff)) {
+for (const src_vol &sv : fc->get_sources(B_stuff)) {
component c = direction_component(Hx, component_direction(sv.c));
-realnum *fr = f.chunks[ic]->f[c][0];
-realnum *fi = f.chunks[ic]->f[c][1];
+realnum *fr = fc->f[c][0];
+realnum *fi = fc->f[c][1];
if (fr && fi) // complex H
for (size_t j = 0; j < sv.num_points(); j++) {
const ptrdiff_t idx = sv.index_at(j);
const complex<double> &A = sv.amplitude_at(j);
-HJ += complex<double>(fr[idx], fi[idx]) * conj(A);
+HJ += complex<double>(read_field_at(&f, fc, c, 0, idx),
+read_field_at(&f, fc, c, 1, idx)) *
+conj(A);
Jsum += abs(A);
}
else if (fr) { // H is purely real
for (size_t j = 0; j < sv.num_points(); j++) {
const ptrdiff_t idx = sv.index_at(j);
const complex<double> &A = sv.amplitude_at(j);
-HJ += double(fr[idx]) * conj(A);
+HJ += double(read_field_at(&f, fc, c, 0, idx)) * conj(A);
Jsum += abs(A);
}
}
10 changes: 10 additions & 0 deletions src/fields.cpp
@@ -23,6 +23,7 @@
#include <complex>

#include "meep.hpp"
#include "meep/backend_hooks.hpp"
#include "meep_internals.hpp"

using namespace std;
@@ -83,6 +84,9 @@ fields::fields(structure *s, double m, double beta, bool zero_fields_near_cylori
s->user_volume.num_direction(d) == 1)
use_bloch(d, 0.0);
}

// Notify any installed backend that this fields object is ready.
if (meep_backend.init) meep_backend.init(this);
}

fields::fields(const fields &thef)
@@ -125,9 +129,15 @@ fields::fields(const fields &thef)
FOR_DIRECTIONS(d) { boundaries[b][d] = thef.boundaries[b][d]; }
chunk_connections_valid = false;
changed_materials = true;

// Notify any installed backend about the copied-in fields.
if (meep_backend.init) meep_backend.init(this);
}

fields::~fields() {
// Let the backend release per-sim and per-chunk state while the chunks
// are still alive (the backend may want to read fields_chunk::backend_state).
if (meep_backend.cleanup) meep_backend.cleanup(this);
for (int i = 0; i < num_chunks; i++)
delete chunks[i];
delete[] chunks;
9 changes: 9 additions & 0 deletions src/fields_dump.cpp
@@ -26,6 +26,7 @@
#include <cassert>

#include "meep.hpp"
#include "meep/backend_hooks.hpp"
#include "meep_internals.hpp"

namespace meep {
@@ -110,6 +111,10 @@ void fields::dump(const char *filename, bool single_parallel_file) {
printf("creating fields output file \"%s\" (%d)...\n", filename, single_parallel_file);
}

/* Make sure host arrays match the backend's shadow state before we
* serialize them. No-op when no backend is loaded. */
sync_host_if_needed(this);

h5file file(filename, h5file::WRITE, single_parallel_file, !single_parallel_file);

// Write out the current time 't'
@@ -275,6 +280,10 @@ void fields::load(const char *filename, bool single_parallel_file) {
load_dft_hdf5(chunks[i]->dft_chunks, dataname, &file, 0, single_parallel_file);
}
}

/* Push the freshly-loaded host arrays back into the backend's shadow
* storage so the next step sees them. No-op when no backend is loaded. */
if (meep_backend.sync_from_host) meep_backend.sync_from_host(this);
}

} // namespace meep