
Updates to increase the flexibility of the influence calculator #4

Closed
alexanderbates wants to merge 1 commit into DrugowitschLab:main from alexanderbates:asb

@alexanderbates

These updates attempt to solve a suite of medium-level issues. This will help improve the repo's usability, I feel!

  • Enabling the calculator to take more diverse input, not just .sqlite files (inc. csv, feather, parquet, pandas data frames)
  • Providing a bundled function to "adjust" influence scores with a constant and by taking the log, as we describe in our paper
  • Surfacing lambda as a value the user can alter, which turns out to be necessary to analyse e.g. the *C. elegans* connectome
  • Not hard-coding NT assignment; we should leave this up to the user
  • Corresponding documentation for these changes, including README updates and helpful images
  • Packaging *C. elegans* data so users can quickly get started running examples with real connectome data

Here is the detail on these changes:

# ConnectomeInfluenceCalculator — Update Notes

A summary of the conceptual changes made on the working tree (no commits yet). Each entry gives the change, the reason for it, and the files touched.


## 1. DataFrame / CSV / Parquet / Feather / NumPy constructors

Change. The library now accepts pandas DataFrames and several common on-disk formats in addition to the original SQLite path:

  • InfluenceCalculator.from_dataframes(edgelist_df, meta_df=None, ...)
  • InfluenceCalculator.from_csv(edgelist_path, meta_path=None, ...)
  • InfluenceCalculator.from_parquet(...), from_feather(...)
  • InfluenceCalculator.from_numpy(adjacency_matrix, neuron_ids=None, ...)

The old InfluenceCalculator(filename, ...) SQLite path still works.

Why. The original API forced every caller to package their connectome into a SQLite file with a specific schema, which is awkward for ad-hoc exploration, for users coming from R / pandas pipelines, and for the worked example in this repo. The DataFrame constructor accepts the same columns as the SQLite schema (pre, post, count, optional norm) plus a metadata frame with root_id and (when relevant) top_nt.

Files. InfluenceCalculator/InfluenceCalculator.py — added from_* classmethods plus two module-level helpers, _validate_meta and _validate_and_prepare_edgelist, that enforce the column requirements with descriptive error messages.
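For illustration, the required-column contract could look like this. This is a hedged sketch of the check described above; `validate_edgelist` and `REQUIRED_EDGE_COLS` are hypothetical names, not the library's actual internals:

```python
import pandas as pd

REQUIRED_EDGE_COLS = {'pre', 'post', 'count'}   # 'norm' is optional

def validate_edgelist(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast with a message naming both the missing and the supplied columns."""
    missing = REQUIRED_EDGE_COLS - set(df.columns)
    if missing:
        raise ValueError(
            f"edgelist missing columns {sorted(missing)}; "
            f"got columns {sorted(df.columns)}"
        )
    return df

edges = pd.DataFrame({'pre': [10, 10], 'post': [20, 30], 'count': [5, 2]})
validate_edgelist(edges)   # passes silently
```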


## 2. Bundled C. elegans dataset

Change. A small C. elegans connectome (300 neurons, 3,539 chemical edges, 20,672 synapses) ships with the package and is exposed as:

```python
from InfluenceCalculator.data import celegans_edgelist, celegans_meta
edges = celegans_edgelist()   # pre, post, count, norm
meta  = celegans_meta()       # root_id, top_nt, super_class, neuron_class, body_part
```

Why. Tests and examples need a real connectome they can build a calculator from without external downloads. The previous toy tests/toy_network_example.sqlite was opaque and not documented; the C. elegans graph is small, public, and well-annotated, which lets the worked example double as a tutorial against a famous connectome.

Files. InfluenceCalculator/data/__init__.py (new — uses importlib.resources.files() so it works whether installed or run in-tree); InfluenceCalculator/data/celegans_edgelist.csv and InfluenceCalculator/data/celegans_meta.csv (new); pyproject.toml gains a [tool.setuptools.package-data] entry so the CSVs are included in the wheel.

> **Provenance.** The data was taken from the [OpenWorm project](https://openworm.org/) distribution of the *C. elegans* hermaphrodite chemical connectome (accessed February 2026), which aggregates the original electron-microscopy reconstructions of White et al. 1986 and Cook et al. 2019. The README "Data source" section carries the full citation block; downstream users redistributing the bundled CSVs should cite both primary sources and the OpenWorm aggregation.


## 3. Module-level `adjust_influence`

Change. A new function adjust_influence(df, const=24, signif=6) is exported alongside InfluenceCalculator. It takes the DataFrame returned by calculate_influence, groups by (target, seed), sums within each group, and returns three columns:

  • adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)
  • adjusted_influence_norm_by_targets
  • adjusted_influence_norm_by_sources_and_targets

Why. Raw influence scores span many orders of magnitude — the strongest direct paths can be ten billion times larger than the weakest distal trickle — so a linear-scaled heatmap shows the top of the distribution and nothing else. adjust_influence does three things to make the output legible:

  • log(...) compresses the dynamic range so weak and strong paths are visible side by side.
  • max(|x|, exp(-const)) is a junk-node floor: anything weaker than exp(-const) is treated as "essentially zero", which keeps log(0) from producing -inf and stops the colormap getting hijacked by numerical noise.
  • + const then shifts everything so the smallest meaningful score sits at exactly 0 and a colormap can be anchored there without losing sign information.
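As a standalone sketch, the formula above re-implemented in numpy (not the library function itself, which also handles the grouping and the normalised variants):

```python
import numpy as np

def adjusted(x, const=24):
    """sign(x) * (log(max(|x|, exp(-const))) + const), applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * (np.log(np.maximum(np.abs(x), np.exp(-const))) + const)

# Strong paths stay ordered, weak paths are floored at 0, sign survives:
adjusted(1.0)      # log(1) + 24 = 24
adjusted(-1.0)     # -24: inhibition keeps its sign
adjusted(1e-30)    # below exp(-24), floored to 0
```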

Mirrors `adjust_influence` in the R sibling package [natverse/influencer](https://github.com/natverse/influencer).

Files. InfluenceCalculator/InfluenceCalculator.py (function defined at module scope); InfluenceCalculator/__init__.py (exported in __all__); tests/test_influence_calculator.py (test_adjust_influence_basics, test_adjust_influence_threshold_floor, test_adjust_influence_preserves_sign).


## 4. Sign-preserving signed mode

Change. When signed=True is set, the influence DataFrame now returns the real part of the steady-state vector rather than its magnitude — so a target dominated by inhibitory paths gets a negative Influence_score_(signed). adjust_influence propagates the sign through the log transform.

Why. Without sign preservation, "signed" mode silently degenerated to unsigned: every output was a non-negative magnitude, and the only remaining effect of GABA-pre-neuron negation was to reduce magnitudes where positive and negative paths cancelled. That cancellation is real but undetectable from the output, which makes the signed flag indistinguishable from a perturbed unsigned run.

While auditing this we also found a pre-existing bug in _create_sparse_W. To make signed=True produce negative weights for inhibitory pre-neurons, the original code multiplied the relevant rows of the edgelist's count column by -1. But the matrix W is populated from the norm column (the fraction count / sum_per_post), not from count. Flipping the sign of count left norm positive, so every entry of W ended up positive regardless of how signed= was set — the signed setting silently built the same matrix as the unsigned setting. The fix multiplies whichever column is actually used to build W (held in the syn_weight_measure variable, which defaults to norm), so the negation now reaches the matrix.
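A toy pandas illustration of why the fix matters. Column names follow the schema above; the real code operates on the full edgelist inside `_create_sparse_W`:

```python
import pandas as pd

edges = pd.DataFrame({
    'pre':   ['gaba_pre', 'ach_pre'],
    'post':  ['target',   'target'],
    'count': [3.0, 7.0],
    'norm':  [0.3, 0.7],      # the column W is actually built from
})
inhibitory_pre = {'gaba_pre'}
syn_weight_measure = 'norm'

# Buggy version: edges.loc[mask, 'count'] *= -1 leaves 'norm' (and hence W) positive.
# Fixed version: negate whichever column actually feeds W.
mask = edges['pre'].isin(inhibitory_pre)
edges.loc[mask, syn_weight_measure] *= -1
```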

Files. `InfluenceCalculator/InfluenceCalculator.py` — `_build_influence_dataframe` now branches on `self.W_signed`; `_create_sparse_W` negates `syn_weight_measure` instead of `'count'`.


## 5. Externalised neurotransmitter assignment (`inhibitory_nts`, `excluded_nts`)

Change. The old hardcoded NEG_NEUROTRANSMITTERS = {'glutamate', 'gaba', 'serotonin', 'octopamine'} constant has been removed. Two keyword arguments now pass through the constructor and every classmethod:

  • inhibitory_nts={'gaba', ...} — top_nt values whose pre-neurons receive a negative sign in signed=True mode.
  • excluded_nts={'dopamine', 'serotonin', ...} — top_nt values whose pre-neurons contribute zero outgoing weight (their columns of W are empty), in either signed or unsigned mode.
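One way the excluded_nts semantics can be pictured (a sketch of the described behaviour, not the library's internal code; dropping the edges up front is equivalent to leaving those columns of W empty):

```python
import pandas as pd

meta = pd.DataFrame({'root_id': [1, 2, 3],
                     'top_nt':  ['gaba', 'dopamine', 'acetylcholine']})
edges = pd.DataFrame({'pre': [1, 2, 3], 'post': [3, 3, 1],
                      'count': [4, 2, 6], 'norm': [0.4, 0.2, 1.0]})

excluded_nts = {'dopamine'}
excluded_ids = set(meta.loc[meta['top_nt'].isin(excluded_nts), 'root_id'])

# Pre-neurons with an excluded transmitter contribute zero outgoing weight.
edges = edges[~edges['pre'].isin(excluded_ids)].reset_index(drop=True)
```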

Why.

  1. Species-dependence. Glutamate is net excitatory in mammals and most Drosophila circuits but is dual-action in C. elegans (GluCl chloride channels make it inhibitory at e.g. AWC→AIY). Hardcoding the sign in the library forces every user into the same biological assumption.
  2. Receptor-mix uncertainty. Modulators like dopamine, serotonin, and octopamine can be excitatory or inhibitory at a given target depending on the receptor mix. excluded_nts lets users silence pre-neurons whose net effect cannot be assigned a single sign, rather than forcing a wrong one.
  3. Library hygiene. Per a direct user instruction: "the library should not pre-empt how the user wishes to assign transmitters." Sets are still demonstrated in the README and worked example as defaults users can copy-paste.

Files. InfluenceCalculator/InfluenceCalculator.py (constants removed; new kwargs threaded through __init__ and all from_* classmethods; validation raises ValueError if signed=True is set without inhibitory_nts or if excluded_nts is used without a meta_df containing top_nt); tests/test_influence_calculator.py gains test_excluded_nts_removes_edges and
test_excluded_nts_requires_top_nt.


## 6. Exposed `lambda_max` as a documented parameter

Change. The target largest real eigenvalue of the rescaled connectivity matrix W̃, previously hardcoded to 0.99 inside _normalize_W, is now a constructor argument:

```python
ic = InfluenceCalculator.from_dataframes(edges, meta, lambda_max=0.5)
```

Default remains 0.99 for backwards compatibility.

_normalize_W now always rescales to lambda_max exactly (rather than only capping when the natural eigenvalue exceeds it), so the parameter is a true control knob over leading-mode amplification rather than a stability ceiling.
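The "always rescale" arithmetic can be sketched densely with numpy (the real `_normalize_W` presumably works on the sparse W; this just shows the operation):

```python
import numpy as np

def rescale_to_lambda(W, lambda_max=0.99):
    """Scale W so its largest real eigenvalue equals lambda_max exactly."""
    lam = np.max(np.linalg.eigvals(W).real)
    return W * (lambda_max / lam)

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # leading eigenvalue 1.0
W_tilde = rescale_to_lambda(W, lambda_max=0.5)
# Leading-loop gain 1/(1 - lambda_max): 100x at 0.99 versus 2x at 0.5.
```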

Why. Think of lambda_max as a reverb knob on the network. Near 1, a signal injected at the seed echoes around the graph many times before fading — the gain along the dominant recurrent loop is 1/(1-lambda_max), so 100× at 0.99 versus 2× at 0.5. Crank it to the max and the dominant loop drowns out finer differences between targets: every column of the heatmap ends up with nearly the same shape. Turn it down and the signal mostly travels along short paths, exposing per-target specificity at the cost of attenuating long polysynaptic effects.

Which value is "right" depends on the connectome. The default 0.99 is calibrated for a whole-CNS Drosophila graph (BANC-scale, ~130k neurons), where you want maximum sensitivity to weak distal influence. On a small graph like the C. elegans connectome the same setting puts the leading mode in charge of the entire heatmap; 0.5 is a more useful starting point. The point of exposing the parameter is that this is a knob users should be turning, not a hidden constant.

Trade-off shown by the worked example sweep (28 canonical sensory→interneuron pairs from the C. elegans literature, ranked within each seed's column; mean column std as a leading-mode dominance proxy):

| lambda_max | canonical mean rank-frac | mean col std (info) |
|--------------|--------------------------|---------------------|
| 0.10 | 0.931 | 0.160 |
| 0.30 | 0.929 | 0.162 |
| 0.50 | 0.932 | 0.149 |
| 0.70 | 0.933 | 0.131 |
| 0.90 | 0.937 | 0.105 |
| 0.99 | 0.921 | **0.042** |

Canonical-pair scores barely move (the strongest direct paths win at any λ), but column differentiation collapses by ~4× between λ=0.90 and λ=0.99 — the leading-mode dominance signature. The worked example defaults to lambda_max=0.5 as the balance: canonical hits intact, columns clearly differentiated, some polysynaptic integration retained.

Files. InfluenceCalculator/InfluenceCalculator.py (parameter added and validated on every constructor; _normalize_W reads self.lambda_max); examples/celegans_worked_example.py (defines LAMBDA_MAX = 0.5 and surfaces it in the heatmap title).


## 7. Worked example — `examples/celegans_worked_example.py`

Change. A self-contained script that loads the bundled C. elegans graph, computes per-seed influence from every sensory neuron (83 seeds, summed into 46 cell classes after collapsing bilateral pairs) onto every non-sensory target (187 → 136 cell classes), log-adjusts the per-(target_class, seed_class) raw scores via adjust_influence, and renders two heatmaps in docs/images/:

  • influence_heatmap_unsigned.png (sequential greyscale, [0, max])
  • influence_heatmap_signed.png (diverging blue→white→red, [−bound, +bound])

The seed and target axes are grouped by body_part (body / head / tail; the pharyngeal nervous system is excluded as it is essentially isolated from the rest of the graph) and clustered within each group by average-linkage hierarchical clustering. The matrix is transposed so seed classes index the rows.

Bilateral pairs are summed into cell classes (AVAL/AVAR → AVA, AVDL/AVDR → AVD, IL2DL/IL2DR → IL2D) on both axes via a regex that strips the trailing L/R only when it follows a capital letter (deliberately not including DL|DR|VL|VR as alternatives — Python's leftmost-first alternation would otherwise turn AVDL into AV instead of AVD). The matrix shows the raw adjusted_influence values directly with no per-row min-max rescaling; with lambda_max = 0.5 the leading mode is damped enough that per-target seed specificity is already legible, so a min-max normalisation step is unnecessary.
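The lookbehind trick described above can be sketched as follows (the worked example's exact pattern is an assumption here, reconstructed from the description):

```python
import re

def cell_class(name: str) -> str:
    """Strip a trailing L or R only when it follows a capital letter."""
    return re.sub(r'(?<=[A-Z])[LR]$', '', name)

cell_class('AVAL')    # 'AVA'
cell_class('AVDL')    # 'AVD' (a naive 'DL|DR|VL|VR|L|R' alternation would give 'AV')
cell_class('IL2DL')   # 'IL2D'
```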

Why. A connectome library without a worked example is hard to evaluate. The C. elegans example is small enough to run in seconds, recognised enough that the resulting heatmap can be eyeballed against the literature (sensory → command-interneuron paths, body-touch → ventral cord motor blocks, phasmid → AVA/AVD), and structured enough to demonstrate the full library API end-to-end.

const auto-calibration. Rather than hardcoding const=24, the example computes const = -log(min_nonzero |raw|) over the per-row influence scores so the smallest non-zero magnitude maps exactly to 0 after the log transform, eliminating an arbitrary floor and adapting cleanly to different lambda_max choices.
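The auto-calibration amounts to one line (toy numbers here; the example computes this over its per-row influence scores):

```python
import numpy as np

raw = np.array([0.0, 1e-9, 3e-4, 0.2])          # toy raw influence scores
nonzero = np.abs(raw[raw != 0.0])
const = -np.log(nonzero.min())                  # ~20.7 for a 1e-9 floor

# By construction the smallest non-zero score maps exactly to 0:
smallest_adjusted = np.log(nonzero.min()) + const
```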

Files. examples/celegans_worked_example.py (new); docs/images/influence_heatmap_unsigned.png and
docs/images/influence_heatmap_signed.png (regenerated each run).


## 8. Tests overhaul

Change. tests/test_influence_calculator.py is rewritten as discrete pytest functions, fed by tests/conftest.py fixtures that use importlib.resources.as_file() to expose the bundled CSVs as filesystem paths. New tests:

  • test_format_equivalence — from_dataframes, from_csv, and from_numpy agree on neuron count, matrix size, and ID universe.
  • test_adjust_influence_basics / _threshold_floor / _preserves_sign — covers the new module-level transform.
  • test_input_validation_missing_columns / _signed_no_top_nt.
  • test_excluded_nts_removes_edges / _requires_top_nt.
  • test_norm_auto_computation — 'norm' is computed when absent.
  • test_round_trip_smoke — full pipeline from CSV → calculate_influence → adjust_influence, gated on PETSc/SLEPc availability.

PETSc/SLEPc-dependent tests use pytest.importorskip so the suite is runnable in environments without those libraries.

Files. tests/conftest.py (new); tests/test_influence_calculator.py (rewritten); pyproject.toml ([tool.pytest.ini_options] plus a test extra).


## 9. `pyproject.toml` modernisation

Change. Bumped setuptools >= 77 (so the SPDX-string license = "BSD-3-Clause" syntax from PEP 639 is accepted), set requires-python >= 3.10, version 0.2.0, declared optional extras (parquet, test, examples, dev), added a package-data block for the bundled CSVs, and a [tool.pytest.ini_options] section pointing to tests/.

Why. The pre-existing setup could not install on machines with newer setuptools, which emit warnings about the dual-purpose license field; the new form is the documented PEP 639 spelling. The optional extras mean a CI image can install just what it needs (pip install .[test]) rather than every parquet dependency.

Files. pyproject.toml.


## 10. README — restructured around the new knobs

Change. The README is reorganised so the three things a user actually tunes — inhibitory_nts / excluded_nts, lambda_max, and const for adjust_influence — each have one canonical home:

  • The Description section now derives W̃ = (λ / λ_max(W)) · W with λ as a tuneable target (the lambda_max argument), and retains the explicit gloss "where λ_max(W) is the largest real eigenvalue of W, and λ is the desired largest real eigenvalue of W̃" matching the original phrasing. It carries both the technical explanation (gain = 1/(1-lambda_max) along the leading recurrent mode) and a short "reverb knob" metaphor: high lambda_max makes the network echo signals through long indirect paths, low lambda_max keeps the signal local. It also includes inline species guidance: 0.99 seems appropriate for the whole-CNS Drosophila BANC connectome and larger graphs, while values near 0.5 suit the C. elegans connectome, where the graph is small enough that the leading mode otherwise washes the heatmap out.

  • The "How W is filled" sentence in the Description was rewritten to make the input-normalisation explicit. The original said the matrix is "filled with the number of synaptic connections that a presynaptic neuron projects onto a postsynaptic neuron", which described syn_weight_measure='count' rather than the actual default ('norm'). The new wording makes clear that each entry is the fraction of a postsynaptic neuron's total drive that comes from a given upstream partner and explains the biological rationale (per-edge weights need to be comparable across neurons that vary widely in size and total input count).

  • A new "adjust_influence: log-compression and grouping" section explains why the function exists (raw scores span ten orders of magnitude), the const floor as a junk-node cutoff, and the difference between the three output columns (adjusted_influence vs the two normalised variants) — borrowing framing from the R sibling package's documentation. Includes the adjusted_influence_vs_traversal.jpg figure with an expanded caption that defines the x-axis (graph-traversal depth = mean number of synaptic hops in shortest-path BFS) and reads off the intuition: each polysynaptic step costs ≈ 1.3 units of adjusted_influence, so the score maps directly onto effective polysynaptic distance.

  • A new "Worked example: C. elegans connectome" section embeds the two regenerated heatmaps and shows a minimum-viable end-to-end snippet. A short knobs table cross-references the Description and adjust_influence sections rather than re-explaining each parameter. Detailed biology (the Drosophila vs C. elegans NT-set comparison, the cholinergic-fraction callout that explains the wide blank band on the signed heatmap's seed axis) lives as comments in examples/celegans_worked_example.py rather than in the README.

  • A short "Data source" subsection attributes the bundled CSVs to the OpenWorm project distribution (accessed February 2026) with prose citations to White et al. 1986 and Cook et al. 2019. Full BibTeX lives in the docstring of InfluenceCalculator/data/__init__.py (so help(.data) surfaces it) rather than cluttering the README.

  • The BANC Dataset section now lists, alongside the existing Dataverse DOI, the lab's public Google Cloud Storage path for the Feather-formatted edge list (gs://lee-lab_brain-and-nerve-cord-fly-connectome/compiled_data/banc_888/banc_888_edgelist_simple_v2.feather), and notes that it loads directly through from_feather.

  • The Usage section now also lists the alternative constructors (from_dataframes, from_csv, from_parquet, from_feather, from_numpy) alongside the original SQLite path, names the required edgelist columns (pre, post, count or weight, optional norm) and metadata columns (root_id, plus top_nt when signed=True or excluded_nts is set), and explicitly states that missing columns raise a ValueError that names the required columns and lists the columns the user actually passed — fail-fast with an actionable message rather than a silent bad result.

  • A one-line cross-link to natverse/influencer appears at the top of the Description section.

  • Four images are embedded inline:

    | image | location | role |
    |---|---|---|
    | seed_to_targets_diagram.jpg | top of Description | conceptual schematic of source → targets propagation |
    | linear_dynamical_model.png | next to the ODE | annotated breakdown of the linear-dynamics equation (terms + BANC-scale dimensions) |
    | neural_network_dynamics.gif | after the steady-state equation | 12-second propagation animation on a 28-node toy graph showing convergence to steady state (auto-renders inline; converted from a source .mp4 via two-pass palette ffmpeg, source deleted) |
    | adjusted_influence_vs_traversal.jpg | in the adjust_influence section | scatter of adjusted_influence vs graph-traversal depth on BANC, showing the near-linear scaling (R² = 0.94) |

    The seed_to_targets and adjusted_influence_vs_traversal images are pulled from the R sibling package natverse/influencer; linear_dynamical_model.png and neural_network_dynamics.gif are bespoke for this repo.
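The input-normalisation described in the rewritten "How W is filled" bullet above reduces to a groupby (a sketch of the arithmetic, not the library's code):

```python
import pandas as pd

edges = pd.DataFrame({'pre': [1, 2, 3], 'post': [9, 9, 9],
                      'count': [6, 3, 1]})
# Each entry is the fraction of the postsynaptic neuron's total input
# arriving from a given presynaptic partner.
edges['norm'] = edges['count'] / edges.groupby('post')['count'].transform('sum')
```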

Files. README.md;
InfluenceCalculator/data/__init__.py (BibTeX moved into module docstring); examples/celegans_worked_example.py (Drosophila / C. elegans NT comparison + cholinergic-fraction comment absorbed); docs/images/seed_to_targets_diagram.jpg,
docs/images/linear_dynamical_model.png,
docs/images/neural_network_dynamics.gif,
docs/images/adjusted_influence_vs_traversal.jpg (new).


## 11. `.gitignore`

Change. Added __pycache__/, .pytest_cache/, .venv/, Influence/ (test-output directory), and CLAUDE.md (working notes, not for distribution). .DS_Store was already there.

Files. .gitignore.


## Files affected — index

| File | Status | Summary |
|---|---|---|
| InfluenceCalculator/InfluenceCalculator.py | modified | constructors, adjust_influence, sign preservation, lambda_max, NT externalisation, signed-mode bug fix |

These updates attempt to solve a suite of medium-level issues:

- Enabling the calculator to take more diverse input, not just .sqlite files
- Providing a bundled function to "adjust" in influence score as we describe in our paper
- Surfacing lambda as a value the user can alter, which turns out to be necessary to analyse e.g. the c elegans connectome
- not hard-coding NT assignment, we should leave this up to the user
- Correlated documentation for these changes, inc. in the README
- Package with C. elegans data to users can quickly get started running examples with real connectome data

Here is the detail on these changes:

# ConnectomeInfluenceCalculator — Update Notes

A summary of the conceptual changes made on the working tree (no commits yet).
Each entry gives the change, the reason for it, and the files touched.

---

## 1. DataFrame / CSV / Parquet / Feather / NumPy constructors

**Change.** The library now accepts pandas DataFrames and several common
on-disk formats in addition to the original SQLite path:

- `InfluenceCalculator.from_dataframes(edgelist_df, meta_df=None, ...)`
- `InfluenceCalculator.from_csv(edgelist_path, meta_path=None, ...)`
- `InfluenceCalculator.from_parquet(...)`, `from_feather(...)`
- `InfluenceCalculator.from_numpy(adjacency_matrix, neuron_ids=None, ...)`

The old `InfluenceCalculator(filename, ...)` SQLite path still works.

**Why.** The original API forced every caller to package their connectome
into a SQLite file with a specific schema, which is awkward for ad-hoc
exploration, for users coming from R / pandas pipelines, and for the
worked example in this repo. The DataFrame constructor accepts the same
columns as the SQLite schema (`pre`, `post`, `count`, optional `norm`)
plus a metadata frame with `root_id` and (when relevant) `top_nt`.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` — added
`from_*` classmethods plus two module-level helpers,
`_validate_meta` and `_validate_and_prepare_edgelist`, that enforce the
column requirements with descriptive error messages.

---

## 2. Bundled C. elegans dataset

**Change.** A small *C. elegans* connectome (300 neurons, 3,539 chemical
edges, 20,672 synapses) ships with the package and is exposed as:

```python
from InfluenceCalculator.data import celegans_edgelist, celegans_meta
edges = celegans_edgelist()   # pre, post, count, norm
meta  = celegans_meta()       # root_id, top_nt, super_class, neuron_class, body_part
```

**Why.** Tests and examples need a real connectome they can build a
calculator from without external downloads. The previous toy
`tests/toy_network_example.sqlite` was opaque and not documented; the
*C. elegans* graph is small, public, and well-annotated, which lets the
worked example double as a tutorial against a famous connectome.

**Files.** `InfluenceCalculator/data/__init__.py` (new — uses
`importlib.resources.files()` so it works whether installed or run
in-tree); `InfluenceCalculator/data/celegans_edgelist.csv` and
`InfluenceCalculator/data/celegans_meta.csv` (new); `pyproject.toml`
gains a `[tool.setuptools.package-data]` entry so the CSVs are
included in the wheel.

> **Provenance.** The data was taken from the
> [OpenWorm project](https://openworm.org/) distribution of the
> *C. elegans* hermaphrodite chemical connectome (accessed
> February 2026), which aggregates the original electron-microscopy
> reconstructions of White et al. 1986 and Cook et al. 2019. The
> README "Data source" section carries the full citation block;
> downstream users redistributing the bundled CSVs should cite both
> primary sources and the OpenWorm aggregation.

---

## 3. Module-level `adjust_influence`

**Change.** A new function `adjust_influence(df, const=24, signif=6)` is
exported alongside `InfluenceCalculator`. It takes the DataFrame
returned by `calculate_influence`, groups by `(target, seed)`, sums
within each group, and returns three columns:

- `adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)`
- `adjusted_influence_norm_by_targets`
- `adjusted_influence_norm_by_sources_and_targets`

**Why.** Raw influence scores span many orders of magnitude — the
strongest direct paths can be ten billion times larger than the
weakest distal trickle — so a linear-scaled heatmap shows the top of
the distribution and nothing else. `adjust_influence` does three
things to make the output legible:

- **`log(...)`** compresses the dynamic range so weak and strong paths
  are visible side by side.
- **`max(|x|, exp(-const))`** is a junk-node floor: anything weaker
  than `exp(-const)` is treated as "essentially zero", which keeps
  `log(0)` from producing `-inf` and stops the colormap getting
  hijacked by numerical noise.
- **`+ const`** then shifts everything so the smallest meaningful
  score sits at exactly 0 and a colormap can be anchored there
  without losing sign information.

Mirrors `adjust_influence` in the R sibling package
[`natverse/influencer`](https://github.com/natverse/influencer).

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (function
defined at module scope); `InfluenceCalculator/__init__.py` (exported
in `__all__`); `tests/test_influence_calculator.py`
(`test_adjust_influence_basics`, `test_adjust_influence_threshold_floor`,
`test_adjust_influence_preserves_sign`).

---

## 4. Sign-preserving signed mode

**Change.** When `signed=True` is set, the influence DataFrame now
returns the real part of the steady-state vector rather than its
magnitude — so a target dominated by inhibitory paths gets a negative
`Influence_score_(signed)`. `adjust_influence` propagates the sign
through the log transform.

**Why.** Without sign preservation, "signed" mode silently degenerated
to unsigned: every output was a non-negative magnitude, and the only
remaining effect of GABA-pre-neuron negation was to reduce magnitudes
where positive and negative paths cancelled. That cancellation is real
but undetectable from the output, which makes the signed flag
indistinguishable from a perturbed unsigned run.

While auditing this we also found a **pre-existing bug** in
`_create_sparse_W`. To make `signed=True` produce negative weights for
inhibitory pre-neurons, the original code multiplied the relevant rows
of the edgelist's `count` column by `-1`. But the matrix `W` is
populated from the `norm` column (the fraction `count / sum_per_post`),
not from `count`. Flipping the sign of `count` left `norm` positive, so
every entry of `W` ended up positive regardless of how `signed=` was
set — the signed setting silently built the same matrix as the unsigned
setting. The fix multiplies whichever column is actually used to build
`W` (held in the `syn_weight_measure` variable, which defaults to
`norm`), so the negation now reaches the matrix.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` —
`_build_influence_dataframe` now branches on `self.W_signed`;
`_create_sparse_W` negates `syn_weight_measure` instead of `'count'`.

---

## 5. Externalised neurotransmitter assignment (`inhibitory_nts`, `excluded_nts`)

**Change.** The old hardcoded `NEG_NEUROTRANSMITTERS = {'glutamate',
'gaba', 'serotonin', 'octopamine'}` constant has been removed. Two
keyword arguments now pass through the constructor and every classmethod:

- `inhibitory_nts={'gaba', ...}` — `top_nt` values whose pre-neurons
  receive a negative sign in `signed=True` mode.
- `excluded_nts={'dopamine', 'serotonin', ...}` — `top_nt` values whose
  pre-neurons contribute zero outgoing weight (their columns of `W` are
  empty), in either signed or unsigned mode.

**Why.**

1. *Species-dependence.* Glutamate is net excitatory in mammals and
   most *Drosophila* circuits but is dual-action in *C. elegans*
   (GluCl chloride channels make it inhibitory at e.g. AWC→AIY).
   Hardcoding the sign in the library forces every user into the
   same biological assumption.
2. *Receptor-mix uncertainty.* Modulators like dopamine, serotonin,
   and octopamine can be excitatory or inhibitory at a given target
   depending on the receptor mix. `excluded_nts` lets users silence
   pre-neurons whose net effect cannot be assigned a single sign,
   rather than forcing a wrong one.
3. *Library hygiene.* Per a direct user instruction: "the library
   should not pre-empt how the user wishes to assign transmitters."
   Sets are still demonstrated in the README and worked example as
   defaults users can copy-paste.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (constants
removed; new kwargs threaded through `__init__` and all `from_*`
classmethods; validation raises `ValueError` if `signed=True` is set
without `inhibitory_nts` or if `excluded_nts` is used without a
`meta_df` containing `top_nt`); `tests/test_influence_calculator.py`
gains `test_excluded_nts_removes_edges` and
`test_excluded_nts_requires_top_nt`.

---

## 6. Exposed `lambda_max` as a documented parameter

**Change.** The target largest real eigenvalue of the rescaled
connectivity matrix W̃, previously hardcoded to `0.99` inside
`_normalize_W`, is now a constructor argument:

```python
ic = InfluenceCalculator.from_dataframes(edges, meta, lambda_max=0.5)
```

Default remains `0.99` for backwards compatibility.

`_normalize_W` now *always* rescales to `lambda_max` exactly (rather
than only capping when the natural eigenvalue exceeds it), so the
parameter is a true control knob over leading-mode amplification
rather than a stability ceiling.

**Why.** Think of `lambda_max` as a **reverb knob** on the
network. Near 1, a signal injected at the seed echoes around the
graph many times before fading — the gain along the dominant
recurrent loop is `1/(1-lambda_max)`, so `100×` at `0.99` versus
`2×` at `0.5`. Crank it to the max and the dominant loop drowns
out finer differences between targets: every column of the heatmap
ends up with nearly the same shape. Turn it down and the signal
mostly travels along short paths, exposing per-target specificity
at the cost of attenuating long polysynaptic effects.

Which value is "right" depends on the connectome. The default
`0.99` is calibrated for a whole-CNS *Drosophila* graph (BANC-scale,
~130k neurons), where you want maximum sensitivity to weak distal
influence. On a small graph like the *C. elegans* connectome the
same setting puts the leading mode in charge of the entire heatmap;
`0.5` is a more useful starting point. The point of exposing the
parameter is that this is a knob users should be turning, not a
hidden constant.

**Trade-off shown by the worked example sweep** (28 canonical
sensory→interneuron pairs from the *C. elegans* literature, ranked
within each seed's column; mean column std as a leading-mode
dominance proxy):

| `lambda_max` | canonical mean rank-frac | mean column std (differentiation) |
|--------------|--------------------------|---------------------|
| 0.10 | 0.931 | 0.160 |
| 0.30 | 0.929 | 0.162 |
| 0.50 | 0.932 | 0.149 |
| 0.70 | 0.933 | 0.131 |
| 0.90 | 0.937 | 0.105 |
| 0.99 | 0.921 | **0.042** |

Canonical-pair scores barely move (the strongest direct paths win at
any λ), but column differentiation collapses by ~4× between λ=0.90
and λ=0.99 — the leading-mode dominance signature. The worked example
defaults to `lambda_max=0.5` as the balance: canonical hits intact,
columns clearly differentiated, some polysynaptic integration retained.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (parameter
added and validated on every constructor; `_normalize_W` reads
`self.lambda_max`); `examples/celegans_worked_example.py` (defines
`LAMBDA_MAX = 0.5` and surfaces it in the heatmap title).

---

## 7. Worked example — `examples/celegans_worked_example.py`

**Change.** A self-contained script that loads the bundled *C. elegans*
graph, computes per-seed influence from every sensory neuron (83
seeds, summed into 46 cell classes after collapsing bilateral pairs)
onto every non-sensory target (187 → 136 cell classes), log-adjusts
the per-(target_class, seed_class) raw scores via `adjust_influence`,
and renders two heatmaps in `docs/images/`:

- `influence_heatmap_unsigned.png` (sequential greyscale, [0, max])
- `influence_heatmap_signed.png` (diverging blue→white→red, [−bound, +bound])

The seed and target axes are grouped by `body_part` (body / head / tail;
the pharyngeal nervous system is excluded as it is essentially isolated
from the rest of the graph) and clustered within each group by
average-linkage hierarchical clustering. The matrix is transposed so
seed classes index the rows.

Bilateral pairs are summed into cell classes (`AVAL/AVAR → AVA`,
`AVDL/AVDR → AVD`, `IL2DL/IL2DR → IL2D`) on both axes via a regex
that strips the trailing L/R only when it follows a capital letter
(deliberately *not* including `DL|DR|VL|VR` as alternatives —
Python's leftmost-first alternation would otherwise turn `AVDL` into
`AV` instead of `AVD`). The matrix shows the **raw** adjusted_influence
values directly with no per-row min-max rescaling; with
`lambda_max = 0.5` the leading mode is damped enough that per-target
seed specificity is already legible, so a min-max normalisation step
is unnecessary.
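
One way to express that stripping rule (a sketch; the script's exact pattern may differ) is a lookbehind rather than alternation:

```python
import re

# Strip a trailing L or R only when the preceding character is a capital
# letter, so AVDL -> AVD rather than AV, and IL2DL -> IL2D.
PAIR_SUFFIX = re.compile(r"(?<=[A-Z])[LR]$")

def cell_class(neuron_name: str) -> str:
    return PAIR_SUFFIX.sub("", neuron_name)

assert cell_class("AVAL") == "AVA"
assert cell_class("AVDL") == "AVD"
assert cell_class("IL2DL") == "IL2D"
```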

**Why.** A connectome library without a worked example is hard to
evaluate. The *C. elegans* example is small enough to run in seconds,
recognised enough that the resulting heatmap can be eyeballed against
the literature (sensory → command-interneuron paths, body-touch →
ventral cord motor blocks, phasmid → AVA/AVD), and structured enough
to demonstrate the full library API end-to-end.

**`const` auto-calibration.** Rather than hardcoding `const=24`, the
example computes `const = -log(min_nonzero |raw|)` over the per-row
influence scores so the smallest non-zero magnitude maps exactly to 0
after the log transform, eliminating an arbitrary floor and adapting
cleanly to different `lambda_max` choices.
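
The calibration can be sketched as follows (hypothetical variable names; `raw` stands for the per-row influence scores the example iterates over):

```python
import numpy as np

raw = np.array([0.0, 1e-10, 3e-7, -5e-9, 2e-3])  # toy influence scores
nonzero_min = np.abs(raw[raw != 0]).min()
const = -np.log(nonzero_min)
# The smallest non-zero magnitude maps exactly to 0 after the log transform
assert np.isclose(np.log(nonzero_min) + const, 0.0)
```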

**Files.** `examples/celegans_worked_example.py` (new);
`docs/images/influence_heatmap_unsigned.png` and
`docs/images/influence_heatmap_signed.png` (regenerated each run).

---

## 8. Tests overhaul

**Change.** `tests/test_influence_calculator.py` is rewritten as
discrete pytest functions, fed by `tests/conftest.py` fixtures that
use `importlib.resources.as_file()` to expose the bundled CSVs as
filesystem paths. New tests:

- `test_format_equivalence` — `from_dataframes`, `from_csv`, and
  `from_numpy` agree on neuron count, matrix size, and ID universe.
- `test_adjust_influence_basics` / `_threshold_floor` /
  `_preserves_sign` — covers the new module-level transform.
- `test_input_validation_missing_columns` / `_signed_no_top_nt`.
- `test_excluded_nts_removes_edges` / `_requires_top_nt`.
- `test_norm_auto_computation` — `'norm'` is computed when absent.
- `test_round_trip_smoke` — full pipeline from CSV → `calculate_influence`
  → `adjust_influence`, gated on PETSc/SLEPc availability.

PETSc/SLEPc-dependent tests use `pytest.importorskip` so the suite is
runnable in environments without those libraries.

**Files.** `tests/conftest.py` (new); `tests/test_influence_calculator.py`
(rewritten); `pyproject.toml` (`[tool.pytest.ini_options]` plus a
`test` extra).

---

## 9. `pyproject.toml` modernisation

**Change.** Bumped `setuptools >= 77` (so the SPDX-string
`license = "BSD-3-Clause"` syntax from PEP 639 is accepted), set
`requires-python >= 3.10`, version `0.2.0`, declared optional extras
(`parquet`, `test`, `examples`, `dev`), added a `package-data` block
for the bundled CSVs, and a `[tool.pytest.ini_options]` section
pointing to `tests/`.

**Why.** The pre-existing setup could not install cleanly on machines
with newer setuptools, which warns about the legacy dual-purpose
`license` field; the new form is the documented PEP 639 spelling. The
optional extras let a CI image install just what it needs
(`pip install .[test]`) rather than every Parquet dependency.

**Files.** `pyproject.toml`.

---

## 10. README — restructured around the new knobs

**Change.** The README is reorganised so the three things a user
actually tunes — `inhibitory_nts` / `excluded_nts`, `lambda_max`, and
`const` for `adjust_influence` — each have one canonical home:

- The **Description** section now derives `W̃ = (λ / λ_max(W)) · W`
  with `λ` as a tuneable target (the `lambda_max` argument), and
  retains the explicit gloss *"where λ_max(W) is the largest real
  eigenvalue of W, and λ is the desired largest real eigenvalue of
  W̃"* matching the original phrasing. It carries both the
  technical explanation (gain = `1/(1-lambda_max)` along the leading
  recurrent mode) and a short "reverb knob" metaphor: high
  `lambda_max` makes the network echo signals through long indirect
  paths, low `lambda_max` keeps the signal local. Includes inline
  species guidance — `0.99` *(seems appropriate for the whole-CNS
  Drosophila BANC connectome and larger graphs)*, near `0.5` *(more
  appropriate for the C. elegans connectome, where the graph is
  small enough that the leading mode otherwise washes the heatmap
  out)*.

- The **"How W is filled"** sentence in the Description was
  rewritten to make the input-normalisation explicit. The original
  said the matrix is "filled with the number of synaptic connections
  that a presynaptic neuron projects onto a postsynaptic neuron",
  which described `syn_weight_measure='count'` rather than the
  actual default (`'norm'`). The new wording makes clear that each
  entry is *the fraction of a postsynaptic neuron's total drive
  that comes from a given upstream partner* and explains the
  biological rationale (per-edge weights need to be comparable
  across neurons that vary widely in size and total input count).

- A new **"`adjust_influence`: log-compression and grouping"** section
  explains why the function exists (raw scores span ten orders of
  magnitude), the `const` floor as a junk-node cutoff, and the
  difference between the three output columns
  (`adjusted_influence` vs the two normalised variants) — borrowing
  framing from the R sibling package's documentation. Includes the
  `adjusted_influence_vs_traversal.jpg` figure with an expanded
  caption that defines the x-axis (graph-traversal depth = mean
  number of synaptic hops in shortest-path BFS) and reads off the
  intuition: each polysynaptic step costs ≈ 1.3 units of
  `adjusted_influence`, so the score maps directly onto effective
  polysynaptic distance.

- A new **"Worked example: *C. elegans* connectome"** section
  embeds the two regenerated heatmaps and shows a minimum-viable
  end-to-end snippet. A short knobs table cross-references the
  Description and `adjust_influence` sections rather than
  re-explaining each parameter. Detailed biology (the *Drosophila*
  vs *C. elegans* NT-set comparison, the cholinergic-fraction
  callout that explains the wide blank band on the signed heatmap's
  seed axis) lives as comments in
  `examples/celegans_worked_example.py` rather than in the README.

- A short **"Data source"** subsection attributes the bundled CSVs
  to the OpenWorm project distribution (accessed February 2026)
  with prose citations to White et al. 1986 and Cook et al. 2019.
  Full BibTeX lives in the docstring of
  `InfluenceCalculator/data/__init__.py` (so `help(.data)` surfaces
  it) rather than cluttering the README.

- The **BANC Dataset** section now lists, alongside the existing
  Dataverse DOI, the lab's public Google Cloud Storage path for the
  Feather-formatted edge list
  (`gs://lee-lab_brain-and-nerve-cord-fly-connectome/compiled_data/banc_888/banc_888_edgelist_simple_v2.feather`),
  and notes that it loads directly through `from_feather`.

- The **Usage** section now also lists the alternative constructors
  (`from_dataframes`, `from_csv`, `from_parquet`, `from_feather`,
  `from_numpy`) alongside the original SQLite path, names the
  required edgelist columns (`pre`, `post`, `count` or `weight`,
  optional `norm`) and metadata columns (`root_id`, plus `top_nt`
  when `signed=True` or `excluded_nts` is set), and explicitly
  states that **missing columns raise a `ValueError` that names the
  required columns and lists the columns the user actually passed**
  — fail-fast with an actionable message rather than a silent bad
  result.

- A one-line cross-link to [`natverse/influencer`](https://github.com/natverse/influencer)
  appears at the top of the Description section.

- Four images are embedded inline:

  | image | location | role |
  |---|---|---|
  | `seed_to_targets_diagram.jpg` | top of Description | conceptual schematic of source → targets propagation |
  | `linear_dynamical_model.png` | next to the ODE | annotated breakdown of the linear-dynamics equation (terms + BANC-scale dimensions) |
  | `neural_network_dynamics.gif` | after the steady-state equation | 12-second propagation animation on a 28-node toy graph showing convergence to steady state (auto-renders inline; converted from a source `.mp4` via two-pass palette `ffmpeg`, source deleted) |
  | `adjusted_influence_vs_traversal.jpg` | in the `adjust_influence` section | scatter of adjusted_influence vs graph-traversal depth on BANC, showing the near-linear scaling (R² = 0.94) |

  The `seed_to_targets` and `adjusted_influence_vs_traversal` images are pulled
  from the R sibling package
  [`natverse/influencer`](https://github.com/natverse/influencer);
  `linear_dynamical_model.png` and `neural_network_dynamics.gif`
  are bespoke for this repo.
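
The input-fraction normalisation described in the "How W is filled" bullet above can be sketched in pandas (a hypothetical minimal computation; column names follow the edgelist schema):

```python
import pandas as pd

edges = pd.DataFrame({
    "pre":   ["a", "b", "c", "a"],
    "post":  ["x", "x", "x", "y"],
    "count": [6, 3, 1, 4],
})
# 'norm': the fraction of each postsynaptic neuron's total input drive
# contributed by each upstream partner (sums to 1 per 'post' neuron)
edges["norm"] = edges["count"] / edges.groupby("post")["count"].transform("sum")
```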

**Files.** `README.md`;
`InfluenceCalculator/data/__init__.py` (BibTeX moved into module
docstring); `examples/celegans_worked_example.py` (Drosophila /
*C. elegans* NT comparison + cholinergic-fraction comment absorbed);
`docs/images/seed_to_targets_diagram.jpg`,
`docs/images/linear_dynamical_model.png`,
`docs/images/neural_network_dynamics.gif`,
`docs/images/adjusted_influence_vs_traversal.jpg` (new).

---

## 11. `.gitignore`

**Change.** Added `__pycache__/`, `.pytest_cache/`, `.venv/`,
`Influence/` (test-output directory), and `CLAUDE.md` (working notes,
not for distribution). `.DS_Store` was already there.

**Files.** `.gitignore`.

---

## Files affected — index

| File | Status | Summary |
|------|--------|---------|
| `InfluenceCalculator/InfluenceCalculator.py` | modified | constructors, `adjust_influence`, sign preservation, `lambda_max`, NT externalisation, signed-mode bug fix |
| `InfluenceCalculator/__init__.py` | modified | export `adjust_influence` |
| `InfluenceCalculator/data/__init__.py` | new | `celegans_edgelist()`, `celegans_meta()`; module docstring carries OpenWorm + White 1986 + Cook 2019 BibTeX |
| `InfluenceCalculator/data/celegans_edgelist.csv` | new | bundled edge list |
| `InfluenceCalculator/data/celegans_meta.csv` | new | bundled metadata |
| `examples/celegans_worked_example.py` | new | worked example, generates heatmaps |
| `docs/images/influence_heatmap_unsigned.png` | new | example output (regenerated) |
| `docs/images/influence_heatmap_signed.png` | new | example output (regenerated) |
| `docs/images/seed_to_targets_diagram.jpg` | new | source → targets schematic (Description) |
| `docs/images/linear_dynamical_model.png` | new | annotated linear-dynamics ODE (Description) |
| `docs/images/neural_network_dynamics.gif` | new | propagation-to-steady-state animation (Description) |
| `docs/images/adjusted_influence_vs_traversal.jpg` | new | adjusted_influence vs graph-traversal depth (`adjust_influence` section) |
| `tests/conftest.py` | new | importlib.resources fixtures |
| `tests/test_influence_calculator.py` | rewritten | 11 discrete pytest functions |
| `pyproject.toml` | modified | setuptools≥77, extras, package-data, pytest config |
| `.gitignore` | modified | cache and working-note ignores |
| `README.md` | modified | `lambda_max` reverb-knob explanation in Description; `adjust_influence` section; worked-example section; data citations; `natverse/influencer` cross-link |
| `update.md` | new | this document |
@alexanderbates (Author):

Splitting per your request into 6 sequential PRs (#5–#10); closing this in favour of those. #5 is the first one - please review there!
