Commit 5be0a01

docs: design for restricted diagrams (#865, #1110)
Graph-driven cascade delete using restricted Diagram nodes, replacing error-message parsing with dependency graph traversal. Addresses MySQL 8 privilege issues and PostgreSQL overhead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4a7e1e8 commit 5be0a01

1 file changed

+292
-0
lines changed

docs/design/restricted-diagram.md

@@ -0,0 +1,292 @@
1+
# Design: Restricted Diagrams for Cascading Operations
2+
3+
**Issues:** [#865](https://github.com/datajoint/datajoint-python/issues/865), [#1110](https://github.com/datajoint/datajoint-python/issues/1110)
4+
5+
**Branch:** `design/restricted-diagram`
6+
7+
## Problem
8+
9+
### #1110 — Cascade delete fails on MySQL 8 with insufficient privileges
10+
11+
DataJoint's cascade delete works by trial-and-error: attempt `DELETE` on the parent, catch the FK integrity error, **parse the MySQL error message** to discover which child table is blocking, then recursively delete from that child first.
12+
13+
MySQL error 1451 (`ER_ROW_IS_REFERENCED_2`) includes the child table name and constraint details. But MySQL 8 returns error 1217 (`ER_ROW_IS_REFERENCED`) instead when the user lacks certain privileges (`CREATE VIEW`, `SHOW VIEW`, `INDEX`, `TRIGGER`). Error 1217 provides no table name, only *"Cannot delete or update a parent row: a foreign key constraint fails"*, so the regular-expression match on the message returns `None` and the cascade crashes with `AttributeError: 'NoneType' object has no attribute 'groupdict'`.
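
The failure mode can be reproduced without a database. The following is a minimal sketch, assuming a simplified pattern (`FK_ERROR` is illustrative and is not DataJoint's actual regular expression) and made-up schema and table names; the two message strings follow the formats MySQL uses for errors 1451 and 1217.

```python
import re

# Illustrative pattern only; DataJoint's actual regular expression may differ.
FK_ERROR = re.compile(
    r"a foreign key constraint fails \(`(?P<schema>[^`]+)`\.`(?P<child>[^`]+)`, "
    r"CONSTRAINT `(?P<name>[^`]+)` FOREIGN KEY"
)

# Error 1451 names the blocking child table (hypothetical `lab`.`session`).
msg_1451 = (
    "Cannot delete or update a parent row: a foreign key constraint fails "
    "(`lab`.`session`, CONSTRAINT `session_ibfk_1` FOREIGN KEY (`subject_id`) "
    "REFERENCES `subject` (`subject_id`))"
)
# Error 1217 carries no table or constraint details.
msg_1217 = "Cannot delete or update a parent row: a foreign key constraint fails"

detailed = FK_ERROR.search(msg_1451)
print(detailed.groupdict()["child"])  # session

terse = FK_ERROR.search(msg_1217)
print(terse)  # None

try:
    terse.groupdict()
except AttributeError as error:
    print(error)  # 'NoneType' object has no attribute 'groupdict'
```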
14+
15+
Additional problems with the error-driven approach:
16+
17+
- **PostgreSQL overhead**: PostgreSQL aborts the entire transaction on any error. Each failed delete attempt requires `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips.
18+
- **Fragile parsing**: Different MySQL versions and privilege levels produce different error message formats.
19+
- **Opaque failures**: When parsing fails, the error message gives no actionable guidance.
20+
21+
### #865 — Applying restrictions to a Diagram
22+
23+
DataJoint needs a general-purpose way to specify a subset of data across multiple tables for delete, export, backup, and sharing. `dj.Diagram` already provides powerful set operators for specifying subsets of *tables*. Adding per-node restrictions would complete the functionality for specifying cross-sections of *data*.
24+
25+
## Observation
26+
27+
**`drop()` already uses the graph-driven approach.** The cascading drop walks the dependency graph in reverse topological order, dropping leaves first:
28+
```python
# Current Table.drop() implementation
self.connection.dependencies.load()
tables = [t for t in self.connection.dependencies.descendants(self.full_table_name)
          if not t.isdigit()]
for table in reversed(tables):
    FreeTable(self.connection, table).drop_quick()
```
37+
38+
No error parsing, no trial-and-error. The same pattern can be applied to cascade delete, with the addition of **restriction propagation** through FK attribute mappings.
39+
40+
## Design
41+
42+
### Core concept: `RestrictedDiagram`
43+
44+
A `RestrictedDiagram` is a `Diagram` augmented with per-node restrictions. Applying a restriction to one node propagates it downstream through FK edges, using the `attr_map` stored on each edge.
45+
46+
```python
47+
# Apply restriction to Session node, propagate to all descendants
48+
rd = dj.Diagram(schema).restrict(Session & 'subject_id=1')
49+
50+
# Preview what would be affected
51+
rd.preview()
52+
53+
# Execute cascading delete
54+
rd.delete()
55+
56+
# Or export the restricted cross-section
57+
rd.export('/path/to/backup/')
58+
```
59+
60+
### Restriction propagation
61+
62+
Each node in the `RestrictedDiagram` carries a list of restrictions (combined with OR for multiple FK paths from different parents).
63+
64+
**Propagation rules for edge `Parent → Child` with `attr_map`:**
65+
66+
1. **Non-aliased FK** (`attr_map` is identity, e.g. `{'mouse_id': 'mouse_id'}`):
67+
If the parent's restriction attributes are a subset of the child's primary key, copy the restriction directly. Otherwise, restrict child by `parent.proj()`.
68+
69+
2. **Aliased FK** (`attr_map` renames, e.g. `{'source_mouse': 'mouse_id'}`):
70+
Restrict child by `parent.proj(**{fk: pk for fk, pk in attr_map.items()})`.
71+
72+
3. **Multiple FK paths to the same child** (via alias nodes):
73+
Each path produces a separate restriction. These combine with OR — a child row must be deleted if it references restricted parent rows through *any* FK.
74+
75+
This reuses the existing restriction logic from the current `cascade()` function (lines 1082–1090 of `table.py`), but applies it upfront during graph traversal rather than reactively from error messages.
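
The aliased-FK and multiple-path rules can be illustrated on plain dictionaries. This is a toy sketch, not DataJoint code: `rename_to_child` and `matches` are hypothetical helpers, rows are dicts, and a restriction is modeled as a list of dicts (OR over the list, AND within each dict). The `breeding` table and its two FKs into `mouse` are invented for the example.

```python
def rename_to_child(parent_keys, attr_map):
    """Aliased-FK rule: rename parent primary-key attributes to the child's FK names."""
    pk_to_fk = {pk: fk for fk, pk in attr_map.items()}
    return [{pk_to_fk[name]: value for name, value in key.items()} for key in parent_keys]


def matches(row, restriction):
    """A list is an OR over its entries; each dict is an AND of equalities."""
    return any(all(row[k] == v for k, v in cond.items()) for cond in restriction)


# Restricted parent rows, reduced to their primary key (the role of parent.proj()).
restricted_mice = [{"mouse_id": 1}]

# Two aliased FKs from the same child to the same parent: one restriction per path.
breeding_restriction = []
for attr_map in ({"mother_id": "mouse_id"}, {"father_id": "mouse_id"}):
    breeding_restriction += rename_to_child(restricted_mice, attr_map)

breeding = [
    {"pair_id": 10, "mother_id": 1, "father_id": 2},
    {"pair_id": 11, "mother_id": 3, "father_id": 1},
    {"pair_id": 12, "mother_id": 3, "father_id": 2},
]
affected = [row["pair_id"] for row in breeding if matches(row, breeding_restriction)]

print(breeding_restriction)  # [{'mother_id': 1}, {'father_id': 1}]
print(affected)  # [10, 11]
```

Pair 12 references mouse 1 through neither FK, so it is left alone; pairs 10 and 11 are each caught by a different path.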
76+
77+
### `part_integrity` as a Diagram-level policy
78+
79+
Currently, `part_integrity` is a parameter on `Table.delete()` with three modes:
80+
| Mode | Behavior |
|------|----------|
| `"enforce"` | Error if parts would be deleted without their masters |
| `"ignore"` | Allow deleting parts without masters (breaks integrity) |
| `"cascade"` | Also delete masters when parts are deleted |
86+
87+
In the restricted diagram design, `part_integrity` becomes a policy on the diagram's restriction propagation rather than a post-hoc check:
88+
89+
**`"enforce"` (default):** During propagation, if a restriction reaches a part table but its master is not in the diagram or is unrestricted, raise an error *before* any deletes execute. The current approach, by contrast, executes all deletes within a transaction and checks only *after* the cascade completes, so a violation is discovered only once the delete work has already been done.
90+
91+
**`"ignore"`:** Propagate restrictions normally. Parts may be deleted without their masters.
92+
93+
**`"cascade"`:** During propagation, when a restriction reaches a part table whose master is not already restricted, propagate the restriction *upward* from part to master: `master &= (master.proj() & restricted_part.proj())`. Then continue propagating the master's restriction to *its* descendants. This replaces the current ad-hoc upward cascade in lines 1086–1108 of `table.py`.
94+
```python
# part_integrity becomes a diagram policy
rd = dj.Diagram(schema).restrict(
    PartTable & 'key=1',
    part_integrity="cascade"
)
# Master is now also restricted to rows matching the part restriction
```
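
How the three modes could be resolved before any delete runs is sketched below on toy data. Everything here is an assumption made for illustration: `apply_part_integrity`, `PartIntegrityError`, the `masters` mapping, and the table names are hypothetical, and restrictions are modeled as lists of primary-key dicts rather than DataJoint expressions.

```python
class PartIntegrityError(Exception):
    """Hypothetical error type for this sketch."""


def apply_part_integrity(restrictions, masters, policy="enforce"):
    """Resolve part/master consistency up front.

    restrictions: table name -> list of restricted primary keys (toy stand-in)
    masters: part table name -> (master table name, master primary-key attributes)
    """
    if policy not in {"enforce", "ignore", "cascade"}:
        raise ValueError(f"unknown part_integrity policy: {policy!r}")
    resolved = {name: list(keys) for name, keys in restrictions.items()}
    for part, (master, master_pk) in masters.items():
        if part not in resolved or master in resolved or policy == "ignore":
            continue
        if policy == "enforce":
            raise PartIntegrityError(f"{part} is restricted but its master {master} is not")
        # "cascade": project the part keys onto the master's primary key, dropping duplicates.
        keys = [{attr: key[attr] for attr in master_pk} for key in resolved[part]]
        resolved[master] = [k for i, k in enumerate(keys) if k not in keys[:i]]
    return resolved


masters = {"session__probe": ("session", ["session_id"])}
restrictions = {
    "session__probe": [{"session_id": 1, "probe": 0}, {"session_id": 1, "probe": 1}]
}

cascaded = apply_part_integrity(restrictions, masters, policy="cascade")
print(cascaded["session"])  # [{'session_id': 1}]
```

With `policy="enforce"` the same call raises before any row is touched, which is the pre-check behavior described above.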
103+
104+
### `Part.delete()` integration
105+
106+
The current `Part.delete()` override (in `user_tables.py:219`) gates access based on `part_integrity` before delegating to `Table.delete()`. With the diagram approach, this becomes:
107+
108+
- `Part.delete(part_integrity="enforce")` — raises error (unchanged)
109+
- `Part.delete(part_integrity="ignore")` — creates a single-node diagram for the part, deletes directly
110+
- `Part.delete(part_integrity="cascade")` — creates a diagram from the part, propagates restriction upward to master, then executes the full diagram delete
111+
112+
### Graph traversal for delete
113+
```python
def delete(self):
    """Execute cascading delete using the restricted diagram."""
    conn = self._connection
    conn.dependencies.load()

    # Restricted nodes in topological order (parents first); alias nodes are skipped
    tables = [t for t in self.topo_sort() if not t.isdigit() and self._restrictions.get(t)]

    with conn.transaction:
        # Walk the list in reverse so that leaves are deleted before their parents
        for table_name in reversed(tables):
            ft = FreeTable(conn, table_name)
            ft._restriction = self._restrictions[table_name]
            ft.delete_quick()
```
129+
130+
No `IntegrityError` catching, no error message parsing, no savepoints. Deletes proceed in dependency order — leaves first, parents last — so FK constraints are never violated.
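
The ordering argument can be checked end to end with the standard library. This is a minimal sketch, assuming SQLite as a stand-in database and a made-up three-table chain; `graphlib.TopologicalSorter` plays the role of the dependency graph, and every table shares `subject_id`, so the same restriction applies at each node.

```python
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE subject (subject_id INTEGER PRIMARY KEY);
    CREATE TABLE session (
        subject_id INTEGER REFERENCES subject (subject_id),
        session_id INTEGER,
        PRIMARY KEY (subject_id, session_id));
    CREATE TABLE trial (
        subject_id INTEGER, session_id INTEGER, trial_id INTEGER,
        PRIMARY KEY (subject_id, session_id, trial_id),
        FOREIGN KEY (subject_id, session_id)
            REFERENCES session (subject_id, session_id));
    INSERT INTO subject VALUES (1), (2);
    INSERT INTO session VALUES (1, 1), (2, 1);
    INSERT INTO trial VALUES (1, 1, 1), (1, 1, 2), (2, 1, 1);
""")

# Parent-first delete: the database rejects it with an FK error.
try:
    conn.execute("DELETE FROM subject WHERE subject_id = 1")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True
    conn.rollback()

# Graph-driven delete: map each table to its parents, sort, then walk in reverse.
dependencies = {"session": {"subject"}, "trial": {"session"}}
parents_first = list(TopologicalSorter(dependencies).static_order())

with conn:  # one transaction for the whole cascade
    for table in reversed(parents_first):
        conn.execute(f"DELETE FROM {table} WHERE subject_id = ?", (1,))

remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in parents_first
}
print(parent_first_failed)  # True
print(remaining)  # {'subject': 1, 'session': 1, 'trial': 1}
```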
131+
132+
### Handling unloaded/inaccessible schemas
133+
134+
If a child table lives in a schema not loaded into the dependency graph, the graph-driven delete won't know about it. The final parent `delete_quick()` would then fail with an FK error.
135+
136+
**Strategy:** Wrap the graph-driven delete in a single try/except:
137+
```python
try:
    ...  # graph-driven delete (as above)
except IntegrityError as error:
    # Parsing is used only to improve the diagnostic, not to drive the cascade
    match = conn.adapter.parse_foreign_key_error(error.args[0])
    if match:
        raise DataJointError(
            f"Delete blocked by table {match['child']} in an unloaded schema. "
            "Activate all dependent schemas before deleting."
        ) from None
    else:
        raise DataJointError(
            "Delete blocked by a foreign key in an unloaded or inaccessible schema. "
            "Activate all dependent schemas, or ensure sufficient database privileges."
        ) from None
```
154+
155+
This preserves error-message parsing as a **diagnostic fallback** rather than as the primary cascade mechanism. The error is actionable: the user knows to activate the missing schema.
156+
157+
### Alias node handling
158+
159+
The dependency graph uses numeric alias nodes (`"1"`, `"2"`, ...) to represent aliased FKs while keeping the graph acyclic. During restriction propagation:
160+
161+
1. Walk `out_edges(parent)` — this yields edges to both real tables and alias nodes.
162+
2. For alias nodes: read the `attr_map` from the `parent → alias` edge, then follow `alias → child` to find the real child table.
163+
3. Accumulate restrictions per real child table. Multiple paths (alias + direct) to the same child produce OR-combined restrictions.
164+
```python
def _propagate_restriction(self, parent_name, parent_restriction):
    """Propagate restriction from parent to all children via FK edges."""
    for _, target, edge_data in self.out_edges(parent_name, data=True):
        attr_map = edge_data["attr_map"]

        # Follow through alias node to real child
        if target.isdigit():
            alias_node = target
            real_children = list(self.successors(alias_node))
            child_name = real_children[0] if real_children else None
        else:
            child_name = target

        if child_name is None:
            continue

        # Compute child restriction using attr_map
        parent_expr = FreeTable(self._connection, parent_name)
        parent_expr._restriction = parent_restriction

        if edge_data["aliased"]:
            child_restriction = parent_expr.proj(
                **{fk: pk for fk, pk in attr_map.items()}
            )
        else:
            child_restriction = parent_expr.proj()

        # Accumulate as OR (list = OR in DataJoint restriction semantics)
        self._restrictions.setdefault(child_name, [])
        self._restrictions[child_name].append(child_restriction)
```
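
The alias-node walk on its own can be tried on a toy edge table without a database or graph library. In this sketch the `edges` dict, the `out_edges` and `children_with_attr_maps` helpers, and the table names are all hypothetical; only the convention that numeric node names are alias nodes comes from the design above.

```python
# (source, target) -> edge data; numeric node names stand for alias nodes.
edges = {
    ("mouse", "1"): {"attr_map": {"mother_id": "mouse_id"}, "aliased": True},
    ("1", "breeding"): {},
    ("mouse", "2"): {"attr_map": {"father_id": "mouse_id"}, "aliased": True},
    ("2", "breeding"): {},
    ("mouse", "weighing"): {"attr_map": {"mouse_id": "mouse_id"}, "aliased": False},
}


def out_edges(node):
    """Toy counterpart of a graph's out_edges(node, data=True)."""
    return [(src, dst, data) for (src, dst), data in edges.items() if src == node]


def children_with_attr_maps(parent):
    """Group attr_maps by real child table, following alias nodes one hop."""
    result = {}
    for _, target, data in out_edges(parent):
        if target.isdigit():
            successors = [dst for _, dst, _ in out_edges(target)]
            if not successors:
                continue
            child = successors[0]
        else:
            child = target
        # attr_map is read from the parent -> alias (or parent -> child) edge
        result.setdefault(child, []).append(data["attr_map"])
    return result


paths = children_with_attr_maps("mouse")
print(paths["breeding"])  # [{'mother_id': 'mouse_id'}, {'father_id': 'mouse_id'}]
print(paths["weighing"])  # [{'mouse_id': 'mouse_id'}]
```

Each entry in a child's list corresponds to one FK path, which is what later becomes one term of the OR-combined restriction.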
197+
198+
### API
199+
200+
```python
201+
# From a table with restriction
202+
rd = dj.Diagram(Session & 'subject_id=1')
203+
204+
# Explicit restrict call
205+
rd = dj.Diagram(schema).restrict(Session & 'subject_id=1')
206+
207+
# Operator syntax (proposed in #865)
208+
rd = dj.Diagram(schema) & (Session & 'subject_id=1')
209+
210+
# Multiple restrictions
211+
rd = dj.Diagram(schema) & (Session & 'subject_id=1') & (Lab & 'lab="brody"')
212+
213+
# With part_integrity policy
214+
rd = dj.Diagram(schema) & (PartTable & 'key=1')
215+
rd.delete(part_integrity="cascade")
216+
217+
# Preview before executing
218+
rd.preview() # show affected tables and row counts
219+
rd.draw() # visualize with restricted nodes highlighted
220+
221+
# Other operations
222+
rd.delete()
223+
rd.export(path) # future: #864, #560
224+
```
225+
226+
## Advantages over current approach
227+
| | Current (error-driven) | Proposed (graph-driven) |
|---|---|---|
| MySQL 8 + limited privileges | Crashes (#1110) | Works; no error parsing needed |
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront |
| part_integrity enforcement | Post-hoc check after delete | Pre-check before any delete |
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
| Reusability | Delete-only | Delete, export, backup, sharing |
| Inspectability | Opaque recursive cascade | Preview affected data before executing |
237+
238+
## Implementation plan
239+
240+
### Phase 1: RestrictedDiagram core
241+
242+
1. Add `_restrictions: dict[str, list]` to `Diagram` — per-node restriction storage
243+
2. Implement `_propagate_restriction()` — walk edges, compute child restrictions via `attr_map`
244+
3. Implement `restrict(table_expr)` — entry point: extract table name + restriction, propagate
245+
4. Implement `__and__` operator — syntax sugar for `restrict()`
246+
5. Handle alias nodes during propagation
247+
6. Handle `part_integrity` during propagation (upward cascade from part to master)
248+
249+
### Phase 2: Graph-driven delete
250+
251+
1. Implement `Diagram.delete()` — reverse topo order, `delete_quick()` at each node
252+
2. Add unloaded-schema fallback error handling
253+
3. Migrate `Table.delete()` to construct a `RestrictedDiagram` internally
254+
4. Preserve `Part.delete()` behavior with diagram-based `part_integrity`
255+
5. Remove error-message parsing from the critical path (retain as diagnostic fallback)
256+
257+
### Phase 3: Preview and visualization
258+
259+
1. `Diagram.preview()` — show restricted nodes with row counts
260+
2. `Diagram.draw()` — highlight restricted nodes, show restriction labels
261+
262+
### Phase 4: Export and backup (future, #864/#560)
263+
264+
1. `Diagram.export(path)` — forward topo order, fetch + write at each node
265+
2. `Diagram.restore(path)` — forward topo order, insert at each node
266+
267+
## Files affected
268+
| File | Change |
|------|--------|
| `src/datajoint/diagram.py` | Add `_restrictions`, `restrict()`, `__and__`, `_propagate_restriction()`, `delete()`, `preview()` |
| `src/datajoint/table.py` | Rewrite `Table.delete()` to use `RestrictedDiagram` internally |
| `src/datajoint/user_tables.py` | Update `Part.delete()` to use diagram-based part_integrity |
| `src/datajoint/dependencies.py` | Possibly add helper methods for edge traversal with attr_map |
| `tests/integration/test_cascading_delete.py` | Update tests, add graph-driven cascade tests |
| `tests/integration/test_diagram.py` | New tests for restricted diagram |
277+
278+
## Open questions
279+
280+
1. **Should `Diagram & restriction` return a new `RestrictedDiagram` subclass or augment `Diagram` in place?**
281+
A new subclass keeps the existing `Diagram` (visualization) clean. But the restriction machinery is intimately tied to the graph structure, suggesting in-place augmentation.
282+
283+
2. **Upward propagation scope for `part_integrity="cascade"`:**
284+
When a restriction propagates up from part to master, should the master's restriction then propagate to the master's *other* parts and descendants? The current implementation does this (lines 1098–1108 of `table.py`). The diagram approach would naturally do the same — restricting the master triggers downstream propagation to all its children.
285+
286+
3. **Transaction boundaries:**
287+
The current `Table.delete()` wraps everything in a single transaction with user confirmation. The diagram-based delete should preserve this: build the restricted diagram (read-only), show preview, get confirmation, then execute all deletes in one transaction.
288+
289+
4. **Lazy vs eager restriction propagation:**
290+
Eager: propagate all restrictions when `restrict()` is called (computes row counts immediately).
291+
Lazy: store parent restrictions and propagate during `delete()`/`export()` (defers queries).
292+
Eager is better for preview but may issue many queries upfront. Lazy is more efficient when the user just wants to delete without preview. Consider lazy propagation with eager option for preview.
