Add patient_cohort_recruitment template (CSP, Graph + Rules + Prescriptive, multi-axis cohort coverage) by chriscoey · Pull Request #61 · RelationalAI/templates

chriscoey · 2026-05-08T04:23:43Z

What this template adds

A clinical-research cohort-selection template that composes three reasoners over a patient knowledge graph:

Graph reasoner: computes the transitive closure of a kinase-pathway sub-ontology in a single reachable(full=True) call and materializes every pathway gene as a KinaseGene sub-concept (KinaseGene extends Gene).
Rules: purely relational rules lift the closure to per-patient eligibility (KinaseMutationCarrier, QualifyingPairPatient, EligiblePatient) and to per-axis coverage facts (Patient.covers_kinase_gene, Patient.covers_therapy, Patient.covers_ae).
Prescriptive (CSP): picks K patients whose joint coverage spans at least MIN_GENES_COVERED distinct kinase genes, MIN_THERAPIES_COVERED distinct therapies, and MIN_AES_COVERED distinct adverse events. Uses the MiniZinc / Chuffed backend.
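
As a rough illustration of what the graph stage computes, the sketch below closes a small is-a hierarchy in plain Python instead of calling the Graph reasoner. The gene names and the `reachable_from` helper are hypothetical; only the shape (root + 2 internal + 4 leaves) follows the bundled fixture, and including the root in the closure is an assumption based on the 7-gene count.

```python
from collections import deque

# Hypothetical child -> parent is-a edges: 1 root, 2 internal nodes, 4 leaves.
IS_A = [
    ("FamilyA", "KinaseRoot"),
    ("FamilyB", "KinaseRoot"),
    ("GeneA1", "FamilyA"),
    ("GeneA2", "FamilyA"),
    ("GeneB1", "FamilyB"),
    ("GeneB2", "FamilyB"),
    ("Unrelated", "OtherRoot"),  # outside the pathway, must not be reached
]


def reachable_from(root, edges):
    """Return root plus every node that has an is-a path up to root."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


kinase_genes = reachable_from("KinaseRoot", IS_A)
print(sorted(kinase_genes))
```

In the template the same membership is carried by the KinaseGene sub-concept, so later stages test membership instead of re-walking the hierarchy.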

Modeling patterns this surfaces

Sub-concept predicate markers, not Boolean indicator properties. EligiblePatient extends Patient and CoverableGene extends Gene make membership the predicate; downstream rules and the CSP just check Sub(Parent).
solve_for scoped to sub-concepts. Decisions are created only on rows the rules established as meaningful: ineligible patients and never-covered Ys (genes, therapies, or adverse events) never get a decision, and the upper-bound ICs bind cleanly on the rows that do.
Eligible-coverage scoping for Coverable*. A Y covered only by ineligible patients would otherwise sit in Coverable* with no upper-bound IC binding it, so the solver could mark it covered for free. Scoping to EligiblePatient.covers_* closes that gap structurally.
3-arity Patient.qualifying_pair relationship. Defined over (Patient, TherapyEvent, AdverseEventOcc) triples, it single-sources the AE-window predicate; QualifyingPairPatient, Patient.covers_therapy, and Patient.covers_ae are one-line projections of it.
Per-pair coverage saturation. The upper bound Y.is_covered <= sum(in_cohort).per(Y) plus the per-pair lower bound Y.is_covered >= EligiblePatient.is_in_cohort pin is_covered to the actual coverage truth-value, so the floor IC sum(is_covered) >= MIN_* constrains true coverage and the inspect() output cannot underreport.
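
The eligible-coverage gap can be reproduced outside the solver. The following is a minimal sketch in plain Python, with hypothetical patients and genes and a hypothetical `upper_bound_groups` helper: it groups eligible coverers per gene the way a per-Y sum would, and shows that a gene covered only by an ineligible patient ends up with no group and therefore no upper bound.

```python
# Hypothetical coverage facts: patient -> genes that patient covers.
covers = {
    "P1": {"G1", "G2"},
    "P2": {"G2"},
    "P9": {"G3"},  # P9 is ineligible and is the only patient covering G3
}
eligible = {"P1", "P2"}


def upper_bound_groups(coverable, covers, eligible):
    """Map each coverable gene to the eligible patients whose in-cohort
    flags would appear in its upper-bound sum. A gene with no eligible
    coverer gets no group, so no upper bound exists for it."""
    groups = {}
    for patient in sorted(eligible):
        for gene in covers.get(patient, ()):
            if gene in coverable:
                groups.setdefault(gene, []).append(patient)
    return groups


# Unscoped: coverable genes derived from every patient's coverage facts.
coverable_all = set().union(*covers.values())
# Scoped: coverable genes derived from eligible patients only.
coverable_scoped = set().union(*(covers[p] for p in eligible))

unbounded_unscoped = coverable_all - set(
    upper_bound_groups(coverable_all, covers, eligible)
)
unbounded_scoped = coverable_scoped - set(
    upper_bound_groups(coverable_scoped, covers, eligible)
)
print(unbounded_unscoped)  # {'G3'}: a decision with nothing bounding it
print(unbounded_scoped)    # set(): every decision has an upper bound
```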

Verification

Pre-solve invariants — Python helpers raise focused ValueError for null/duplicate keys, dangling foreign keys, missing kinase root, and negative timestamps. Foreign-key edges declared in a single declarative table.
Post-solve problem.verify() — re-evaluates all 10 ICs (cohort size + 3 upper + 3 lower + 3 floor) in the returned solution. Every IC is pure relational arithmetic.
termination_status() == "OPTIMAL" assertion via model.require.
Bundled fixture: 15 patients, 26 mutations, 12 AEs, 7 kinase-pathway genes (root + 2 internal + 4 leaves). 8 patients are eligible; the solver returns one of several feasible cohorts that hit the (MIN_GENES=3, MIN_THERAPIES=2, MIN_AES=2) floors.
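
To make "one of several feasible cohorts" concrete, here is a small brute-force sketch. It enumerates cohorts with `itertools.combinations` instead of using the MiniZinc / Chuffed backend, and the patients, coverage sets, K, and floors are all hypothetical stand-ins, not the bundled fixture.

```python
from itertools import combinations

# Hypothetical eligible patients with per-axis coverage (genes, therapies, AEs).
COVERAGE = {
    "P1": ({"G1"}, {"T1"}, {"A1"}),
    "P2": ({"G2"}, {"T1"}, {"A2"}),
    "P3": ({"G3"}, {"T2"}, {"A1"}),
    "P4": ({"G4"}, {"T2"}, {"A2"}),
}
K = 2
FLOORS = (2, 2, 2)  # stand-ins for the MIN_GENES / MIN_THERAPIES / MIN_AES floors


def feasible_cohorts(coverage, k, floors):
    """Return every size-k cohort whose joint coverage meets all floors."""
    found = []
    for cohort in combinations(sorted(coverage), k):
        axes = [set(), set(), set()]
        for patient in cohort:
            for axis, items in zip(axes, coverage[patient]):
                axis.update(items)
        if all(len(axis) >= floor for axis, floor in zip(axes, floors)):
            found.append(cohort)
    return found


cohorts = feasible_cohorts(COVERAGE, K, FLOORS)
print(cohorts)  # [('P1', 'P4'), ('P2', 'P3')]
```

Both cohorts satisfy every floor, so a satisfaction solve may return either one; any check on the solve result should test the floors, not a specific cohort.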

References

The "How it works" section walks through each pillar with code excerpts. The Customize section covers swap-your-own-data, alternative cohort objectives (max coverage, min average age), tightening the qualifying window, additional eligibility conjuncts, and per-stratum fairness constraints. Troubleshooting covers INFEASIBLE, multi-cohort non-determinism, and the two encoding pitfalls (sub-concept target + eligible-coverage scoping) that produce the "is_covered=1 with no covering patient" symptom.

Three-pillar Graph + Rules + CSP cohort discovery: Graph reasoner closes a kinase-pathway sub-ontology in one reachable call; relational rules lift the closure to per-patient eligibility (kinase mutation + therapy/AE pair within 90 days) and per-axis coverage facts; the CSP solver picks K patients whose joint coverage hits MIN_GENES / MIN_THERAPIES / MIN_AES floors, scoping is_covered decisions to coverable rows so the upper-bound ICs actually bind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the Boolean / Integer indicator-property pattern (e.g. Patient.is_eligible, Gene.is_kinase_member, Gene.is_coverable) with the extends=[Parent] sub-concept idiom. EligiblePatient, KinaseGene, and the CoverableGene/Therapy/AdverseEvent triples now carry the predicate via membership; downstream rules and solve_for filters say e.g. where=[EligiblePatient(Patient)] instead of where=[X.is_eligible == 1]. Cheaper (no indicator property table), reads cleanly, and inherits the parent's id/properties.

Also: drop unnecessary refs in covers_* rules (TherapyEvent / AdverseEventOcc directly), fold t_days into Concept.new(), call-form binding (MutationEvent.id(mut_data.id)), rename KineRootGene -> KinaseRootGene, drop the dead genes_csv_data walrus, add the termination_status() == "OPTIMAL" assertion after verify(), and update docstring + README + expected-output to match.

Live run still OPTIMAL in ~0.08s with cohort {P_Alpha, P_Charlie, P_Delta, P_Echo} covering all 4 leaf kinase genes, 3 therapies, 3 AEs.

Note: solve_for(SubConcept.prop, ...) on a parent-declared property currently fails with an FDError on duplicate variable names. MRE saved locally at /tmp/pyrel_solve_for_subtype_mre.py for a follow-up Jira against the prescriptive lib; in the meantime the template uses the where=[Sub(Parent)] form, which works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MutationEvent.patient/.gene, TherapyEvent.patient/.therapy, and AdverseEventOcc.patient/.term are all functional foreign keys -- each event observation links to exactly one patient and one entity (gene, therapy, or AE term). model.Property is the correct declaration. GeneIsA.parent/.child stay as model.Relationship because they are consumed by the Graph constructor's edge_src_relationship/ edge_dst_relationship parameters, which require Relationship. Patient.covers_kinase_gene/.covers_therapy/.covers_ae also stay as Relationship -- a patient covers many genes/therapies/AEs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each GeneIsA edge has exactly one parent gene and one child gene, so model.Property is the correct declaration. Previously kept as model.Relationship out of caution that the Graph constructor's edge_src_relationship/edge_dst_relationship parameters might reject Property -- a probe (/tmp/skill_probes/probe_graph_property_edge_v2.py) ran reachable() against identical graphs declared both ways and got identical 7 reachability rows in each case, so Property works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Bumps relationalai pin to 1.1.1 and rewrites the prescriptive section to use `solve_for(EligiblePatient.is_in_cohort)` / `solve_for(CoverableGene.is_covered)` etc. directly, so each binary decision is created per sub-concept row without the previous `where=[Sub(Parent)]` scoping. All ICs and inspect queries follow the same convention (`sum(EligiblePatient.is_in_cohort)`, `CoverableGene.is_covered <= sum(EligiblePatient.is_in_cohort).per(CoverableGene)`, etc.) so the model definition lines up with where decisions actually live. Updates the docstring, README, and the example cohort output to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

"query" implied a read-only retrieval, but the template solves a constraint problem -- selecting patients to enrol so the cohort jointly covers a coverage threshold across kinase genes, therapies, and adverse-event terms. "Recruitment" describes that constructive shape and matches the clinical-researcher domain language. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mise drift

- Rename module-level data path to uppercase DATA_DIR to match the recent CSP cart templates.
- Add a Solve result block to the Expected output, keyed to the bundled live-run.
- Fix Customise / Troubleshooting examples to aggregate over the sub-concepts (CoverableGene / CoverableTherapy / CoverableAdverseEvent / EligiblePatient) the decisions are scoped to. The earlier examples referenced parent properties, which would trigger a TypeError per the sub-concept-and-aggregate rule the rest of the README is built around.
- Update the Troubleshooting closure-print snippet to a runnable inspect() form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…coverage, lift qualifying-pair to 3-arity, add pre-solve invariants

- CoverableGene/Therapy/AdverseEvent now derive from EligiblePatient.covers_* (not Patient.covers_*). A Y covered only by ineligible patients would otherwise sit in Coverable* with no upper-bound IC binding and the solver could mark it covered for free. Bundled fixture didn't exercise this gap; a customer dataset could.
- Patient.qualifying_pair is now a 3-arity relationship over (Patient, TherapyEvent, AdverseEventOcc) triples. The AE-window predicate lives in this single rule; QualifyingPairPatient, Patient.covers_therapy, and Patient.covers_ae are one-line projections from it. Previously the 4-conjunct join was duplicated across three rule bodies.
- Add Python pre-solve invariants for duplicate keys, dangling foreign keys, missing kinase root, and negative t_days. Catches the most common silent-failure modes when a customer swaps in their own CSVs.
- README updated to describe the new design and the eligible-coverage scoping pitfall.
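
The single-sourcing idea can be sketched in plain Python with set comprehensions. Everything below is illustrative: the event rows are hypothetical, and the exact window predicate (an AE on or after the therapy start and no more than 90 days later) is an assumption, since the PR only states "within 90 days".

```python
WINDOW_DAYS = 90  # window length from the PR text; the direction is assumed

# Hypothetical events: (event_id, patient, therapy or AE term, t_days)
therapy_events = [
    ("te1", "P1", "T1", 10),
    ("te2", "P2", "T2", 5),
]
ae_events = [
    ("ae1", "P1", "A1", 40),   # 30 days after te1: inside the window
    ("ae2", "P2", "A2", 200),  # 195 days after te2: outside the window
]

# The window predicate appears exactly once, in the rule that builds the triples.
qualifying_pair = {
    (te_patient, te_id, ae_id)
    for te_id, te_patient, _, te_t in therapy_events
    for ae_id, ae_patient, _, ae_t in ae_events
    if te_patient == ae_patient and 0 <= ae_t - te_t <= WINDOW_DAYS
}

therapy_of = {te_id: therapy for te_id, _, therapy, _ in therapy_events}
term_of = {ae_id: term for ae_id, _, term, _ in ae_events}

# One-line projections from the triples; none of them repeats the predicate.
qualifying_pair_patients = {p for p, _, _ in qualifying_pair}
covers_therapy = {(p, therapy_of[te]) for p, te, _ in qualifying_pair}
covers_ae = {(p, term_of[ae]) for p, _, ae in qualifying_pair}

print(qualifying_pair)
```

Tightening the window then means changing one comparison, and all three projections pick up the change.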

…rose

Validator hardening:

- Add `_assert_no_nulls` helper called from every validator; NaN values in required columns now raise a focused ValueError instead of cascading into a confusing pandas/CPython `int(NaN)` traceback.
- Rename `_assert_nonneg_t_days` to `_assert_nonneg_column` and parameterize the column name; same call, less hardcoded.
- Consolidate the eight foreign-key calls into a single declarative `_FK_EDGES` table iterated in a `for` loop; one place to edit when the schema changes.
- Format duplicate-key error message consistently for single and composite keys (`(id)` and `(child_id, parent_id)` rather than `id` and `('child_id', 'parent_id')`).

Documentation polish:

- Trim the front-matter description from 399 to 280 chars and drop Python identifier names (`MIN_GENES`, etc.) that don't read well on a catalog tile; mirror the same trim in v1/README.md index.
- Rewrite the expected-output narrative to be solver-agnostic: the bundled data has multiple feasible cohorts, so any specific claim about which leaf genes are covered is false for some valid runs.
- Restructure the "A coverable Y appears as is_covered = 1" troubleshooting entry from a single dense paragraph into two named pitfalls with separate fix recipes.
- US English throughout: enrol→enroll, enrolment→enrollment, generalises→generalizes, recognisable→recognizable, materialised→materialized, optimisation→optimization (3x).
- Fix Graph constructor parameter names in prose (`edge_src_relationship` / `edge_dst_relationship`, not `src_relationship` / `dst_relationship`).
- Use the catalog convention `Rules-based` rather than `Rules` in reasoning_types front-matter to match sister templates.
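
A table-driven validator of this kind might look like the sketch below. It works on plain lists of dicts, not pandas frames, to stay self-contained; the helper names `_assert_no_nulls`, `_assert_nonneg_column`, and `_FK_EDGES` come from the commit message, but their bodies, the `_assert_unique` and `validate` helpers, and the two-table schema are assumptions made for illustration.

```python
import math

# Hypothetical rows standing in for the CSV tables.
TABLES = {
    "patients": [{"id": "P1"}, {"id": "P2"}],
    "mutations": [
        {"id": "m1", "patient_id": "P1", "t_days": 3},
        {"id": "m2", "patient_id": "P2", "t_days": 8},
    ],
}

# (child table, child column, parent table, parent column); illustrative schema.
_FK_EDGES = [("mutations", "patient_id", "patients", "id")]


def _is_null(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def _assert_no_nulls(name, rows, columns):
    for col in columns:
        if any(_is_null(row.get(col)) for row in rows):
            raise ValueError(f"{name}: null value in required column {col}")


def _assert_unique(name, rows, key):
    seen = set()
    for row in rows:
        value = tuple(row[col] for col in key)
        if value in seen:
            raise ValueError(f"{name}: duplicate key ({', '.join(key)}) = {value}")
        seen.add(value)


def _assert_nonneg_column(name, rows, column):
    if any(row[column] < 0 for row in rows):
        raise ValueError(f"{name}: negative value in {column}")


def validate(tables):
    # Null and duplicate-key checks run first so later checks see clean keys.
    for name, rows in tables.items():
        _assert_no_nulls(name, rows, rows[0].keys() if rows else ())
        _assert_unique(name, rows, ("id",))
    _assert_nonneg_column("mutations", tables["mutations"], "t_days")
    # One loop over the declarative edge table replaces per-edge calls.
    for child, child_col, parent, parent_col in _FK_EDGES:
        known = {row[parent_col] for row in tables[parent]}
        dangling = {row[child_col] for row in tables[child]} - known
        if dangling:
            raise ValueError(f"{child}.{child_col}: dangling keys {sorted(dangling)}")


validate(TABLES)  # passes on the clean rows above
```

Adding a table to the schema in this sketch only requires a new `_FK_EDGES` row, which is the maintenance benefit the commit describes.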

… lower bounds

The set-cover formulation correctly enforced the MIN_* coverage floors, but the per-axis `Y.is_covered` indicators were only upper-bounded by the count of covering in-cohort eligible patients. In a satisfaction solve, any subset of the truly-covered Ys that hits the floor was a valid assignment -- so the solver was free to leave additional genuinely-covered indicators at 0. The downstream `inspect()` output ("kinase-pathway genes covered by the cohort", etc.) could then underreport the cohort's actual coverage.

Add per-pair lower-bound ICs `Y.is_covered >= EligiblePatient.is_in_cohort` for each (eligible patient, Y) pair where the patient covers Y. With both bounds, `Y.is_covered` is pinned to the actual coverage truth-value: 1 iff some in-cohort eligible patient covers Y. The floor IC `sum(is_covered) >= MIN_*` is then a constraint on the true coverage, not on a free-floating indicator subset.

ICs grow from 7 to 10 (cohort size + 3 upper + 3 lower + 3 floor); all ten are pure relational arithmetic and re-evaluable by `problem.verify()`. README "How it works" rewritten to explain the saturation pattern; module docstring updated for the IC count and lower-bound rationale.
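
The pinning argument can be checked by enumeration. The sketch below is plain Python, not the template's constraint code, and uses one hypothetical gene with two hypothetical eligible coverers; it lists which values of is_covered survive with and without the per-pair lower bounds.

```python
from itertools import product

# Hypothetical instance: one gene covered by two eligible patients.
coverers = ["P1", "P2"]


def consistent(is_covered, in_cohort, with_lower_bounds):
    """Check the coverage constraints for one gene under a fixed cohort."""
    # Upper bound: is_covered <= number of in-cohort coverers.
    upper = is_covered <= sum(in_cohort[p] for p in coverers)
    # Per-pair lower bounds: is_covered >= in_cohort for every coverer.
    lower = all(is_covered >= in_cohort[p] for p in coverers)
    return upper and (lower or not with_lower_bounds)


def allowed_values(in_cohort, with_lower_bounds):
    return [v for v in (0, 1) if consistent(v, in_cohort, with_lower_bounds)]


for flags in product((0, 1), repeat=len(coverers)):
    in_cohort = dict(zip(coverers, flags))
    print(
        in_cohort,
        "upper only:", allowed_values(in_cohort, False),
        "both bounds:", allowed_values(in_cohort, True),
    )
```

With only the upper bound, any cohort containing a coverer leaves both 0 and 1 admissible; with both bounds, exactly one value remains in every case, which is why the reported coverage matches the true coverage.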

github-actions · 2026-05-08T04:37:47Z

The docs preview for this pull request has been deployed to Vercel!

✅ Preview:	https://relationalai-docs-7t12wfiov-relationalai.vercel.app/build/templates
🔍 Inspect:	https://vercel.com/relationalai/relationalai-docs/DLU37RTjWcNgsRhMmbRHkY7iYMRC


cafzal

Ship with nits. Sub-concept-as-predicate is genuinely novel in v1 — telco_network_recovery uses is_critical_restore as a Boolean Relationship marker, this template uses EligiblePatient extends Patient with solve_for(Sub.prop) keying decisions only to rows the rules established as meaningful (patient_cohort_recruitment.py:470-489). Eligible-coverage scoping verified by reproduction (lines 444-449, 505-518): BRAF mutated by ineligibles 10/14 but stays coverable because eligibles 1/4/7 also carry it. Per-pair coverage saturation LB (:528-541) genuinely pins is_covered so inspect() cannot underreport. Pre-solve invariants reusable (driven by an _FK_EDGES table). Closure exactly {1..7}; floors (3,2,2) non-trivially feasible; MAX(4,3,3) correctly unreachable. Distinct lesson vs. telco — both are Graph + Rules + CSP, no overlap in encoding lessons.

Issues (all NITs)

README.md:50-51 — "10 genes ... 26 mutation events" missing a "bundled sample" / "illustrative" qualifier. Combined with P_Alpha...P_Oscar patient names, the demo-ness is implicit but should be explicit (per global no-PII / Demo-framing rule). Add one sentence in "What's included" or front-matter description.
README.md:25 — "a EligiblePatient" should be "an" (vowel-sound rule). Article slip recurs in "How it works".
README.md:159 — "fail at least one of: kinase-mutation, qualifying-pair within the 90-day window" reads like a 3-item list but is 2 items. Drop "at least one of" or reword to "fail either the kinase-mutation test or the qualifying-pair test".
patient_cohort_recruitment.py:421 — dropped word: "...demonstrate a qualifying response pattern for are counted." Suggest "Only therapies and AEs with a within-window qualifying pair are counted."
README.md:264 — section name "Customize this template" drifts from sample-template's "Customize".
README.md:269 — gold parenthetical about the parent/sub-concept aggregation TypeError is buried at the end of a long bullet. Lift to its own bullet or move to Troubleshooting.
README.md:23 — long opening sentence buries the lede; split for pacing.

py_compile and ruff check clean.

- Call out bundled CSVs as illustrative synthetic demo data
- Split long opening paragraph for pacing
- Reword the eligibility-fail sentence as a clean either/or
- Lift sub-concept aggregation guidance into its own bullet
- Fix dropped wording in the covers_therapy / covers_ae comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chriscoey and others added 12 commits April 28, 2026 23:18

Merge remote-tracking branch 'origin/main' into csp-patient_cohort_query

8318505

# Conflicts: # v1/README.md

Merge remote-tracking branch 'origin/main' into csp-patient_cohort_query

2309bf6

github-actions Bot temporarily deployed to Preview May 8, 2026 04:24 Inactive

github-actions Bot added the deployed label May 8, 2026

Pin relationalai==1.1.0 and pandas>=2.0 to match other v1 templates

d471b90

Aligns with the canonical pin used by product_configurator, synthetic_eligibility_records, and synthetic_order_lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot temporarily deployed to Preview May 8, 2026 04:56 Inactive

chriscoey marked this pull request as ready for review May 8, 2026 16:29

chriscoey requested review from jablonskidev and somacdivad as code owners May 8, 2026 16:29

Replace "pillar" with "reasoner"/"stage" in customer-facing prose

f4947ea

Match the customer-facing reasoner taxonomy and the language other multi-reasoner templates use. "Pillar" was internal jargon. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot temporarily deployed to Preview May 8, 2026 17:33 Inactive

cafzal approved these changes May 8, 2026


github-actions Bot temporarily deployed to Preview May 8, 2026 18:59 Inactive

chriscoey wants to merge 15 commits into main from csp-patient_cohort_recruitment
