Add patient_cohort_recruitment template (CSP, Graph + Rules + Prescriptive, multi-axis cohort coverage)#61

Open
chriscoey wants to merge 15 commits into main from csp-patient_cohort_recruitment

@chriscoey
Member

What this template adds

A clinical-research cohort-selection template that composes three reasoners over a patient knowledge graph:

  • Graph reasoner — closes a kinase-pathway sub-ontology in one reachable(full=True) call, materializing every pathway gene as a KinaseGene extends Gene sub-concept.
  • Rules — pure-relational lifting from the closure to per-patient eligibility (KinaseMutationCarrier, QualifyingPairPatient, EligiblePatient) and per-axis coverage facts (Patient.covers_kinase_gene, Patient.covers_therapy, Patient.covers_ae).
  • Prescriptive (CSP) — picks K patients whose joint coverage spans at least MIN_GENES_COVERED distinct kinase genes, MIN_THERAPIES_COVERED distinct therapies, and MIN_AES_COVERED distinct adverse events. MiniZinc / Chuffed backend.
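The closure step in the first bullet can be pictured with a plain-Python sketch. This is not the template's Graph reasoner call; it only illustrates what a reachability closure over is-a edges computes. The edge list and every gene name other than BRAF are hypothetical, chosen to mirror the fixture's shape (root + 2 internal + 4 leaves).

```python
from collections import deque

# Hypothetical is-a edges as (child, parent) pairs; names are illustrative only.
IS_A_EDGES = [
    ("RAF_FAMILY", "KINASE_ROOT"),
    ("MEK_FAMILY", "KINASE_ROOT"),
    ("BRAF", "RAF_FAMILY"),
    ("RAF1", "RAF_FAMILY"),
    ("MAP2K1", "MEK_FAMILY"),
    ("MAP2K2", "MEK_FAMILY"),
    ("TP53", "TUMOR_SUPPRESSOR_ROOT"),  # outside the kinase pathway
]


def descendants_closure(edges, root):
    """Return the root plus every node that reaches it through is-a edges."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)

    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Every member of this set would become a KinaseGene in the template's terms.
kinase_genes = descendants_closure(IS_A_EDGES, "KINASE_ROOT")
print(sorted(kinase_genes))
```

In the template the closure is computed by the reasoner in a single call; the breadth-first loop here is just the most direct way to show the same set in ordinary Python.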

Modeling patterns this surfaces

  • Sub-concept predicate markers, not Boolean indicator properties. EligiblePatient extends Patient and CoverableGene extends Gene make membership the predicate; downstream rules and the CSP just check Sub(Parent).
  • solve_for scoped to sub-concepts. Decisions are created only on rows the rules established as meaningful: ineligible patients and never-covered Ys (where Y is a gene, therapy, or adverse event) never get a decision, and the upper-bound ICs cleanly bind on the rows that do.
  • Eligible-coverage scoping for Coverable*. A Y covered only by ineligible patients would otherwise sit in Coverable* with no upper-bound IC binding and the solver could mark it covered for free. Scoping to EligiblePatient.covers_* closes that gap structurally.
  • 3-arity Patient.qualifying_pair relationship over (Patient, TherapyEvent, AdverseEventOcc) triples single-sources the AE-window predicate. QualifyingPairPatient, Patient.covers_therapy, and Patient.covers_ae are one-line projections.
  • Per-pair coverage saturation. Upper bound Y.is_covered <= sum(in_cohort).per(Y) plus per-pair lower bound Y.is_covered >= EligiblePatient.is_in_cohort pin is_covered to the actual coverage truth-value, so the floor IC sum(is_covered) >= MIN_* constrains true coverage and the inspect() output cannot underreport.
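To see why the per-pair lower bound matters, here is a small brute-force sketch in plain Python. It does not use the prescriptive library; it fixes a cohort, enumerates every 0/1 assignment of is_covered, and keeps those that satisfy the constraints. The patients, genes, coverage pairs, and floor value are all hypothetical.

```python
from itertools import product

# Hypothetical coverage facts: (eligible patient, gene) pairs.
COVERS = {("P1", "G1"), ("P1", "G2"), ("P2", "G2"), ("P3", "G3")}
GENES = ["G1", "G2", "G3"]
COHORT = {"P1", "P2"}       # a fixed in_cohort assignment, for illustration
MIN_GENES_COVERED = 1       # hypothetical floor


def feasible_assignments(with_lower_bound):
    """Enumerate is_covered assignments that satisfy the coverage constraints."""
    results = []
    for bits in product([0, 1], repeat=len(GENES)):
        is_covered = dict(zip(GENES, bits))
        ok = sum(bits) >= MIN_GENES_COVERED  # floor constraint
        for gene in GENES:
            covering = [p for (p, g) in COVERS if g == gene]
            in_cohort = [1 if p in COHORT else 0 for p in covering]
            # Upper bound: is_covered <= number of in-cohort covering patients.
            ok = ok and is_covered[gene] <= sum(in_cohort)
            if with_lower_bound:
                # Per-pair lower bound: is_covered >= in_cohort for each covering patient.
                ok = ok and all(is_covered[gene] >= flag for flag in in_cohort)
        if ok:
            results.append(is_covered)
    return results


upper_only = feasible_assignments(with_lower_bound=False)
saturated = feasible_assignments(with_lower_bound=True)
print(len(upper_only), "assignments with the upper bound only")
print(len(saturated), "assignment(s) with both bounds:", saturated)
```

With only the upper bound, several assignments pass and some leave a genuinely covered gene at 0. With both bounds, exactly one assignment survives, and it matches the true coverage of the cohort.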

Verification

  • Pre-solve invariants — Python helpers raise focused ValueError for null/duplicate keys, dangling foreign keys, missing kinase root, and negative timestamps. Foreign-key edges declared in a single declarative table.
  • Post-solve problem.verify() — re-evaluates all 10 ICs (cohort size + 3 upper + 3 lower + 3 floor) in the returned solution. Every IC is pure relational arithmetic.
  • termination_status() == "OPTIMAL" assertion via model.require.
  • Bundled fixture: 15 patients, 26 mutations, 12 AEs, 7 kinase-pathway genes (root + 2 internal + 4 leaves). 8 patients are eligible; the solver returns one of several feasible cohorts that hit the (MIN_GENES=3, MIN_THERAPIES=2, MIN_AES=2) floors.
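The pre-solve invariants can be sketched roughly as follows. This is an illustration under stated assumptions, not the template's code: it uses plain lists of dicts rather than the loaded CSV frames, and the helper names, table names, and column names are hypothetical. The missing-kinase-root check is omitted for brevity.

```python
def check_no_nulls(rows, table, columns):
    """Raise a focused ValueError if any required column holds None."""
    for i, row in enumerate(rows):
        for col in columns:
            if row.get(col) is None:
                raise ValueError(f"{table}: null in required column '{col}' at row {i}")


def check_unique_key(rows, table, key_columns):
    """Raise if a single or composite key repeats; the label format is the same for both."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            label = "(" + ", ".join(key_columns) + ")"
            raise ValueError(f"{table}: duplicate key {label} = {key}")
        seen.add(key)


def check_nonneg_column(rows, table, column):
    """Raise if a numeric column contains a negative value."""
    for i, row in enumerate(rows):
        if row[column] < 0:
            raise ValueError(f"{table}: negative {column} at row {i}")


# Hypothetical declarative foreign-key table: (child, child_col, parent, parent_col).
FK_EDGES_SKETCH = [
    ("mutations", "patient_id", "patients", "id"),
    ("mutations", "gene_id", "genes", "id"),
]


def check_foreign_keys(tables, fk_edges):
    """Raise if any child value has no matching parent key."""
    for child, child_col, parent, parent_col in fk_edges:
        parent_keys = {row[parent_col] for row in tables[parent]}
        dangling = sorted({row[child_col] for row in tables[child]} - parent_keys)
        if dangling:
            raise ValueError(
                f"{child}.{child_col}: dangling references to {parent}.{parent_col}: {dangling}"
            )


def validate(tables):
    """Run every pre-solve check; returns None when the data is clean."""
    check_no_nulls(tables["mutations"], "mutations", ["id", "patient_id", "gene_id", "t_days"])
    check_unique_key(tables["mutations"], "mutations", ["id"])
    check_nonneg_column(tables["mutations"], "mutations", "t_days")
    check_foreign_keys(tables, FK_EDGES_SKETCH)


TABLES = {
    "patients": [{"id": 1}, {"id": 2}],
    "genes": [{"id": 10}, {"id": 11}],
    "mutations": [
        {"id": 100, "patient_id": 1, "gene_id": 10, "t_days": 5},
        {"id": 101, "patient_id": 2, "gene_id": 11, "t_days": 40},
    ],
}
validate(TABLES)  # clean data passes silently
```

Keeping the foreign-key edges in one table means a schema change touches a single list rather than several call sites.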

References

The "How it works" section walks through each pillar with code excerpts. The Customize section covers swap-your-own-data, alternative cohort objectives (max coverage, min average age), tightening the qualifying window, additional eligibility conjuncts, and per-stratum fairness constraints. Troubleshooting covers INFEASIBLE, multi-cohort non-determinism, and the two encoding pitfalls (sub-concept target + eligible-coverage scoping) that produce the "is_covered=1 with no covering patient" symptom.

chriscoey and others added 12 commits April 28, 2026 23:18
Three-pillar Graph + Rules + CSP cohort discovery: Graph reasoner closes a
kinase-pathway sub-ontology in one reachable call; relational rules lift
the closure to per-patient eligibility (kinase mutation + therapy/AE pair
within 90 days) and per-axis coverage facts; the CSP solver picks K
patients whose joint coverage hits MIN_GENES / MIN_THERAPIES / MIN_AES
floors, scoping is_covered decisions to coverable rows so the upper-bound
ICs actually bind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Boolean / Integer indicator-property pattern (e.g.
Patient.is_eligible, Gene.is_kinase_member, Gene.is_coverable) with the
extends=[Parent] sub-concept idiom. EligiblePatient, KinaseGene, and the
CoverableGene/Therapy/AdverseEvent triples now carry the predicate via
membership; downstream rules and solve_for filters say e.g.
where=[EligiblePatient(Patient)] instead of where=[X.is_eligible == 1].
Cheaper (no indicator property table), reads cleanly, and inherits the
parent's id/properties.

Also: drop unnecessary refs in covers_* rules (TherapyEvent /
AdverseEventOcc directly), fold t_days into Concept.new(), call-form
binding (MutationEvent.id(mut_data.id)), rename KineRootGene ->
KinaseRootGene, drop the dead genes_csv_data walrus, add the
termination_status() == "OPTIMAL" assertion after verify(), and update
docstring + README + expected-output to match. Live run still OPTIMAL
in ~0.08s with cohort {P_Alpha, P_Charlie, P_Delta, P_Echo} covering all
4 leaf kinase genes, 3 therapies, 3 AEs.

Note: solve_for(SubConcept.prop, ...) on a parent-declared property
currently fails with an FDError on duplicate variable names. MRE saved
locally at /tmp/pyrel_solve_for_subtype_mre.py for a follow-up Jira
against the prescriptive lib; in the meantime the template uses the
where=[Sub(Parent)] form, which works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MutationEvent.patient/.gene, TherapyEvent.patient/.therapy, and
AdverseEventOcc.patient/.term are all functional foreign keys -- each
event observation links to exactly one patient and one entity (gene,
therapy, or AE term). model.Property is the correct declaration.

GeneIsA.parent/.child stay as model.Relationship because they are
consumed by the Graph constructor's edge_src_relationship/
edge_dst_relationship parameters, which require Relationship.
Patient.covers_kinase_gene/.covers_therapy/.covers_ae also stay as
Relationship -- a patient covers many genes/therapies/AEs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each GeneIsA edge has exactly one parent gene and one child gene, so
model.Property is the correct declaration. Previously kept as
model.Relationship out of caution that the Graph constructor's
edge_src_relationship/edge_dst_relationship parameters might reject
Property -- a probe (/tmp/skill_probes/probe_graph_property_edge_v2.py)
ran reachable() against identical graphs declared both ways and got
identical 7 reachability rows in each case, so Property works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps relationalai pin to 1.1.1 and rewrites the prescriptive section to
use `solve_for(EligiblePatient.is_in_cohort)` / `solve_for(CoverableGene.is_covered)`
etc. directly, so each binary decision is created per sub-concept row
without the previous `where=[Sub(Parent)]` scoping. All ICs and inspect
queries follow the same convention (`sum(EligiblePatient.is_in_cohort)`,
`CoverableGene.is_covered <= sum(EligiblePatient.is_in_cohort).per(CoverableGene)`,
etc.) so the model definition lines up with where decisions actually live.

Updates the docstring, README, and the example cohort output to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"query" implied a read-only retrieval, but the template solves a
constraint problem -- selecting patients to enrol so the cohort
jointly covers a coverage threshold across kinase genes, therapies,
and adverse-event terms. "Recruitment" describes that constructive
shape and matches the clinical-researcher domain language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mise drift

- Rename module-level data path to uppercase DATA_DIR to match the
  recent CSP cart templates.
- Add a Solve result block to the Expected output, keyed to the
  bundled live-run.
- Fix Customise / Troubleshooting examples to aggregate over the
  sub-concepts (CoverableGene / CoverableTherapy /
  CoverableAdverseEvent / EligiblePatient) the decisions are scoped
  to. The earlier examples referenced parent properties, which would
  trigger a TypeError per the sub-concept-and-aggregate rule the rest
  of the README is built around.
- Update the Troubleshooting closure-print snippet to a runnable
  inspect() form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…coverage, lift qualifying-pair to 3-arity, add pre-solve invariants

- CoverableGene/Therapy/AdverseEvent now derive from
  EligiblePatient.covers_* (not Patient.covers_*). A Y covered only
  by ineligible patients would otherwise sit in Coverable* with no
  upper-bound IC binding and the solver could mark it covered for
  free. Bundled fixture didn't exercise this gap; a customer dataset
  could.
- Patient.qualifying_pair is now a 3-arity relationship over
  (Patient, TherapyEvent, AdverseEventOcc) triples. The AE-window
  predicate lives in this single rule; QualifyingPairPatient,
  Patient.covers_therapy, and Patient.covers_ae are one-line
  projections from it. Previously the 4-conjunct join was duplicated
  across three rule bodies.
- Add Python pre-solve invariants for duplicate keys, dangling
  foreign keys, missing kinase root, and negative t_days. Catches
  the most common silent-failure modes when a customer swaps in
  their own CSVs.
- README updated to describe the new design and the
  eligible-coverage scoping pitfall.
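
A plain-Python sketch of the two design points in this commit, under stated assumptions: the event records and kinase-carrier set are hypothetical, and the window predicate (AE on or after the therapy event and within 90 days) is a guess at the rule's shape, not a copy of it. The triples are computed once; the patient set and both coverage relations are projections of them, and the coverable sets are scoped to eligible patients.

```python
WINDOW_DAYS = 90  # the 90-day window mentioned in the first commit message

# Hypothetical event records.
therapy_events = [
    {"id": "T1", "patient": "P1", "therapy": "DrugA", "t_days": 0},
    {"id": "T2", "patient": "P2", "therapy": "DrugB", "t_days": 10},
    {"id": "T3", "patient": "P3", "therapy": "DrugC", "t_days": 5},
]
ae_events = [
    {"id": "A1", "patient": "P1", "term": "Rash", "t_days": 30},
    {"id": "A2", "patient": "P2", "term": "Nausea", "t_days": 200},
    {"id": "A3", "patient": "P3", "term": "Fatigue", "t_days": 20},
]
kinase_carriers = {"P1", "P2"}  # hypothetical output of the mutation rule


def qualifying_pairs(therapies, aes, window_days):
    """Single source of the window predicate: (patient, therapy event, AE event) triples."""
    return {
        (t["patient"], t["id"], a["id"])
        for t in therapies
        for a in aes
        if t["patient"] == a["patient"]
        and 0 <= a["t_days"] - t["t_days"] <= window_days  # assumed predicate shape
    }


pairs = qualifying_pairs(therapy_events, ae_events, WINDOW_DAYS)
therapy_of = {t["id"]: t["therapy"] for t in therapy_events}
term_of = {a["id"]: a["term"] for a in ae_events}

# One-line projections from the triples.
qualifying_patients = {p for (p, _, _) in pairs}
covers_therapy = {(p, therapy_of[t]) for (p, t, _) in pairs}
covers_ae = {(p, term_of[a]) for (p, _, a) in pairs}

# Eligible-coverage scoping: only coverage by eligible patients makes a Y coverable.
eligible = qualifying_patients & kinase_carriers
coverable_therapies = {th for (p, th) in covers_therapy if p in eligible}
coverable_aes = {term for (p, term) in covers_ae if p in eligible}
print(sorted(eligible), sorted(coverable_therapies), sorted(coverable_aes))
```

Here DrugC and Fatigue are covered only by P3, who has a qualifying pair but no kinase mutation, so neither enters the coverable sets and neither would receive an is_covered decision.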
…rose

Validator hardening:
- Add `_assert_no_nulls` helper called from every validator; NaN
  values in required columns now raise a focused ValueError instead
  of cascading into a confusing pandas/CPython `int(NaN)` traceback.
- Rename `_assert_nonneg_t_days` to `_assert_nonneg_column` and
  parameterize the column name; same call, less hardcoded.
- Consolidate the eight foreign-key calls into a single declarative
  `_FK_EDGES` table iterated in a `for` loop; one place to edit when
  the schema changes.
- Format duplicate-key error message consistently for single and
  composite keys (`(id)` and `(child_id, parent_id)` rather than
  `id` and `('child_id', 'parent_id')`).

Documentation polish:
- Trim the front-matter description from 399 to 280 chars and drop
  Python identifier names (`MIN_GENES`, etc.) that don't read well
  on a catalog tile; mirror the same trim in v1/README.md index.
- Rewrite the expected-output narrative to be solver-agnostic: the
  bundled data has multiple feasible cohorts, so any specific claim
  about which leaf genes are covered is false for some valid runs.
- Restructure the "A coverable Y appears as is_covered = 1"
  troubleshooting entry from a single dense paragraph into two
  named pitfalls with separate fix recipes.
- US English throughout: enrol→enroll, enrolment→enrollment,
  generalises→generalizes, recognisable→recognizable,
  materialised→materialized, optimisation→optimization (3x).
- Fix Graph constructor parameter names in prose (`edge_src_relationship`
  / `edge_dst_relationship`, not `src_relationship` / `dst_relationship`).
- Use the catalog convention `Rules-based` rather than `Rules` in
  reasoning_types front-matter to match sister templates.
… lower bounds

The set-cover formulation correctly enforced the MIN_* coverage floors,
but the per-axis `Y.is_covered` indicators were only upper-bounded by
the count of covering in-cohort eligible patients. In a satisfaction
solve, any subset of the truly-covered Ys that hits the floor was a
valid assignment -- so the solver was free to leave additional
genuinely-covered indicators at 0. The downstream `inspect()` output
("kinase-pathway genes covered by the cohort", etc.) could then
underreport the cohort's actual coverage.

Add per-pair lower-bound ICs `Y.is_covered >= EligiblePatient.is_in_cohort`
for each (eligible patient, Y) pair where the patient covers Y. With
both bounds, `Y.is_covered` is pinned to the actual coverage truth-value:
1 iff some in-cohort eligible patient covers Y. The floor IC
`sum(is_covered) >= MIN_*` is then a constraint on the true coverage,
not on a free-floating indicator subset.

ICs grow from 7 to 10 (cohort size + 3 upper + 3 lower + 3 floor); all
ten are pure relational arithmetic and re-evaluable by `problem.verify()`.
README "How it works" rewritten to explain the saturation pattern;
module docstring updated for the IC count and lower-bound rationale.
@github-actions

github-actions Bot commented May 8, 2026

The docs preview for this pull request has been deployed to Vercel!

✅ Preview: https://relationalai-docs-7t12wfiov-relationalai.vercel.app/build/templates
🔍 Inspect: https://vercel.com/relationalai/relationalai-docs/DLU37RTjWcNgsRhMmbRHkY7iYMRC

Aligns with the canonical pin used by product_configurator,
synthetic_eligibility_records, and synthetic_order_lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chriscoey chriscoey marked this pull request as ready for review May 8, 2026 16:29
Match the customer-facing reasoner taxonomy and the language other
multi-reasoner templates use. "Pillar" was internal jargon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator

@cafzal cafzal left a comment

Ship with nits. Sub-concept-as-predicate is genuinely novel in v1 — telco_network_recovery uses is_critical_restore as a Boolean Relationship marker, this template uses EligiblePatient extends Patient with solve_for(Sub.prop) keying decisions only to rows the rules established as meaningful (patient_cohort_recruitment.py:470-489). Eligible-coverage scoping verified by reproduction (lines 444-449, 505-518): BRAF mutated by ineligibles 10/14 but stays coverable because eligibles 1/4/7 also carry it. Per-pair coverage saturation LB (:528-541) genuinely pins is_covered so inspect() cannot underreport. Pre-solve invariants reusable (driven by an _FK_EDGES table). Closure exactly {1..7}; floors (3,2,2) non-trivially feasible; MAX(4,3,3) correctly unreachable. Distinct lesson vs. telco — both are Graph + Rules + CSP, no overlap in encoding lessons.

Issues (all NITs)

  • README.md:50-51 — "10 genes ... 26 mutation events" missing a "bundled sample" / "illustrative" qualifier. Combined with P_Alpha...P_Oscar patient names, the demo-ness is implicit but should be explicit (per global no-PII / Demo-framing rule). Add one sentence in "What's included" or front-matter description.
  • README.md:25 — "a EligiblePatient" should be "an" (vowel-sound rule). Article slip recurs in "How it works".
  • README.md:159 — "fail at least one of: kinase-mutation, qualifying-pair within the 90-day window" reads like a 3-item list but is 2 items. Drop "at least one of" or reword to "fail either the kinase-mutation test or the qualifying-pair test".
  • patient_cohort_recruitment.py:421 — dropped word: "...demonstrate a qualifying response pattern for are counted." Suggest "Only therapies and AEs with a within-window qualifying pair are counted."
  • README.md:264 — section name "Customize this template" drifts from sample-template's "Customize".
  • README.md:269 — gold parenthetical about the parent/sub-concept aggregation TypeError is buried at the end of a long bullet. Lift to its own bullet or move to Troubleshooting.
  • README.md:23 — long opening sentence buries the lede; split for pacing.

py_compile and ruff check clean.

- Call out bundled CSVs as illustrative synthetic demo data
- Split long opening paragraph for pacing
- Reword the eligibility-fail sentence as a clean either/or
- Lift sub-concept aggregation guidance into its own bullet
- Fix dropped wording in the covers_therapy / covers_ae comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>