Add patient_cohort_recruitment template (CSP, Graph + Rules + Prescriptive, multi-axis cohort coverage)#61
Three-pillar Graph + Rules + CSP cohort discovery: Graph reasoner closes a kinase-pathway sub-ontology in one reachable call; relational rules lift the closure to per-patient eligibility (kinase mutation + therapy/AE pair within 90 days) and per-axis coverage facts; the CSP solver picks K patients whose joint coverage hits MIN_GENES / MIN_THERAPIES / MIN_AES floors, scoping is_covered decisions to coverable rows so the upper-bound ICs actually bind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Boolean / Integer indicator-property pattern (e.g.
Patient.is_eligible, Gene.is_kinase_member, Gene.is_coverable) with the
extends=[Parent] sub-concept idiom. EligiblePatient, KinaseGene, and the
CoverableGene/Therapy/AdverseEvent triples now carry the predicate via
membership; downstream rules and solve_for filters say e.g.
where=[EligiblePatient(Patient)] instead of where=[X.is_eligible == 1].
Cheaper (no indicator property table), reads cleanly, and inherits the
parent's id/properties.
Also: drop unnecessary refs in covers_* rules (TherapyEvent /
AdverseEventOcc directly), fold t_days into Concept.new(), use call-form
binding (MutationEvent.id(mut_data.id)), rename KineRootGene ->
KinaseRootGene, drop the dead genes_csv_data walrus, add the
termination_status() == "OPTIMAL" assertion after verify(), and update
docstring + README + expected-output to match. Live run still OPTIMAL
in ~0.08s with cohort {P_Alpha, P_Charlie, P_Delta, P_Echo} covering all
4 leaf kinase genes, 3 therapies, 3 AEs.
Note: solve_for(SubConcept.prop, ...) on a parent-declared property
currently fails with an FDError on duplicate variable names. MRE saved
locally at /tmp/pyrel_solve_for_subtype_mre.py for a follow-up Jira
against the prescriptive lib; in the meantime the template uses the
where=[Sub(Parent)] form, which works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MutationEvent.patient/.gene, TherapyEvent.patient/.therapy, and AdverseEventOcc.patient/.term are all functional foreign keys -- each event observation links to exactly one patient and one entity (gene, therapy, or AE term). model.Property is the correct declaration. GeneIsA.parent/.child stay as model.Relationship because they are consumed by the Graph constructor's edge_src_relationship/edge_dst_relationship parameters, which require Relationship. Patient.covers_kinase_gene/.covers_therapy/.covers_ae also stay as Relationship -- a patient covers many genes/therapies/AEs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each GeneIsA edge has exactly one parent gene and one child gene, so model.Property is the correct declaration. Previously kept as model.Relationship out of caution that the Graph constructor's edge_src_relationship/edge_dst_relationship parameters might reject Property -- a probe (/tmp/skill_probes/probe_graph_property_edge_v2.py) ran reachable() against identical graphs declared both ways and got identical 7 reachability rows in each case, so Property works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
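To make the reachability result in the probe concrete, here is a library-free sketch of the closure computed over is-a edges. It is plain Python, not the relationalai Graph API; the gene names, the `GENE_IS_A` edge list, and the `descendants_of` helper are all hypothetical and do not mirror the bundled fixture.

```python
from collections import defaultdict, deque

# Hypothetical (child, parent) is-a edges; not the bundled fixture.
GENE_IS_A = [
    ("RTK", "Kinase"),
    ("EGFR", "RTK"),
    ("BRAF", "Kinase"),
    ("TP53", "TumorSuppressor"),
]

def descendants_of(root, edges):
    """Return the root plus every gene that reaches it through is-a edges."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

kinase_genes = descendants_of("Kinase", GENE_IS_A)
```

Because each edge row carries exactly one child and one parent, the closure depends only on the (child, parent) pairs, which is consistent with the probe seeing the same rows under either declaration.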
# Conflicts: # v1/README.md
Bumps relationalai pin to 1.1.1 and rewrites the prescriptive section to use `solve_for(EligiblePatient.is_in_cohort)` / `solve_for(CoverableGene.is_covered)` etc. directly, so each binary decision is created per sub-concept row without the previous `where=[Sub(Parent)]` scoping. All ICs and inspect queries follow the same convention (`sum(EligiblePatient.is_in_cohort)`, `CoverableGene.is_covered <= sum(EligiblePatient.is_in_cohort).per(CoverableGene)`, etc.) so the model definition lines up with where decisions actually live. Updates the docstring, README, and the example cohort output to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"query" implied a read-only retrieval, but the template solves a constraint problem -- selecting patients to enrol so the cohort jointly covers a coverage threshold across kinase genes, therapies, and adverse-event terms. "Recruitment" describes that constructive shape and matches the clinical-researcher domain language. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mise drift
- Rename module-level data path to uppercase DATA_DIR to match the recent CSP cart templates.
- Add a Solve result block to the Expected output, keyed to the bundled live-run.
- Fix Customise / Troubleshooting examples to aggregate over the sub-concepts (CoverableGene / CoverableTherapy / CoverableAdverseEvent / EligiblePatient) the decisions are scoped to. The earlier examples referenced parent properties, which would trigger a TypeError per the sub-concept-and-aggregate rule the rest of the README is built around.
- Update the Troubleshooting closure-print snippet to a runnable inspect() form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…coverage, lift qualifying-pair to 3-arity, add pre-solve invariants
- CoverableGene/Therapy/AdverseEvent now derive from EligiblePatient.covers_* (not Patient.covers_*). A Y covered only by ineligible patients would otherwise sit in Coverable* with no upper-bound IC binding and the solver could mark it covered for free. Bundled fixture didn't exercise this gap; a customer dataset could.
- Patient.qualifying_pair is now a 3-arity relationship over (Patient, TherapyEvent, AdverseEventOcc) triples. The AE-window predicate lives in this single rule; QualifyingPairPatient, Patient.covers_therapy, and Patient.covers_ae are one-line projections from it. Previously the 4-conjunct join was duplicated across three rule bodies.
- Add Python pre-solve invariants for duplicate keys, dangling foreign keys, missing kinase root, and negative t_days. Catches the most common silent-failure modes when a customer swaps in their own CSVs.
- README updated to describe the new design and the eligible-coverage scoping pitfall.
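The single-source idea behind the 3-arity qualifying pair can be sketched in plain Python. This is an illustration only, not the template's rule syntax: the dataclasses, the sample events, and the assumption that the adverse event must follow the therapy by 0 to 90 days are hypothetical.

```python
from dataclasses import dataclass

WINDOW_DAYS = 90  # the 90-day window named in the commit messages

@dataclass(frozen=True)
class TherapyEvent:
    patient: str
    therapy: str
    t_days: int

@dataclass(frozen=True)
class AdverseEventOcc:
    patient: str
    term: str
    t_days: int

def qualifying_pairs(therapy_events, ae_events, window=WINDOW_DAYS):
    """The only place the AE-window predicate is written down."""
    return {
        (te.patient, te, ae)
        for te in therapy_events
        for ae in ae_events
        # Assumption: the AE occurs on or after the therapy, within the window.
        if te.patient == ae.patient and 0 <= ae.t_days - te.t_days <= window
    }

# Hypothetical sample events.
therapies = [TherapyEvent("P1", "T_A", 10), TherapyEvent("P2", "T_B", 0)]
aes = [AdverseEventOcc("P1", "rash", 40), AdverseEventOcc("P2", "nausea", 200)]

triples = qualifying_pairs(therapies, aes)

# Everything downstream is a one-line projection of the triples.
qualifying_patients = {p for p, _, _ in triples}
covers_therapy = {(p, te.therapy) for p, te, _ in triples}
covers_ae = {(p, ae.term) for p, _, ae in triples}
```

Changing the window or the direction of the comparison then touches one function instead of three rule bodies.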
…rose
Validator hardening:
- Add `_assert_no_nulls` helper called from every validator; NaN
values in required columns now raise a focused ValueError instead
of cascading into a confusing pandas/CPython `int(NaN)` traceback.
- Rename `_assert_nonneg_t_days` to `_assert_nonneg_column` and
parameterize the column name; same call, less hardcoded.
- Consolidate the eight foreign-key calls into a single declarative
`_FK_EDGES` table iterated in a `for` loop; one place to edit when
the schema changes.
- Format duplicate-key error message consistently for single and
composite keys (`(id)` and `(child_id, parent_id)` rather than
`id` and `('child_id', 'parent_id')`).
Documentation polish:
- Trim the front-matter description from 399 to 280 chars and drop
Python identifier names (`MIN_GENES`, etc.) that don't read well
on a catalog tile; mirror the same trim in v1/README.md index.
- Rewrite the expected-output narrative to be solver-agnostic: the
bundled data has multiple feasible cohorts, so any specific claim
about which leaf genes are covered is false for some valid runs.
- Restructure the "A coverable Y appears as is_covered = 1"
troubleshooting entry from a single dense paragraph into two
named pitfalls with separate fix recipes.
- US English throughout: enrol→enroll, enrolment→enrollment,
generalises→generalizes, recognisable→recognizable,
materialised→materialized, optimisation→optimization (3x).
- Fix Graph constructor parameter names in prose (`edge_src_relationship`
/ `edge_dst_relationship`, not `src_relationship` / `dst_relationship`).
- Use the catalog convention `Rules-based` rather than `Rules` in
reasoning_types front-matter to match sister templates.
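As an illustration of the validator hardening described in this commit, the following stdlib-only sketch shows a declarative foreign-key table driving a loop, plus focused null and duplicate-key checks. It is not the template's code: rows are plain dicts rather than pandas frames, and every table name, column name, and helper name here is hypothetical.

```python
import math

# Hypothetical (child table, child column, parent table, parent column) edges.
FK_EDGES = [
    ("mutation_events", "patient_id", "patients", "id"),
    ("mutation_events", "gene_id", "genes", "id"),
]

def assert_no_nulls(table, rows, columns):
    """Raise a focused error if a required column holds None or NaN."""
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                raise ValueError(f"{table}: null in required column '{col}' (row {i})")

def assert_unique(table, rows, key):
    """Raise if two rows share the same single or composite key."""
    seen = set()
    for row in rows:
        values = tuple(row[col] for col in key)
        if values in seen:
            shown = ", ".join(str(v) for v in values)
            raise ValueError(f"{table}: duplicate key ({', '.join(key)}): {shown}")
        seen.add(values)

def assert_foreign_keys(tables, fk_edges=FK_EDGES):
    """Check every declared edge; one place to edit when the schema changes."""
    for child, child_col, parent, parent_col in fk_edges:
        parent_ids = {row[parent_col] for row in tables[parent]}
        dangling = sorted({row[child_col] for row in tables[child]} - parent_ids)
        if dangling:
            raise ValueError(
                f"{child}.{child_col}: {dangling} not found in {parent}.{parent_col}"
            )

# Hypothetical sample tables that satisfy every check.
TABLES = {
    "patients": [{"id": "P1"}, {"id": "P2"}],
    "genes": [{"id": "G1"}],
    "mutation_events": [{"id": "M1", "patient_id": "P1", "gene_id": "G1"}],
}

assert_no_nulls("mutation_events", TABLES["mutation_events"], ["id", "patient_id", "gene_id"])
assert_unique("patients", TABLES["patients"], ("id",))
assert_foreign_keys(TABLES)
```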
… lower bounds
The set-cover formulation correctly enforced the MIN_* coverage floors,
but the per-axis `Y.is_covered` indicators were only upper-bounded by
the count of covering in-cohort eligible patients. In a satisfaction
solve, any subset of the truly-covered Ys that hits the floor was a
valid assignment -- so the solver was free to leave additional
genuinely-covered indicators at 0. The downstream `inspect()` output
("kinase-pathway genes covered by the cohort", etc.) could then
underreport the cohort's actual coverage.
Add per-pair lower-bound ICs `Y.is_covered >= EligiblePatient.is_in_cohort`
for each (eligible patient, Y) pair where the patient covers Y. With
both bounds, `Y.is_covered` is pinned to the actual coverage truth-value:
1 iff some in-cohort eligible patient covers Y. The floor IC
`sum(is_covered) >= MIN_*` is then a constraint on the true coverage,
not on a free-floating indicator subset.
ICs grow from 7 to 10 (cohort size + 3 upper + 3 lower + 3 floor); all
ten are pure relational arithmetic and re-evaluable by `problem.verify()`.
README "How it works" rewritten to explain the saturation pattern;
module docstring updated for the IC count and lower-bound rationale.
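The effect of adding the lower bounds can be checked by enumeration on a toy instance. The sketch below is plain Python rather than the prescriptive library, and the single coverable item with two covering patients is a hypothetical example: it lists which values of `is_covered` satisfy the ICs for a fixed cohort, with and without the per-pair lower bound.

```python
# Hypothetical toy instance: one coverable Y, covered by two eligible patients.
COVERING_PATIENTS = ("P1", "P2")

def feasible_indicator_values(in_cohort, with_lower_bound):
    """Values of Y.is_covered that satisfy the ICs for a fixed cohort."""
    allowed = []
    for is_covered in (0, 1):
        # Upper bound: is_covered <= number of in-cohort patients covering Y.
        upper_ok = is_covered <= sum(in_cohort[p] for p in COVERING_PATIENTS)
        # Per-pair lower bound: is_covered >= in_cohort for each covering patient.
        lower_ok = all(is_covered >= in_cohort[p] for p in COVERING_PATIENTS)
        if upper_ok and (lower_ok or not with_lower_bound):
            allowed.append(is_covered)
    return allowed

cohort = {"P1": 1, "P2": 0}  # P1 is selected and covers Y
upper_only = feasible_indicator_values(cohort, with_lower_bound=False)
both_bounds = feasible_indicator_values(cohort, with_lower_bound=True)
```

With only the upper bound, both 0 and 1 are admissible even though P1 covers Y, which is the underreporting case described above. With both bounds, only 1 remains.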
Aligns with the canonical pin used by product_configurator, synthetic_eligibility_records, and synthetic_order_lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the customer-facing reasoner taxonomy and the language other multi-reasoner templates use. "Pillar" was internal jargon. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cafzal
left a comment
Ship with nits. Sub-concept-as-predicate is genuinely novel in v1 — telco_network_recovery uses is_critical_restore as a Boolean Relationship marker, this template uses EligiblePatient extends Patient with solve_for(Sub.prop) keying decisions only to rows the rules established as meaningful (patient_cohort_recruitment.py:470-489). Eligible-coverage scoping verified by reproduction (lines 444-449, 505-518): BRAF mutated by ineligibles 10/14 but stays coverable because eligibles 1/4/7 also carry it. Per-pair coverage saturation LB (:528-541) genuinely pins is_covered so inspect() cannot underreport. Pre-solve invariants reusable (driven by an _FK_EDGES table). Closure exactly {1..7}; floors (3,2,2) non-trivially feasible; MAX(4,3,3) correctly unreachable. Distinct lesson vs. telco — both are Graph + Rules + CSP, no overlap in encoding lessons.
Issues (all NITs)
- README.md:50-51: "10 genes ... 26 mutation events" is missing a "bundled sample" / "illustrative" qualifier. Combined with the P_Alpha...P_Oscar patient names, the demo-ness is implicit but should be explicit (per global no-PII / Demo-framing rule). Add one sentence in "What's included" or the front-matter description.
- README.md:25: "a `EligiblePatient`" should be "an" (vowel-sound rule). The article slip recurs in "How it works".
- README.md:159: "fail at least one of: kinase-mutation, qualifying-pair within the 90-day window" reads like a 3-item list but is 2 items. Drop "at least one of" or reword to "fail either the kinase-mutation test or the qualifying-pair test".
- patient_cohort_recruitment.py:421: dropped word: "...demonstrate a qualifying response pattern for are counted." Suggest "Only therapies and AEs with a within-window qualifying pair are counted."
- README.md:264: section name "Customize this template" drifts from sample-template's "Customize".
- README.md:269: gold parenthetical about the parent/sub-concept aggregation TypeError is buried at the end of a long bullet. Lift to its own bullet or move to Troubleshooting.
- README.md:23: long opening sentence buries the lede; split for pacing.
py_compile and ruff check clean.
- Call out bundled CSVs as illustrative synthetic demo data
- Split long opening paragraph for pacing
- Reword the eligibility-fail sentence as a clean either/or
- Lift sub-concept aggregation guidance into its own bullet
- Fix dropped wording in the covers_therapy / covers_ae comment
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What this template adds
A clinical-research cohort-selection template that composes three reasoners over a patient knowledge graph:
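- Graph: closes the kinase-pathway sub-ontology in one `reachable(full=True)` call, materializing every pathway gene as a `KinaseGene extends Gene` sub-concept.
- Rules: lift the closure to per-patient eligibility (`KinaseMutationCarrier`, `QualifyingPairPatient`, `EligiblePatient`) and per-axis coverage facts (`Patient.covers_kinase_gene`, `Patient.covers_therapy`, `Patient.covers_ae`).
- Prescriptive (CSP): picks K patients whose joint coverage reaches at least `MIN_GENES_COVERED` distinct kinase genes, `MIN_THERAPIES_COVERED` distinct therapies, and `MIN_AES_COVERED` distinct adverse events. MiniZinc / Chuffed backend.

Modeling patterns this surfaces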
- Sub-concepts as predicates. `EligiblePatient extends Patient` and `CoverableGene extends Gene` make membership the predicate; downstream rules and the CSP just check `Sub(Parent)`.
- `solve_for` scoped to sub-concepts. Decisions are created only on rows the rules established as meaningful: ineligible patients and never-covered Ys never get a decision, and the upper-bound ICs cleanly bind on the rows that do.
- Eligible-coverage scoping of `Coverable*`. A Y covered only by ineligible patients would otherwise sit in `Coverable*` with no upper-bound IC binding and the solver could mark it covered for free. Scoping to `EligiblePatient.covers_*` closes that gap structurally.
- A 3-arity `Patient.qualifying_pair` relationship over `(Patient, TherapyEvent, AdverseEventOcc)` triples single-sources the AE-window predicate. `QualifyingPairPatient`, `Patient.covers_therapy`, and `Patient.covers_ae` are one-line projections.
- Coverage saturation. The upper bound `Y.is_covered <= sum(in_cohort).per(Y)` plus the per-pair lower bound `Y.is_covered >= EligiblePatient.is_in_cohort` pin `is_covered` to the actual coverage truth-value, so the floor IC `sum(is_covered) >= MIN_*` constrains true coverage and the `inspect()` output cannot underreport.

Verification
- Pre-solve invariants raise `ValueError` for null/duplicate keys, dangling foreign keys, missing kinase root, and negative timestamps. Foreign-key edges are declared in a single declarative table.
- `problem.verify()` re-evaluates all 10 ICs (cohort size + 3 upper + 3 lower + 3 floor) in the returned solution. Every IC is pure relational arithmetic.
- `termination_status() == "OPTIMAL"` assertion via `model.require`.
- Run against the `(MIN_GENES=3, MIN_THERAPIES=2, MIN_AES=2)` floors.

References
The "How it works" section walks through each reasoner with code excerpts. The Customize section covers swap-your-own-data, alternative cohort objectives (max coverage, min average age), tightening the qualifying window, additional eligibility conjuncts, and per-stratum fairness constraints. Troubleshooting covers INFEASIBLE, multi-cohort non-determinism, and the two encoding pitfalls (sub-concept target + eligible-coverage scoping) that produce the "is_covered=1 with no covering patient" symptom.
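For readers who want the shape of the cohort-selection problem without the solver, a brute-force sketch follows. It swaps in exhaustive enumeration with `itertools.combinations` for the MiniZinc / Chuffed backend, and the patients, coverage facts, and cohort size are hypothetical rather than taken from the bundled data.

```python
from itertools import combinations

# Hypothetical coverage facts per eligible patient: (genes, therapies, AEs).
COVERAGE = {
    "P1": ({"BRAF"}, {"T_A"}, {"rash"}),
    "P2": ({"EGFR"}, {"T_A"}, {"nausea"}),
    "P3": ({"BRAF", "KIT"}, {"T_B"}, {"rash"}),
}
K = 2               # hypothetical cohort size
FLOORS = (3, 2, 2)  # per-axis floors in the order (genes, therapies, AEs)

def feasible_cohorts(coverage, k, floors):
    """Enumerate every k-patient cohort whose joint coverage meets all floors."""
    found = []
    for cohort in combinations(sorted(coverage), k):
        # Union the covered items of the cohort on each of the three axes.
        covered = [
            set().union(*(coverage[p][axis] for p in cohort))
            for axis in range(3)
        ]
        if all(len(items) >= floor for items, floor in zip(covered, floors)):
            found.append(cohort)
    return found

cohorts = feasible_cohorts(COVERAGE, K, FLOORS)
```

On larger inputs several cohorts can satisfy the floors at once, which is the multi-cohort non-determinism the Troubleshooting section refers to.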