Add book_slate_recommendation template (Graph + Prescriptive CSP)#59

Open
chriscoey wants to merge 31 commits into main from book_slate_recommendation

Conversation

@chriscoey
Member

@chriscoey chriscoey commented May 8, 2026

What this template adds

A Graph + Prescriptive (CSP) recsys template that picks K books per reader from a heterogeneous knowledge graph and orders them by slate position. Slot 1 is the hero (top of row, highest engagement), and under the canonical position-decay engagement model the order matters as much as the selection. Constraints (10 ICs): cardinality, slot uniqueness, already-read exclusion, author uniqueness, subject concentration cap, freshness floor, in-house exposure floor, cold-start cap, hero pin, and explanation-path floor. The objective maximizes sum((K + 1 - slot) * path_count_total), weighting each pick's explanation evidence by its slate position.
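A plain-Python sketch of the position-decay arithmetic (not PyRel; the slate size `K = 5` and the toy scores below are illustrative assumptions):

```python
K = 5  # illustrative slate size, not the template's constant

def position_weight(slot: int, k: int = K) -> int:
    """Weight (K + 1 - slot): K at the hero slot, 0 at the K+1 unpicked sentinel."""
    return k + 1 - slot

# Hero slot gets the largest weight; the unpicked sentinel contributes nothing.
weights = [position_weight(s) for s in range(1, K + 2)]
print(weights)  # [5, 4, 3, 2, 1, 0]

def objective(slots: dict[str, int], path_count_total: dict[str, int]) -> int:
    """sum((K + 1 - slot) * path_count_total) over all candidates."""
    return sum(position_weight(slots[c]) * path_count_total[c] for c in slots)
```

With `slots = {"a": 1, "b": 6}` and `path_count_total = {"a": 10, "b": 3}`, the unpicked candidate `b` contributes zero and the objective is 50.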

Three things make this template distinctive:

  • Bounded paths-library walks drive the candidate set. Item.connected_to.repeat(1, MAX_HOPS).all_paths() (from relationalai.semantics.std.paths) generates the Candidate concept and its per-(user, candidate) typed-evidence counts; Graph.triangle_count() over the book-similarity graph drives the slot-1 hero pin to a structurally central pick.
  • Ordered slate via integer slot decisions. Each Candidate.slot ∈ {1, ..., K, K+1} where K+1 is the unpicked sentinel, so the position weight (K+1 - slot) is 0 at unpicked and no auxiliary picked-indicator is needed. The same encoding handles cardinality, position decay, and the per-pick explanation weighting in one decision variable.
  • Pure-integer CSP on MiniZinc. Per-pair count caps (GCC idiom) for slot / author / subject uniqueness, plus a per-user existential count for the hero pin. all_different would conflict with the shared K+1 sentinel.
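A plain-Python analogue of the per-pair count-cap idiom and the K+1 sentinel interaction (the toy solution and author names below are hypothetical, not from the bundled data):

```python
from collections import Counter

K = 3  # assumed slate size for this toy check

# Toy solution: candidate -> (slot, author); slot K+1 is the unpicked sentinel.
solution = {
    "b1": (1, "austen"),
    "b2": (2, "tolstoy"),
    "b3": (3, "eliot"),
    "b4": (K + 1, "austen"),  # unpicked: exempt from every cap
}

def violates_cap(values, cap):
    """GCC-style per-pair cap: no value may occur more than `cap` times."""
    return any(n > cap for n in Counter(values).values())

# Caps only range over picked rows (slot <= K), which is why all_different
# over raw slots would wrongly flag the shared K+1 sentinel as a repeat.
picked = {c: sa for c, sa in solution.items() if sa[0] <= K}
assert not violates_cap([slot for slot, _ in picked.values()], 1)    # slot uniqueness
assert not violates_cap([author for _, author in picked.values()], 1)  # author uniqueness
```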

Modeling patterns this surfaces

  • Heterogeneous-KG Item super-concept with typed sub-concepts (User / Book / Author / Subject), plus a single 2-arity Item.connected_to super-edge populated as the symmetric union of typed edges. The unified-edge layer is needed because a path() call walks one 2-arity relationship at a time.
  • Direct shared-entity joins for typed evidence (path_count_via_author, path_count_via_subject) sit alongside a true bounded-walk count (path_count_via_kg_walk). Three integer features, blended via path_count_total into both IC clauses and the objective.
  • count(...).per(c).where(...) | 0 densification so every Candidate has every typed-evidence property defined (otherwise sum-over-pick aggregates silently undercount). Arithmetic sum (a + s + w), not sum(model.union(a, s, w)) -- union inside an aggregate body deduplicates on projected values.
  • K+1 unpicked sentinel in the slot domain absorbs both the pick/unpicked partition and the engagement-decay weighting. The position weight (K+1-slot) evaluates to 0 at unpicked, K at the hero slot, and decays monotonically.
  • Per-pair count caps as PyRel's CSP-native shape for "no value repeats more than X times across decisions". count(distinct ...) is rejected by the prescriptive rewriter today; the per-pair cap form compiles to MiniZinc GCC propagation.
  • Pre-solve Python-level assertion that materialises the Candidate set, anti-joins against User.read, and refuses to solve if any user fails any of the per-IC feasibility necessary conditions. The error message lists affected users per shortfall and a strategy block keyed by which condition fired -- sparse customer data hits a clear ValueError rather than a quiet INFEASIBLE solve.
  • Real-world Open Library (CC0) bundled slice, with an in-tree --size sm|md|lg fetch script that caches under data/_cache/, atomic on write, JSON-validated on read, and process-pid-tagged so concurrent runs don't race.
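The union-deduplication pitfall in the arithmetic-sum bullet above has a minimal plain-Python analogue: a union deduplicates on projected values, which a Python set models, so colliding counts vanish from the aggregate (the counts below are hypothetical):

```python
# Three typed-evidence counts for one (user, candidate) pair; the
# author and subject counts happen to collide on the same value.
via_author, via_subject, via_kg_walk = 2, 2, 5

arithmetic_sum = via_author + via_subject + via_kg_walk  # keeps both 2s
union_sum = sum({via_author, via_subject, via_kg_walk})  # set drops one 2

print(arithmetic_sum, union_sum)  # 9 7
```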

Privacy

Marked private: true so it ships only on the private docs site for now -- same gating pattern used for the predictive (GNN) templates while the paths library matures.

Verification

  • Live run on relationalai==1.1.0, MiniZinc backend: status OPTIMAL, objective 648, num_points 1; problem.verify() re-evaluates all 10 ICs against the returned solution and reports clean.
  • Pre-solve assertion fires before any model rule installs and surfaces actionable per-user shortfall lists.
  • Ruff clean; data integrity validated (FK edges, unique keys on entity tables and similarity edges, in_house ∈ {0, 1} domain check, non-negative age_days).

References

Eksombatchai et al., Pixie (WWW 2018); Wang et al., KGAT (KDD 2019); Wang et al., KPRN (AAAI 2019); Xian et al., PGPR (SIGIR 2019); Ying et al., PinSage (KDD 2018); Wang et al., K-RagRec (ACL 2025).

chriscoey and others added 16 commits April 30, 2026 12:03
Three-pillar Graph + Paths + Prescriptive (MIP) template.

Sketch state -- not yet validated end-to-end. Next step is an E2E
run against a live RAI account to debug the pipeline.

Pipeline:
- Graph: PageRank over Movie.similar_to graph -> structural prior.
- Paths: bounded explanation-path enumeration (<=3 hops) ->
  path-counts-by-type as integer features.
- Prescriptive (HiGHS MIP): K-item slate per user under genre /
  director / actor diversity, freshness floor, originals exposure
  floor, cold-start cap, explanation-path floor; objective blends
  PageRank prior with path signal.

Lead dataset: small hand-crafted sample modelled on the
MovieLens-1M-KG schema (KGAT distribution). README points at the
KGAT distribution for the realistic-instance build.

Production precedent: Pinterest Pixie (random-walk recsys at
production scale); Alibaba iGraph, eBay KPRN, LinkedIn Career
Explorer, GE Healthcare KARE; regulatory drivers (GDPR Art. 22, EU
AI Act Art. 86, ECJ C-203/22).

Plan: ~/plans/csp-templates-coverage-epic.md
"kg_aware_slate_recommendation" deferred-candidate entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Working pipeline: status OPTIMAL, objective 9779.76, all 9 ICs
verified on the bundled sample. Watched-exclusion confirmed (user 3
correctly skipped la_la_land which they had already watched).

Changes vs. the initial sketch:

- Item supertype + Item.connected_to(Item, Item) unified edge.
  Workaround for the v1.1.0 paths-lib gap on multi-edge path()
  (paths/README.md "Currently unsupported patterns" §1, design
  epic RAI-44166). Preserves real heterogeneous KG bounded walks
  (User -> Movie -> Director -> Movie ...) instead of falling back
  to a Movie.similar_to-only walk.

- PageRank stays Float (HiGHS handles float coefficients on binary
  decisions natively, same pattern as supply_chain). Dropped the
  (score * SCALE).cast(Integer) rescale that doesn't lift in 1.1.0.

- Problem(model, Float); Candidate.pick is a Float-typed bin var to
  match supply_chain's binary-on-Float-problem pattern.

- Watched-exclusion via a pick == 0 IC at the prescriptive layer
  (negation in rules not yet supported in 1.1.0; same gap
  compliance_rule_audit documents).

- User.watched ingest uses a named Integer.ref() for rating to
  avoid the unground-variable typer error.

- Slate size K reduced from 8 to 3 to fit the bundled 24-movie /
  4-user sample (production K is 8-12 with MovieLens-1M-KG).

- Renamed path_count_via_similar -> path_count_via_kg_walk to
  reflect that this count is now the heterogeneous KG-walk count,
  not a similar_to-only count.

- README pipeline section updated to document the Item supertype
  + Item.connected_to layer and call out the v1.1.0 gap workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the bundled dataset and template domain from synthetic movies
to a deterministic slice of Open Library (~60 books, ~58 authors,
12 subjects). Open Library publishes its bibliographic catalogue
under CC0, so the template ships in full without licensing exposure
(MovieLens / Goodreads / Amazon-Book all carry non-commercial
clauses incompatible with shippable customer templates).

- Add data/fetch_open_library_slice.py: deterministic, cached
  fetcher with sm/md/lg size profiles. Pulls works + authors +
  subjects from the public Open Library API and emits the 10-CSV
  bundle. Synthetic users / read events / similar_to edges are
  generated on top.
- Rename Movie -> Book, Director -> Author, Genre -> Subject.
  Drop the Actor concept; the four-type heterogeneous KG (User,
  Book, Author, Subject) is plenty for KG-walks story.
- Apply the documented `| 0` default-value pattern to every
  count(...).per(c) expression so path_count_via_author /
  _via_subject / _via_kg_walk are defined for *every* Candidate,
  not just those with at least one match. Without this, the
  composite path_count_total = via_a + via_s + via_kg drops any
  Candidate missing one operand, which collapses the
  explanation-floor MIP constraint to zero and renders feasible
  problems infeasible. Confirmed via problem.display(): every user
  now has a richly-grounded explanation IC (~30 terms / user) vs
  the prior 0-1 terms.
- Final E2E: status OPTIMAL, objective 9971.95, all 8 ICs
  verified (slate-size, exclude-read, subject-diversity,
  author-uniqueness, freshness, originals, cold-start cap,
  explanation floor).
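The `| 0` densification failure mode described above has a plain-Python analogue: in relational semantics a composite sum drops any row missing an operand, much as a strict dict lookup does (the book ids and counts below are hypothetical):

```python
via_author  = {"b1": 2, "b2": 1}
via_subject = {"b1": 1}            # b2 has no shared-subject evidence
via_kg_walk = {"b1": 3, "b2": 4}
candidates = ["b1", "b2"]

# Relational-style composite: a candidate missing ANY operand drops out,
# which is what collapsed the explanation-floor constraint to zero.
strict = {c: via_author[c] + via_subject[c] + via_kg_walk[c]
          for c in candidates
          if c in via_author and c in via_subject and c in via_kg_walk}

# Densified composite (the `| 0` analogue): every candidate stays defined.
dense = {c: via_author.get(c, 0) + via_subject.get(c, 0) + via_kg_walk.get(c, 0)
         for c in candidates}

print(strict)  # {'b1': 6}          -- b2 silently vanished
print(dense)   # {'b1': 6, 'b2': 5}
```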

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-dedup pitfall

Add experiments/count_variants.py + README probing six formulations of the
per-(user, candidate) typed path-count features. Confirms (with problem.display)
that the production form (variant A: three counts each | 0, arithmetic sum) is
correct and that sum(model.union(propA, propB, propC)) silently undercounts
under value collisions because union deduplicates on projected values.

No prescriptive ≠ pyrel divergence found: PR #1117 / #1118 / #1213 stack
already pins all observed behaviors via the iff suite (u_same_prop pins the
dedup spec; empty_body_semantics pins the cascade-drop; arithmetic_filtered
pins scope isolation across sibling aggregates).

Inline comment in the production file points future readers to the
experimental harness so the choice of arithmetic over sum-of-union is
discoverable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address findings from a multi-round review of the template:

- Utility now blends pagerank with path_count_total (was 2*via_author +
  via_subject, dropped via_kg_walk despite docs calling it the headline
  paths-pillar signal).
- Subject-distribution inspection rewritten to use aggs.sum().per(User,
  Subject) -- prior form was Cartesian over Candidate × Subject (~13740
  rows on sm; now 25 × 12 = 300).
- Honest description of MAX_HOPS=2 walker reach in module docstring,
  README pillar 2, and inline comment: User -> read_Book (length 1) +
  User -> read_Book -> similar_Book (length 2). Per-typed counts
  (via_author, via_subject) clarified as direct shared-entity joins,
  not path walks.
- README frontmatter brought in line with other v1 templates (quoted
  industry, reasoning_types block, Title-Case tags).
- Data-precondition comment block before slate_size_ic explaining the
  joint-feasibility requirements (cold users, over-read users, books
  missing author/subject, fresh/in-house floors).
- Fetcher emits WARNING summary lines when author-name resolve falls
  back to OL key tail, and when first_publish_date is synthesised.
- current_year hardcode replaced with date.today().year.
- Stale "10-CSV bundle" -> "8-CSV", "~30-40 authors" -> "~58 authors",
  "two+ subjects" -> "at least one shared subject".
- README + fetcher now correctly describe similar_to as derived (not
  synthetic) -- the GDPR Art.22 explainability framing requires the
  graph to be evidence, not fabrication.
- Removed RAI Jira IDs (RAI-44166), internal pytest node ids, and
  references to nonexistent sibling templates from customer-facing
  prose. Cleaned the same class of leak from experiments/README.md and
  experiments/count_variants.py.
- count_variants module docstring now lists six variants (was five but
  defined six); added entry for variant F.
- Fetcher error message preserves original error class for diagnosis.
- FRESH_WINDOW_DAYS comment expanded to explain catalogue-vs-streaming
  tuning. EXPLANATION_FLOOR / WEAK_EXPLANATION_THRESHOLD scaling note
  added to README customise section for md/lg slices.

E2E status: OPTIMAL on bundled sm slice; objective ~15619 (was ~9779
before utility wired path_count_total in -- expected to rise).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… fixes

- Module docstring + L207 unified-edge comment now describe per-typed
  counts as the explanation surface (not "top-aggregate-relevance
  path"); explicitly state typed-evidence joins are direct shared-
  entity joins, not per-hop edge introspection.
- Fetcher: REFERENCE_YEAR = 2026 frozen constant for deterministic
  age_days across calendar years (cached reruns now produce identical
  CSVs in any year).
- Fetcher: drop works with no resolvable authors (with WARNING) so
  the runner's author-coverage precondition holds.
- README: regulatory section softened from "are required" to
  transparency-obligation framing with compliance-team caveat.
- README intro + Pipeline summary: separate reach (walker generates
  candidates) from evidence (typed-joins score them); no more "two
  reach signals" ambiguity.
- README + runner: LLM-explanation Customise bullet now references
  per-typed counts plus a documented small extension to materialise
  top paths; "eliminating hallucination" softened to "reducing
  hallucination risk".
- experiments/README.md reframed as engineering notes useful to
  advanced customers; runner comment removed internal pick_5_12
  debug coefficient name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dation

The previous name stacked two technical qualifiers ("kg_aware",
"slate"). The new name anchors to the lead instance (Open Library
books) while keeping the K-items "slate" shape that distinguishes
this template from a single-best recommendation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename module-level data path to uppercase DATA_DIR to match the
  recent CSP cart templates (synthetic_eligibility_records,
  product_configurator, synthetic_order_lifecycle).
- Rewrite Quickstart to the canonical 6-step shape (Download / venv /
  Install / Configure / Run / Expected output) and add a Solve result
  block keyed to the bundled --size sm slice.
- Document experiments/ in the template structure block so the
  ZIP layout matches the README.
- Replace remaining kg_aware_* docstring and User-Agent strings in
  data/ and experiments/ left over from the rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…add troubleshooting

- Section markers, Model() placement, docstring Run:/Output: blocks,
  per-CSV / per-concept comments, US spellings (matches recently
  merged v1 templates).
- Drop misleading "PyRel doesn't support not in rules" rationale
  comment; cite the actual prescriptive-rewriter constraint and
  the Stock.is_non_representative cart precedent in
  portfolio_balancing.
- Remove the GNN/Predictive customise mention (replaced with a
  generic "custom scoring signal" bullet) since the predictive
  reasoner is not yet public.
- Add a Troubleshooting section to the README with INFEASIBLE
  diagnostic, slow-solve, and fetcher-network details blocks.
  Expand the runner's data-preconditions block to enumerate the
  cold-start + explanation-floor joint feasibility relation and
  the author-uniqueness x slate-size interaction.
- Reorder inspection blocks so the chosen slate is printed second
  (after Users); diagnostic candidate-set and PageRank dumps move
  to the end.
- Reference fixes: drop the GE Healthcare KARE bullet (wrong
  attribution and method); correct KPRN attribution (NUS / eBay /
  USTC); disambiguate Alibaba iGraph from AliCoCo; drop dead
  Alibaba blog URL; trim the Pinterest engagement claim to its
  publication-date framing; trim the LinkedIn precedent to the
  skills graph (no platform-total numbers).
- Drop unused aggs alias; bare sum used uniformly for aggregates.
- Annotate Book.in_house as Integer 0/1 in the README schema
  block; add demo-grade tuning rationale to the utility weights.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oc fixes

- Add a pre-solve "Candidate count per user" inspect block so
  customers can spot users with zero candidates before infeasibility.
  Document the per-user IC anchoring caveat: each floor IC fires only
  for users with at least one matching Candidate row, so sparse
  customer data may pass vacuously rather than becoming infeasible.
- Open Library fetcher: pad User-Agent with a contact-stub note
  (Open Library API guidance asks for contact info), bump
  inter-request sleep from 0.2s to 1.0s to honour the documented
  unidentified rate limit, make cache writes atomic via temp-file
  rename, and treat a JSONDecodeError on cache read as a miss
  (interrupted writes or stale error bodies no longer poison
  the cache forever).
- Tighten the utility-weights comment to acknowledge that with the
  default scales, PageRank effectively serves as a tie-breaker
  rather than a co-equal blend, and point production deployments
  at min-max normalisation before applying these weights.
- Correct the K-RagRec / ItemRAG citation in the README and the
  matching mention in the runner Customise block.
- Sharpen the Troubleshooting INFEASIBLE recipe: lead with the new
  pre-solve diagnostic, note that problem.verify already prints
  per-IC violations on a failing solve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
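The cache-robustness pattern from this commit (atomic temp-file rename, pid-tagged temp name, JSONDecodeError treated as a miss) can be sketched in a few lines of stdlib Python; function names here are illustrative, not the fetcher's actual helpers:

```python
import json
import os
import tempfile
from pathlib import Path

def cache_write(path: Path, payload: dict) -> None:
    """Atomic cache write: pid-tagged temp file + rename, so concurrent
    runs don't race on a shared .tmp and readers never see a torn file."""
    tmp = path.with_suffix(f".{os.getpid()}.tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)  # atomic rename on POSIX and Windows

def cache_read(path: Path):
    """Treat a missing or undecodable file as a cache miss instead of
    letting an interrupted write poison the cache forever."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

An interrupted write never reaches the final path, and a corrupt cached body simply falls through to a refetch.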
…r fixups

- Add a Python-level pre-solve assertion that materialises the
  Candidate set, anti-joins against User.read, and refuses to solve
  if any user has fewer than SLATE_SIZE_K unread candidates. The
  per-user floor ICs anchor on Candidate rows, so without this guard
  customers with sparse data get a silent missing-row contract
  violation rather than an explicit infeasibility signal. The check
  also prints unread counts per user so reach can be inspected
  pre-solve.
- Fetcher cache temp file now embeds the process pid so concurrent
  fetcher runs against the same _cache directory don't race on a
  shared .tmp path.
- Replace the duplicated K-RagRec entry in the LLM+KG hybrid-pattern
  list with GraphRAG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… in-house floors

The Candidate-anchored per-user floor ICs cover three thresholds:
SLATE_SIZE_K (cardinality), FRESHNESS_FLOOR (fresh items), and
ORIGINALS_FLOOR (in-house items). Without per-floor pre-solve
checks, a user whose unread candidates contain zero fresh items
would still pass the SLATE_SIZE_K guard but produce a slate that
silently violates the freshness floor (the where(...) filter on
freshness_ic removes the user entirely from the IC's row set).

Extend the assertion to also check per-user unread fresh count
>= FRESHNESS_FLOOR and per-user unread in-house count >=
ORIGINALS_FLOOR; surface affected users in the error message.
Tighten the README troubleshooting block to describe the
assertion accurately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
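A hedged sketch of the per-floor pre-solve check this commit describes (names, data shapes, and thresholds are assumptions; the template's actual assertion differs in detail and checks more conditions):

```python
SLATE_SIZE_K, FRESHNESS_FLOOR, ORIGINALS_FLOOR = 3, 1, 1

def assert_per_user_floors(candidates, read):
    """candidates: user -> list of (book, is_fresh, is_in_house).
    read: user -> set of already-read book ids."""
    shortfalls = {}
    for user, rows in candidates.items():
        unread = [r for r in rows if r[0] not in read.get(user, set())]
        checks = {
            "slate_size": (len(unread), SLATE_SIZE_K),
            "freshness": (sum(1 for _, fresh, _ in unread if fresh), FRESHNESS_FLOOR),
            "originals": (sum(1 for _, _, house in unread if house), ORIGINALS_FLOOR),
        }
        failed = {name: have_need for name, have_need in checks.items()
                  if have_need[0] < have_need[1]}
        if failed:
            shortfalls[user] = failed
    if shortfalls:
        # Fail loudly pre-solve with per-user (have, need) pairs keyed by
        # which necessary condition fired, instead of a silent INFEASIBLE.
        raise ValueError(f"Infeasible before solve; per-user shortfalls: {shortfalls}")
```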
- Move CSV loads to top of file and add pre-solve invariants
  (unique-key, no-dangling-FK, non-negative age_days) using the
  _assert_* helpers + declarative FK-edges table that
  patient_cohort_recruitment establishes for v1
- Drop "Background and precedent" deep-dive from README; keep a
  short "Where this fits" framing
- Trim the academic References block to just Open Library
- Pin pandas>=2.0 to match other v1 templates
- Drop reasoning_types: Paths in favor of canonical [Graph,
  Prescriptive] vocabulary; description re-framed accordingly
- Compress essay comments around tuning constants, the path walker,
  the Item.connected_to relationship, and the Data-preconditions
  block (joint-feasibility detail already lives in README
  Troubleshooting)
- Drop the bottom "Customize-section variants" comment block (it
  duplicates the README "Customize this template" section)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three remaining cross-references still spelled "Paths" / "three-pillar"
after the front-matter switched to [Graph, Prescriptive]:

- Module docstring header and "Three-pillar pipeline" line
- "Pillar 2: Paths" section header in the .py
- "What's included" line + "Pipeline" step 2 header in the README
- v1 index description

Re-frame as "Multi-reasoner" and "Graph (bounded KG walks)" so the
labels match the canonical reasoning_types vocabulary throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 8, 2026

The docs preview for this pull request has been deployed to Vercel!

✅ Preview: https://relationalai-docs-4natc25lc-relationalai.vercel.app/build/templates
🔍 Inspect: https://vercel.com/relationalai/relationalai-docs/D7ZVESkXhp32eTTWeoLyv52kg8mo

chriscoey and others added 4 commits May 7, 2026 22:28
…gle-count embeddedness floor

The old shape framed Graph + Paths + Prescriptive as three coequal
pillars but PageRank was just a per-Book Float that any retrieval-
stage scalar could substitute. That made the Graph contribution
swappable and effectively optional, and the "Paths" pillar read as a
sidecar to PageRank rather than the centerpiece.

Restructure so Paths visibly leads:

- Reorder the .py: Pillar 1 = Paths (Candidate concept + per-typed
  counts), Pillar 2 = Graph, Pillar 3 = Prescriptive. Reorder the
  README "Pipeline" section to match.
- Replace PageRank with `Graph.triangle_count()` per Book. Triangle
  count is a topological measure of where each Book sits in the
  similarity neighborhood; it cannot be supplied externally without
  reconstructing the graph, which is what makes it a Graph-pillar
  contribution rather than a data-layer input.
- Add `embeddedness_ic`: at least EMBEDDEDNESS_FLOOR picks per user
  must have triangle_count >= EMBEDDEDNESS_THRESHOLD. The Graph
  pillar now drives a structural-diversity *constraint*, not just an
  objective term.
- Drop the utility blend and the PAGERANK_WEIGHT / PATH_SIGNAL_WEIGHT
  constants. Objective collapses to
  `sum(path_count_total * pick)` -- pure path-driven, integer-only.
- Update README "Why MIP, not CSP" rationale (no longer about float
  coefficients), the "Customize this template" "Custom scoring
  signal" bullet (now framed as adding an *additive* Float term, not
  swapping out PageRank), the troubleshooting section (added the
  embeddedness-floor infeasibility cause), the constants list, and
  the v1 index entry to match.

Three Slack signals motivated picking triangle_count over Louvain
(which would have been the cleaner community-diversity story):
recent louvain() test failures locally, an open question about
re-implementing Louvain via loops in PyRel, and ambiguity about its
deprecation timeline. WCC was the other option but the bundled
similarity graph is one connected component (60/60), so a
"slate spans >= N components" IC is trivially satisfied or trivially
infeasible. Triangle count has a real per-book distribution
(0-107, two isolates, varied mid-tail) on the bundled slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
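The per-book triangle count this commit adopts can be mirrored in plain Python to show why it is a structural signal (the toy similarity edges below are hypothetical; the template uses Graph.triangle_count()):

```python
from itertools import combinations

def triangle_count(edges):
    """Per-node triangle count on an undirected graph: for each node,
    count neighbor pairs that are themselves connected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {n: sum(1 for a, b in combinations(sorted(adj[n]), 2) if b in adj[a])
            for n in adj}

# Toy similarity graph: b1-b2-b3 form a triangle, b4 hangs off b3.
edges = [("b1", "b2"), ("b2", "b3"), ("b1", "b3"), ("b3", "b4")]
print(triangle_count(edges))  # {'b1': 1, 'b2': 1, 'b3': 1, 'b4': 0}
```

Unlike a precomputed scalar score, the count cannot be supplied without the edge set itself, which is the commit's argument for it being a genuine Graph-reasoner contribution.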
…/path story

- Switch solver from HiGHS MIP to MiniZinc CSP. Pure-integer model:
  binary picks, integer coefficients, no float blend.
  Problem(model, Integer) and solve("minizinc", ...).
- Update reasoning_types tag and template description to reflect CSP.
- Strengthen the graph/path narrative through three new ICs and
  a derived property:
  * subject_span_ic: each user's slate must touch
    >= MIN_DISTINCT_SUBJECTS distinct subjects, expressed via
    count(Subject, Candidate.pick == 1) -- distinct-value counting
    that sum-of-indicators cannot express directly.
  * Candidate.primary_evidence: derived integer property
    (1=author, 2=subject, 3=walker) from argmax of the three typed
    path counts. Three mutually exclusive define rules.
  * path_evidence_diversity_ic: each slate must touch >= MIN_EVIDENCE_TYPES
    distinct primary-evidence types, expressed via
    count(Integer.ref(), Candidate.pick == 1) over distinct
    primary_evidence values. CSP-native distinct-value counting.
  * strong_walker_ic: at least MIN_STRONG_WALKERS picks must have
    path_count_via_kg_walk >= STRONG_WALKER_THRESHOLD. Anchors at
    least one pick to the headline Paths-pillar signal rather than
    the cheaper shared-author / shared-subject joins.
- Drop subject_diversity_ic (subsumed by subject_span_ic + cardinality).
- Inspect output now also surfaces primary_evidence per picked item.

README, troubleshooting, and v1 index updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent-decay objective

Switch the prescriptive layer from binary pick to multi-valued integer
slot in {1..K, K+1} (K+1 = unpicked sentinel). Slot order matches the
canonical recsys position-decay engagement model -- top-of-row picks
dominate impressions -- and lets the objective directly weight items
by position via sum((K+1-slot) * path_count_total).

Pillar-2 (Graph) contribution moves from a "somewhere in the slate"
embeddedness floor to a hero-slot pin: slot 1 must come from a Book
whose triangle count clears HERO_EMBEDDEDNESS_THRESHOLD, concentrating
the structural-quality signal at the highest-engagement position.

Drop the demo-shaped extras (primary_evidence Integer property and
its 3 mutually-exclusive define rules, path_evidence_diversity_ic,
strong_walker_ic). They were added to showcase count-distinct CSP
syntax, but PyRel's prescriptive rewriter does not currently support
distinct aggregates in IC compilation, and the underlying constraints
weren't real product rules. The CSP idioms that DO exercise here:
multi-valued integer decisions, GCC-style per-pair count caps (slot,
author, subject), slot-equality reification (hero pin), and the
reified domain rule for the already-read exclusion.

Replace author_diversity_ic / subject_span_ic count-of-Concept forms
with per-pair count caps (count(Candidate, slot<=K).per(user, X) <=
N) -- the shape PyRel actually compiles. Add slot_uniqueness_ic
(count Candidates per (user, Slot.pos) <= 1) to enforce that each
slate position is filled exactly once; combined with slate_size_ic
this is a bijection between picks and positions 1..K.

Strengthen the pre-solve assertion to also flag users with no
hero-eligible candidate or fewer than K distinct unread authors.

E2E (sm slice, MiniZinc): OPTIMAL, objective 648, all ten ICs
verify clean, slate ordered 1..K per user.
…ighten docs; drop experiments dir

Pre-solve guard now catches the three remaining IC infeasibility modes
that previously surfaced as silent INFEASIBLE solves:
- cold_start: users with fewer than K - COLD_START_CAP strongly-explained candidates
- subject_span: users whose unread pool spans fewer than ceil(K / MAX_PER_SUBJECT) subjects
- explanation: users whose top-K position-weighted score upper bound is below EXPLANATION_FLOOR

ValueError repair hint is now keyed by which condition fired (densify
reach vs lower a per-IC floor vs lower SLATE_SIZE_K vs lower
EXPLANATION_FLOOR), replacing the generic "densify Book.similar_to"
hint that was misleading for short-author shortfalls.

Data-domain validations:
- in_house must be in {0, 1}; values outside silently disqualify books from the originals pool
- book_similar (src_book_id, dst_book_id) must be unique; duplicates would inflate triangle counts

Doc tightening:
- New "How this template differs from other CSP templates" README section names
  the three architectural choices that follow from encoding an ordered slate
  (unified super-edge, K+1 sentinel, per-pair count caps + per-user existential hero pin)
- Sections reordered: differences before count-idioms note (architectural orientation
  precedes the implementation caveat)
- Version-neutralized "paths-lib limitation" wording at 4 sites (was pinned to v1.1.0)
- Corrected via-author/via-subject path-count comments: counts are bag-style
  (one per join row), not distinct
- Fixed objective wording so "lower slot indices = top of row = hero" lands
  unambiguously
- Updated solve-time claim from ~1 minute to "a few seconds" for the bundled slice
- Quickstart now mentions re-running the solver after fetching a larger slice

Drop experiments/ directory (engineering scratch, was the only v1/*/experiments
across all templates; load-bearing insight retained in README).

Regenerated v1/README.md index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chriscoey chriscoey changed the title Add book_slate_recommendation template (Graph + Paths + Prescriptive MIP) Add book_slate_recommendation template (Graph + Paths + Prescriptive CSP) May 8, 2026
@chriscoey chriscoey marked this pull request as ready for review May 8, 2026 16:29
Singular form aligns with the other reasoning_types entries (Graph,
Prescriptive, Predictive, Rules-based) and with the customer-facing
class/method names (PathTraversal, model.path()). The plural-form
KG-Paths tag stays since it refers to graph paths as a data
structure rather than to the reasoner.

Updates: front-matter reasoning_types, description string, README
prose pillar headers, script docstring and pillar comment, v1 index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chriscoey chriscoey changed the title Add book_slate_recommendation template (Graph + Paths + Prescriptive CSP) Add book_slate_recommendation template (Graph + Path + Prescriptive CSP) May 8, 2026
…r-pillar framing

Per the customer-facing reasoner taxonomy, Path is taxonomically
subordinate to Graph (treated as a subset, not a peer of Graph /
Predictive / Prescriptive / Rules-based). Adjusted framing throughout
so the paths library is positioned as a load-bearing technique rather
than a separate reasoner pillar:

- Front-matter reasoning_types narrowed to [Graph, Prescriptive].
- Description reframed as "Graph + Prescriptive (CSP) recsys template
  ... bounded knowledge-graph walks via the paths library generate
  the candidate set".
- Pillar bullets in README + script docstring renamed: bounded KG
  walks (paths library, central) is the architectural centerpiece;
  Graph reasoner and Prescriptive reasoner are the two reasoner
  pillars. Code section markers renumbered Pillar 1 = Graph,
  Pillar 2 = Prescriptive.
- KG-Paths tag retained (it refers to graph paths as a data
  structure, not the reasoner taxonomy).
- v1 README index regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chriscoey chriscoey changed the title Add book_slate_recommendation template (Graph + Path + Prescriptive CSP) Add book_slate_recommendation template (Graph + Prescriptive CSP) May 8, 2026
…-internal 'first showcase' positioning from customer-facing prose

The 'Removing X collapses Y' rhetorical construction in README and
docstring added no information beyond what the surrounding bullets
already convey. Replaced with direct descriptions of what the path
walks and Graph contribution each produce.

Also dropped 'first showcase in v1' and 'first ordered slate'
phrasings from the README — those are cart-positioning notes that
belong in the PR description, not in customer-facing prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace "pillar" with "reasoner" or "stage" to match the customer-
facing reasoner taxonomy and the language other multi-reasoner
templates (e.g. telco_network_recovery) use. Remove the "Sibling CSP
templates" comparison and the PyRel-roadmap reference -- both are
internal positioning that doesn't help the reader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mponent)

Same gating pattern used for the predictive (GNN) templates: the
paths-library template stays in the private docs site while the
underlying capability stabilises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cafzal cafzal left a comment

Ship with nits. Pre-solve assertion (lines 694-846) is the best in v1 — per-IC necessary conditions with actionable per-user error lists. K+1 sentinel verified by construction (every count IC gates slot <= K; objective uses (K+1-slot) so K+1 contributes 0 naturally). Aggregate densification (| 0) consistent. Fetch script is robust: atomic writes, JSON validation w/ cache invalidation, polite UA + retry, frozen REFERENCE_YEAR for determinism. Customization section names real off-domain retargets (e-commerce, courses, news).
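The K+1 sentinel argument above can be sanity-checked in a few lines of plain Python. This is a toy sketch, not the template's MiniZinc model: `K`, `slots`, and `path_counts` are illustrative names, and the point is only that the weight `(K + 1 - slot)` vanishes at the sentinel, so no auxiliary picked-indicator is needed.

```python
K = 5  # slate size; slot K+1 is the "unpicked" sentinel

def position_weight(slot: int, k: int = K) -> int:
    """Position-decay weight: slot 1 (hero) gets k, slot k gets 1,
    and the k+1 sentinel contributes 0."""
    return k + 1 - slot

# Objective contribution for a toy assignment: candidate -> slot.
slots = {"book_a": 1, "book_b": 3, "book_c": K + 1}   # book_c unpicked
path_counts = {"book_a": 4, "book_b": 2, "book_c": 9}

objective = sum(position_weight(s) * path_counts[c] for c, s in slots.items())
# book_a: 5 * 4 = 20, book_b: 3 * 2 = 6, book_c: 0 * 9 = 0
```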

Issues

  • IMPORTANT: data/authors.csv carries Open Library noise that surfaces in user-visible output (lines 5 "TC", 12 "Aurora Irvine", 42 "Bible", 51 "Alex Goody", 59 "Booking", 4 "Les éditions du Rey" — edition/publisher records mis-classified as authors). Author names appear in per-pick explanations. Suggest filtering in data/fetch_open_library_slice.py:301-323: drop authors with names <3 chars, all-caps names without punctuation, or names matching a publisher denylist; fall back to dropping the work if its sole author is filtered.
  • NIT: data/subjects.csv has near-duplicates that dilute the diversity dial (rows 4/5/6/8 are flavors of "adventure"; row 11 is the literal Dewey code "823/.8"). With MAX_PER_SUBJECT=2 these are treated as distinct subjects. Normalize in fetch_open_library_slice.py:404-411 (strip Dewey codes, collapse "adventure*" variants).
  • NIT: The README.md:3 description is 447 chars in one stream-of-consciousness sentence with five em-dash clauses. Trim to ~150 chars: business framing first, then a colon to the technique tagline.
  • NIT: README.md:151-194 ("Expected output") mixes "what you'll see" with "how to scale up". Move the scaling guidance into the existing "Scaling the bundled data" section at line 422.
  • NIT: book_slate_recommendation.py:127-132 has mixed CSV-variable naming (users_csv, books_csv, then read_csv_data, ba_csv, bs_csv, bsim_csv). Make it consistent.
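The author-noise filter suggested above could look roughly like this. A minimal sketch only: the denylist entries, thresholds, and function name are illustrative, not the values the template actually ships.

```python
import re

# Illustrative denylist; the real one would be curated from the slice.
PUBLISHER_DENYLIST = {"booking", "bible"}

def is_publisher_or_noise_author(name: str) -> bool:
    """Heuristic cascade: length -> denylist -> all-caps token check."""
    cleaned = name.strip()
    if len(cleaned) < 3:                       # e.g. "TC"
        return True
    if cleaned.lower() in PUBLISHER_DENYLIST:  # known non-author records
        return True
    # All-caps strings with no punctuation look like org/edition codes.
    if cleaned.isupper() and not re.search(r"[.\-']", cleaned):
        return True
    return False
```

Dropping the whole work when its sole author is filtered (the fallback the review suggests) would live in the caller, not here.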

py_compile and ruff check clean.

- Add publisher/imprint and Dewey-code filters in fetch_open_library_slice.py
  so authors.csv excludes corporate / single-token noise (TC, Bible, Booking,
  "Les éditions du Rey", ...) and subjects.csv collapses "adventure" variants
  and drops Dewey codes ("823/.8"). Regenerate the bundled sm slice.
- Trim the front-matter description to a business framing + technique tagline
  and move "scaling the bundled data" guidance out of the Quickstart's
  expected-output step into the dedicated section.
- Rename CSV-load locals (read_csv_data/ba_csv/bs_csv/bsim_csv) to a
  consistent <name>_csv pattern.
chriscoey and others added 2 commits May 8, 2026 12:47
- fetch_open_library_slice.py: replace the regex-chain normalization
  in _normalize_subject with an explicit canonical-tag map (drop /
  strip / merge phases). Variants like "action & adventure" now
  actually fold to "adventure" (the prior chain claimed to but did
  not). Collapse internal whitespace before lookup so multi-space
  variants normalize identically. Catch json.JSONDecodeError in
  _http_get_json so HTML rate-limit pages returned with HTTP 200
  trigger the retry loop instead of escaping it.
- README.md: rewrite the front-matter description in plain language;
  correct the bundled-data counts to 59 books / 52 authors.
- Regenerate the bundled sm slice with the new normalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
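The `json.JSONDecodeError` hardening mentioned in the commit above follows a common pattern: an HTML rate-limit page served with HTTP 200 fails JSON parsing, and that failure is treated like any transient error so the retry loop catches it. A hedged sketch, assuming a simple urllib-based fetcher (the real `_http_get_json` may differ in signature and backoff policy):

```python
import json
import time
import urllib.error
import urllib.request

def http_get_json(url: str, retries: int = 3, backoff: float = 1.0):
    """Fetch and parse JSON; retry on network errors AND on bodies
    that are not valid JSON (e.g. HTML rate-limit pages with HTTP 200)."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (urllib.error.URLError, json.JSONDecodeError):
            if attempt == retries - 1:
                raise                              # budget exhausted
            time.sleep(backoff * (attempt + 1))    # simple linear backoff
```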
- Module docstrings (book_slate_recommendation.py, fetch_open_library_slice.py):
  correct the bundled-data count to 59 books / 52 authors.
- _is_publisher_or_noise_author docstring: list the cascade in the
  same order the implementation runs (length -> denylist -> token-set).
- Runner docstring: clarify that the per-(user, candidate) explanation
  evidence feeds the Prescriptive reasoner; the Graph reasoner runs
  separately on the similarity graph for triangle_count.
- _SUBJECT_CANONICAL_MAP comment: record why the merge map is kept
  narrow -- aggressive genre-merging makes the shared-subject
  similarity graph dense enough that the MiniZinc CSP can't reach
  OPTIMAL within the time budget on the bundled slice.
- problem.solve: bump time_limit_sec from 60 to 180. The bundled
  instance solves to OPTIMAL well within 180s; the wider budget
  gives margin so the runner doesn't error out on slightly slower
  cloud queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t numbers

- README: update the two stale `time_limit_sec=60` references in
  the scaling and troubleshooting sections to match the runner's
  current `time_limit_sec=180`.
- fetch_open_library_slice.py: update the Usage block to say
  `~59 books` (the actual count after the publisher-noise filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p comment

Trim the _SUBJECT_CANONICAL_MAP block comment to a neutral one-liner that
describes what the map does. The previous version named specific genre
families and described iteration history that doesn't belong in a
public-facing template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>