Add book_slate_recommendation template (Graph + Prescriptive CSP) #59
Three-pillar Graph + Paths + Prescriptive (MIP) template. Sketch state -- not yet validated end-to-end. Next step is an E2E run against a live RAI account to debug the pipeline.

Pipeline:
- Graph: PageRank over Movie.similar_to graph -> structural prior.
- Paths: bounded explanation-path enumeration (<=3 hops) -> path-counts-by-type as integer features.
- Prescriptive (HiGHS MIP): K-item slate per user under genre / director / actor diversity, freshness floor, originals exposure floor, cold-start cap, explanation-path floor; objective blends PageRank prior with path signal.

Lead dataset: small hand-crafted sample modelled on the MovieLens-1M-KG schema (KGAT distribution). README points at the KGAT distribution for the realistic-instance build.

Production precedent: Pinterest Pixie (random-walk recsys at production scale); Alibaba iGraph, eBay KPRN, LinkedIn Career Explorer, GE Healthcare KARE; regulatory drivers (GDPR Art. 22, EU AI Act Art. 86, ECJ C-203/22).

Plan: ~/plans/csp-templates-coverage-epic.md "kg_aware_slate_recommendation" deferred-candidate entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Working pipeline: status OPTIMAL, objective 9779.76, all 9 ICs verified on the bundled sample. Watched-exclusion confirmed (user 3 correctly skipped la_la_land, which they had already watched).

Changes vs. the initial sketch:
- Item supertype + Item.connected_to(Item, Item) unified edge. Workaround for the v1.1.0 paths-lib gap on multi-edge path() (paths/README.md "Currently unsupported patterns" §1, design epic RAI-44166). Preserves real heterogeneous KG bounded walks (User -> Movie -> Director -> Movie ...) instead of falling back to a Movie.similar_to-only walk.
- PageRank stays Float (HiGHS handles float coefficients on binary decisions natively, same pattern as supply_chain). Dropped the (score * SCALE).cast(Integer) rescale that doesn't lift in 1.1.0.
- Problem(model, Float); Candidate.pick is a Float-typed bin var to match supply_chain's binary-on-Float-problem pattern.
- Watched-exclusion via a pick == 0 IC at the prescriptive layer (negation in rules not yet supported in 1.1.0; same gap compliance_rule_audit documents).
- User.watched ingest uses a named Integer.ref() for rating to avoid the unground-variable typer error.
- Slate size K reduced from 8 to 3 to fit the bundled 24-movie / 4-user sample (production K is 8-12 with MovieLens-1M-KG).
- Renamed path_count_via_similar -> path_count_via_kg_walk to reflect that this count is now the heterogeneous KG-walk count, not a similar_to-only count.
- README pipeline section updated to document the Item supertype + Item.connected_to layer and call out the v1.1.0 gap workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the bundled dataset and template domain from synthetic movies to a deterministic slice of Open Library (~60 books, ~58 authors, 12 subjects). Open Library publishes its bibliographic catalogue under CC0, so the template ships in full without licensing exposure (MovieLens / Goodreads / Amazon-Book all carry non-commercial clauses incompatible with shippable customer templates).
- Add data/fetch_open_library_slice.py: a deterministic, cached fetcher with sm/md/lg size profiles. Pulls works + authors + subjects from the public Open Library API and emits the 10-CSV bundle. Synthetic users / read events / similar_to edges are generated on top.
- Rename Movie -> Book, Director -> Author, Genre -> Subject. Drop the Actor concept; the four-type heterogeneous KG (User, Book, Author, Subject) is plenty for the KG-walks story.
- Apply the documented `| 0` default-value pattern to every count(...).per(c) expression so path_count_via_author / _via_subject / _via_kg_walk are defined for *every* Candidate, not just those with at least one match. Without this, the composite path_count_total = via_a + via_s + via_kg drops any Candidate missing one operand, which collapses the explanation-floor MIP constraint to zero and renders feasible problems infeasible. Confirmed via problem.display(): every user now has a richly-grounded explanation IC (~30 terms / user) vs the prior 0-1 terms.
- Final E2E: status OPTIMAL, objective 9971.95, all 8 ICs verified (slate-size, exclude-read, subject-diversity, author-uniqueness, freshness, originals, cold-start cap, explanation floor).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
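The missing-operand collapse that the `| 0` pattern fixes can be sketched in plain Python. This is an illustrative analog, not the PyRel API: the dict names and data are hypothetical, and `.get(c, 0)` plays the role of the `| 0` default on `count(...).per(c)`.

```python
# Per-candidate typed-evidence counts. Candidate "c2" has no shared-subject
# rows at all, so it is simply absent from via_subject.
via_author  = {"c1": 2, "c2": 1}
via_subject = {"c1": 3}            # c2 missing: zero matching rows
via_kg_walk = {"c1": 1, "c2": 4}

# Join-style composition: a candidate missing ANY operand drops entirely --
# the analog of path_count_total losing rows without the | 0 default.
joined_total = {
    c: via_author[c] + via_subject[c] + via_kg_walk[c]
    for c in via_author
    if c in via_subject and c in via_kg_walk
}

# Densified composition: default each count to 0 (the `| 0` pattern),
# so every candidate keeps a defined total.
densified_total = {
    c: via_author.get(c, 0) + via_subject.get(c, 0) + via_kg_walk.get(c, 0)
    for c in set(via_author) | set(via_subject) | set(via_kg_walk)
}

print(joined_total)     # c2 vanished from the composite
print(densified_total)  # c2 kept with total 5
```

With the joined form, any IC summing `path_count_total` over picks silently sees fewer rows, which is how a feasible explanation-floor constraint turns infeasible.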
…n-dedup pitfall

Add experiments/count_variants.py + README probing six formulations of the per-(user, candidate) typed path-count features. Confirms (with problem.display) that the production form (variant A: three counts each | 0, arithmetic sum) is correct, and that sum(model.union(propA, propB, propC)) silently undercounts under value collisions because union deduplicates on projected values.

No prescriptive ≠ pyrel divergence found: the PR #1117 / #1118 / #1213 stack already pins all observed behaviors via the iff suite (u_same_prop pins the dedup spec; empty_body_semantics pins the cascade-drop; arithmetic_filtered pins scope isolation across sibling aggregates).

An inline comment in the production file points future readers to the experimental harness so the choice of arithmetic over sum-of-union is discoverable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
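The undercount mechanism is easy to reproduce outside PyRel: a union over projected values deduplicates collisions before summation, so mass is lost whenever two operands contribute the same value. A minimal plain-Python analog (Python `set` standing in for union-on-projected-values):

```python
# Three typed counts for one (user, candidate) pair. Note the value
# collision: via_author and via_kg_walk both contribute 2.
via_author, via_subject, via_kg_walk = 2, 3, 2

# Variant A (the production form): arithmetic sum over the three operands.
arithmetic_total = via_author + via_subject + via_kg_walk

# Sum-of-union analog: the union deduplicates the colliding 2s before
# summing, silently undercounting whenever values collide.
union_total = sum({via_author, via_subject, via_kg_walk})

print(arithmetic_total, union_total)  # 7 vs 5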
Address findings from a multi-round review of the template:
- Utility now blends pagerank with path_count_total (was 2*via_author + via_subject, dropping via_kg_walk despite docs calling it the headline paths-pillar signal).
- Subject-distribution inspection rewritten to use aggs.sum().per(User, Subject) -- the prior form was Cartesian over Candidate × Subject (~13740 rows on sm; now 25 × 12 = 300).
- Honest description of MAX_HOPS=2 walker reach in the module docstring, README pillar 2, and an inline comment: User -> read_Book (length 1) + User -> read_Book -> similar_Book (length 2). Per-typed counts (via_author, via_subject) clarified as direct shared-entity joins, not path walks.
- README frontmatter brought in line with other v1 templates (quoted industry, reasoning_types block, Title-Case tags).
- Data-precondition comment block before slate_size_ic explaining the joint-feasibility requirements (cold users, over-read users, books missing author/subject, fresh/in-house floors).
- Fetcher emits WARNING summary lines when author-name resolve falls back to the OL key tail, and when first_publish_date is synthesised.
- current_year hardcode replaced with date.today().year.
- Stale "10-CSV bundle" -> "8-CSV", "~30-40 authors" -> "~58 authors", "two+ subjects" -> "at least one shared subject".
- README + fetcher now correctly describe similar_to as derived (not synthetic) -- the GDPR Art. 22 explainability framing requires the graph to be evidence, not fabrication.
- Removed RAI Jira IDs (RAI-44166), internal pytest node ids, and references to nonexistent sibling templates from customer-facing prose. Cleaned the same class of leak from experiments/README.md and experiments/count_variants.py.
- count_variants module docstring now lists six variants (it claimed five but defined six); added an entry for variant F.
- Fetcher error message preserves the original error class for diagnosis.
- FRESH_WINDOW_DAYS comment expanded to explain catalogue-vs-streaming tuning.
- EXPLANATION_FLOOR / WEAK_EXPLANATION_THRESHOLD scaling note added to the README customise section for md/lg slices.

E2E status: OPTIMAL on the bundled sm slice; objective ~15619 (was ~9779 before the utility wired path_count_total in -- expected to rise).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… fixes

- Module docstring + L207 unified-edge comment now describe per-typed counts as the explanation surface (not "top-aggregate-relevance path"); explicitly state that typed-evidence joins are direct shared-entity joins, not per-hop edge introspection.
- Fetcher: REFERENCE_YEAR = 2026 frozen constant for deterministic age_days across calendar years (cached reruns now produce identical CSVs in any year).
- Fetcher: drop works with no resolvable authors (with a WARNING) so the runner's author-coverage precondition holds.
- README: regulatory section softened from "are required" to transparency-obligation framing with a compliance-team caveat.
- README intro + Pipeline summary: separate reach (the walker generates candidates) from evidence (typed joins score them); no more "two reach signals" ambiguity.
- README + runner: the LLM-explanation Customise bullet now references per-typed counts plus a documented small extension to materialise top paths; "eliminating hallucination" softened to "reducing hallucination risk".
- experiments/README.md reframed as engineering notes useful to advanced customers; runner comment removed the internal pick_5_12 debug coefficient name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
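The frozen-reference idea can be sketched in a few lines. REFERENCE_YEAR = 2026 is from the commit above; the Jan-1 anchor and the helper name are assumptions for illustration, not the fetcher's actual code.

```python
from datetime import date

# Frozen reference (from the commit): cached reruns emit identical age_days
# in any calendar year, unlike date.today()-based ages.
REFERENCE_YEAR = 2026

def age_days(first_publish_year: int) -> int:
    """Illustrative: age in days against a fixed Jan-1 reference date,
    so the output is deterministic no matter when the fetcher runs."""
    reference = date(REFERENCE_YEAR, 1, 1)
    return (reference - date(first_publish_year, 1, 1)).days

print(age_days(2020))
```

A `date.today()`-based version would produce different CSVs on every rerun, breaking cache determinism.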
…dation
The previous name stacked two technical qualifiers ("kg_aware",
"slate"). The new name anchors to the lead instance (Open Library
books) while keeping the K-items "slate" shape that distinguishes
this template from a single-best recommendation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename module-level data path to uppercase DATA_DIR to match the recent CSP cart templates (synthetic_eligibility_records, product_configurator, synthetic_order_lifecycle).
- Rewrite Quickstart to the canonical 6-step shape (Download / venv / Install / Configure / Run / Expected output) and add a Solve result block keyed to the bundled --size sm slice.
- Document experiments/ in the template structure block so the ZIP layout matches the README.
- Replace remaining kg_aware_* docstring and User-Agent strings in data/ and experiments/ left over from the rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…add troubleshooting

- Section markers, Model() placement, docstring Run:/Output: blocks, per-CSV / per-concept comments, US spellings (matches recently merged v1 templates).
- Drop the misleading "PyRel doesn't support not in rules" rationale comment; cite the actual prescriptive-rewriter constraint and the Stock.is_non_representative cart precedent in portfolio_balancing.
- Remove the GNN/Predictive customise mention (replaced with a generic "custom scoring signal" bullet) since the predictive reasoner is not yet public.
- Add a Troubleshooting section to the README with INFEASIBLE-diagnostic, slow-solve, and fetcher-network details blocks. Expand the runner's data-preconditions block to enumerate the cold-start + explanation-floor joint-feasibility relation and the author-uniqueness x slate-size interaction.
- Reorder inspection blocks so the chosen slate is printed second (after Users); diagnostic candidate-set and PageRank dumps move to the end.
- Reference fixes: drop the GE Healthcare KARE bullet (wrong attribution and method); correct the KPRN attribution (NUS / eBay / USTC); disambiguate Alibaba iGraph from AliCoCo; drop a dead Alibaba blog URL; trim the Pinterest engagement claim to its publication-date framing; trim the LinkedIn precedent to the skills graph (no platform-total numbers).
- Drop the unused aggs alias; bare sum used uniformly for aggregates.
- Annotate Book.in_house as Integer 0/1 in the README schema block; add demo-grade tuning rationale to the utility weights.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oc fixes

- Add a pre-solve "Candidate count per user" inspect block so customers can spot users with zero candidates before infeasibility. Document the per-user IC anchoring caveat: each floor IC fires only for users with at least one matching Candidate row, so sparse customer data may pass vacuously rather than becoming infeasible.
- Open Library fetcher: pad the User-Agent with a contact-stub note (Open Library API guidance asks for contact info), bump the inter-request sleep from 0.2s to 1.0s to honour the documented unidentified rate limit, make cache writes atomic via temp-file rename, and treat a JSONDecodeError on cache read as a miss (interrupted writes or stale error bodies no longer poison the cache forever).
- Tighten the utility-weights comment to acknowledge that with the default scales, PageRank effectively serves as a tie-breaker rather than a co-equal blend, and point production deployments at min-max normalisation before applying these weights.
- Correct the K-RagRec / ItemRAG citation in the README and the matching mention in the runner Customise block.
- Sharpen the Troubleshooting INFEASIBLE recipe: lead with the new pre-solve diagnostic, and note that problem.verify already prints per-IC violations on a failing solve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r fixups

- Add a Python-level pre-solve assertion that materialises the Candidate set, anti-joins against User.read, and refuses to solve if any user has fewer than SLATE_SIZE_K unread candidates. The per-user floor ICs anchor on Candidate rows, so without this guard customers with sparse data get a silent missing-row contract violation rather than an explicit infeasibility signal. The check also prints unread counts per user so reach can be inspected pre-solve.
- The fetcher cache temp file now embeds the process pid so concurrent fetcher runs against the same _cache directory don't race on a shared .tmp path.
- Replace the duplicated K-RagRec entry in the LLM+KG hybrid-pattern list with GraphRAG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
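The atomic, pid-tagged cache-write pattern and the JSONDecodeError-as-miss read can be sketched in stdlib Python. Function names are illustrative, not the fetcher's actual helpers; `tempfile.mkstemp` plus `os.replace` gives the same guarantees described above (no partial reads, no shared `.tmp` path between concurrent processes).

```python
import json
import os
import tempfile

def atomic_cache_write(cache_dir: str, key: str, payload: dict) -> str:
    """Write JSON via a temp file in the same directory, then os.replace.
    The pid in the temp-file prefix keeps concurrent writers from racing
    on a shared .tmp path; os.replace makes the publish step atomic."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, f"{key}.json")
    fd, tmp_path = tempfile.mkstemp(
        prefix=f"{key}.{os.getpid()}.", suffix=".tmp", dir=cache_dir
    )
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    finally:
        if os.path.exists(tmp_path):      # only on a failure before replace
            os.unlink(tmp_path)
    return final_path

def cache_read(path: str):
    """Treat a JSONDecodeError as a cache miss instead of a permanent
    poison: interrupted writes or cached HTML error bodies just refetch."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

Readers either see the old complete file or the new complete file, never a half-written one.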
… in-house floors

The Candidate-anchored per-user floor ICs cover three thresholds: SLATE_SIZE_K (cardinality), FRESHNESS_FLOOR (fresh items), and ORIGINALS_FLOOR (in-house items). Without per-floor pre-solve checks, a user whose unread candidates contain zero fresh items would still pass the SLATE_SIZE_K guard but produce a slate that silently violates the freshness floor (the where(...) filter on freshness_ic removes the user entirely from the IC's row set).

Extend the assertion to also check per-user unread fresh count >= FRESHNESS_FLOOR and per-user unread in-house count >= ORIGINALS_FLOOR; surface affected users in the error message. Tighten the README troubleshooting block to describe the assertion accurately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
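The extended per-floor guard amounts to three necessary-condition checks per user. A minimal plain-Python sketch (data shapes and the function name are hypothetical; the real assertion works on the materialised Candidate set):

```python
def pre_solve_floor_check(unread, K, fresh_floor, originals_floor):
    """Per-user necessary conditions: enough unread candidates overall,
    enough fresh ones, enough in-house ones. `unread` maps user ->
    list of (book_id, is_fresh, is_in_house). Returns {user: [failures]}."""
    failures = {}
    for user, books in unread.items():
        shortfalls = []
        if len(books) < K:
            shortfalls.append("slate_size")
        if sum(1 for _, fresh, _ in books if fresh) < fresh_floor:
            shortfalls.append("freshness")
        if sum(1 for _, _, inh in books if inh) < originals_floor:
            shortfalls.append("originals")
        if shortfalls:
            failures[user] = shortfalls
    return failures

unread = {
    "u1": [("b1", True, False), ("b2", True, True), ("b3", False, True)],
    "u2": [("b4", False, False), ("b5", False, False)],  # no fresh, no in-house
}
print(pre_solve_floor_check(unread, K=3, fresh_floor=1, originals_floor=1))
```

u2 fails all three conditions and would be surfaced in the error message; u1 passes and reaches the solver.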
- Move CSV loads to the top of the file and add pre-solve invariants (unique-key, no-dangling-FK, non-negative age_days) using the _assert_* helpers + the declarative FK-edges table that patient_cohort_recruitment establishes for v1.
- Drop the "Background and precedent" deep-dive from the README; keep a short "Where this fits" framing.
- Trim the academic References block to just Open Library.
- Pin pandas>=2.0 to match other v1 templates.
- Drop reasoning_types: Paths in favor of the canonical [Graph, Prescriptive] vocabulary; description re-framed accordingly.
- Compress essay comments around the tuning constants, the path walker, the Item.connected_to relationship, and the Data-preconditions block (joint-feasibility detail already lives in README Troubleshooting).
- Drop the bottom "Customize-section variants" comment block (it duplicates the README "Customize this template" section).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-references were still spelled "Paths" / "three-pillar" after the front-matter switched to [Graph, Prescriptive]:
- Module docstring header and the "Three-pillar pipeline" line
- "Pillar 2: Paths" section header in the .py
- "What's included" line + "Pipeline" step 2 header in the README
- v1 index description

Re-framed as "Multi-reasoner" and "Graph (bounded KG walks)" so the labels match the canonical reasoning_types vocabulary throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docs preview for this pull request has been deployed to Vercel!
…gle-count embeddedness floor

The old shape framed Graph + Paths + Prescriptive as three coequal pillars, but PageRank was just a per-Book Float that any retrieval-stage scalar could substitute. That made the Graph contribution swappable and effectively optional, and the "Paths" pillar read as a sidecar to PageRank rather than the centerpiece. Restructure so Paths visibly leads:
- Reorder the .py: Pillar 1 = Paths (Candidate concept + per-typed counts), Pillar 2 = Graph, Pillar 3 = Prescriptive. Reorder the README "Pipeline" section to match.
- Replace PageRank with `Graph.triangle_count()` per Book. Triangle count is a topological measure of where each Book sits in the similarity neighborhood; it cannot be supplied externally without reconstructing the graph, which is what makes it a Graph-pillar contribution rather than a data-layer input.
- Add `embeddedness_ic`: at least EMBEDDEDNESS_FLOOR picks per user must have triangle_count >= EMBEDDEDNESS_THRESHOLD. The Graph pillar now drives a structural-diversity *constraint*, not just an objective term.
- Drop the utility blend and the PAGERANK_WEIGHT / PATH_SIGNAL_WEIGHT constants. The objective collapses to `sum(path_count_total * pick)` -- pure path-driven, integer-only.
- Update the README "Why MIP, not CSP" rationale (no longer about float coefficients), the "Customize this template" "Custom scoring signal" bullet (now framed as adding an *additive* Float term, not swapping out PageRank), the troubleshooting section (added the embeddedness-floor infeasibility cause), the constants list, and the v1 index entry to match.

Three Slack signals motivated picking triangle_count over Louvain (which would have been the cleaner community-diversity story): recent louvain() test failures locally, an open question about re-implementing Louvain via loops in PyRel, and ambiguity about its deprecation timeline. WCC was the other option, but the bundled similarity graph is one connected component (60/60), so a "slate spans >= N components" IC is trivially satisfied or trivially infeasible. Triangle count has a real per-book distribution (0-107, two isolates, varied mid-tail) on the bundled slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
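Per-node triangle count is small enough to sketch directly. This is a naive plain-Python version (not the `Graph.triangle_count()` implementation) that also collapses duplicate edge rows first, since duplicates in the similarity edge list would otherwise inflate the counts:

```python
from itertools import combinations

def triangle_counts(edges):
    """Per-node triangle count on an undirected simple graph.
    Duplicate or reversed edge rows are collapsed by the set-based
    adjacency before counting."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue                      # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    counts = {n: 0 for n in adj}
    for n, nbrs in adj.items():
        # a triangle at n is a pair of neighbors that are themselves adjacent
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj.get(u, set()):
                counts[n] += 1
    return counts

# K4 minus one edge: nodes 1 and 2 sit in two triangles; 3 and 4 in one each.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 3)]  # note duplicate row
print(triangle_counts(edges))
```

The per-node distribution (here 2/2/1/1) is what the embeddedness floor thresholds against: a book's triangle count says how tightly knit its similarity neighborhood is.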
…/path story
- Switch solver from HiGHS MIP to MiniZinc CSP. Pure-integer model:
binary picks, integer coefficients, no float blend.
Problem(model, Integer) and solve("minizinc", ...).
- Update reasoning_types tag and template description to reflect CSP.
- Strengthen the graph/path narrative through three new ICs and
a derived property:
* subject_span_ic: each user's slate must touch
>= MIN_DISTINCT_SUBJECTS distinct subjects, expressed via
count(Subject, Candidate.pick == 1) -- distinct-value counting
that sum-of-indicators cannot express directly.
* Candidate.primary_evidence: derived integer property
(1=author, 2=subject, 3=walker) from argmax of the three typed
path counts. Three mutually exclusive define rules.
* path_evidence_diversity_ic: each slate must touch >= MIN_EVIDENCE_TYPES
distinct primary-evidence types, expressed via
count(Integer.ref(), Candidate.pick == 1) over distinct
primary_evidence values. CSP-native distinct-value counting.
* strong_walker_ic: at least MIN_STRONG_WALKERS picks must have
path_count_via_kg_walk >= STRONG_WALKER_THRESHOLD. Anchors at
least one pick to the headline Paths-pillar signal rather than
the cheaper shared-author / shared-subject joins.
- Drop subject_diversity_ic (subsumed by subject_span_ic + cardinality).
- Inspect output now also surfaces primary_evidence per picked item.
README, troubleshooting, and v1 index updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent-decay objective
Switch the prescriptive layer from binary pick to multi-valued integer
slot in {1..K, K+1} (K+1 = unpicked sentinel). Slot order matches the
canonical recsys position-decay engagement model -- top-of-row picks
dominate impressions -- and lets the objective directly weight items
by position via sum((K+1-slot) * path_count_total).
Pillar-2 (Graph) contribution moves from a "somewhere in the slate"
embeddedness floor to a hero-slot pin: slot 1 must come from a Book
whose triangle count clears HERO_EMBEDDEDNESS_THRESHOLD, concentrating
the structural-quality signal at the highest-engagement position.
Drop the demo-shaped extras (primary_evidence Integer property and
its 3 mutually-exclusive define rules, path_evidence_diversity_ic,
strong_walker_ic). They were added to showcase count-distinct CSP
syntax, but PyRel's prescriptive rewriter does not currently support
distinct aggregates in IC compilation, and the underlying constraints
weren't real product rules. The CSP idioms that DO get exercised here:
multi-valued integer decisions, GCC-style per-pair count caps (slot,
author, subject), slot-equality reification (hero pin), and the
reified domain rule for the already-read exclusion.
Replace author_diversity_ic / subject_span_ic count-of-Concept forms
with per-pair count caps (count(Candidate, slot<=K).per(user, X) <=
N) -- the shape PyRel actually compiles. Add slot_uniqueness_ic
(count Candidates per (user, Slot.pos) <= 1) to enforce that each
slate position is filled exactly once; combined with slate_size_ic
this is a bijection between picks and positions 1..K.
Strengthen the pre-solve assertion to also flag users with no
hero-eligible candidate or fewer than K distinct unread authors.
E2E (sm slice, MiniZinc): OPTIMAL, objective 648, all ten ICs
verify clean, slate ordered 1..K per user.
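The slot encoding's arithmetic can be checked in a few lines of plain Python. This evaluates the objective for a fixed assignment (names are illustrative; the real decision variable lives in the CSP model): the K+1 sentinel contributes weight 0, so no separate picked-indicator is needed.

```python
K = 3
SENTINEL = K + 1  # slot K+1 == unpicked; position weight (K+1 - slot) is then 0

def slate_objective(slots, path_count_total):
    """Position-decay objective: sum((K+1 - slot) * score).
    Hero slot 1 gets weight K; unpicked (sentinel) gets weight 0."""
    return sum(
        (SENTINEL - slot) * path_count_total[cand]
        for cand, slot in slots.items()
    )

# One user's assignment: b1 is the hero, b4/b5 are unpicked.
slots  = {"b1": 1, "b2": 2, "b3": 3, "b4": SENTINEL, "b5": SENTINEL}
scores = {"b1": 5, "b2": 4, "b3": 7, "b4": 9, "b5": 2}
print(slate_objective(slots, scores))  # 3*5 + 2*4 + 1*7 + 0 + 0 = 30
```

Note that b4's high score contributes nothing while unpicked, and the solver can raise the objective only by pulling it into a slot <= K.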
…ighten docs; drop experiments dir
Pre-solve guard now catches the three remaining IC infeasibility modes
that previously surfaced as silent INFEASIBLE solves:
- cold_start: users with fewer than K - COLD_START_CAP strongly-explained candidates
- subject_span: users whose unread pool spans fewer than ceil(K / MAX_PER_SUBJECT) subjects
- explanation: users whose top-K position-weighted score upper bound is below EXPLANATION_FLOOR
ValueError repair hint is now keyed by which condition fired (densify
reach vs lower a per-IC floor vs lower SLATE_SIZE_K vs lower
EXPLANATION_FLOOR), replacing the generic "densify Book.similar_to"
hint that was misleading for short-author shortfalls.
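The keyed-hint guard reduces to per-user necessary conditions, one per infeasibility mode. A hedged sketch (function and parameter names are illustrative; the bounds mirror the three modes listed above):

```python
from math import ceil

def guard_user(n_strong, n_subjects, topk_score_bound, K,
               cold_start_cap, max_per_subject, explanation_floor):
    """Return the list of fired conditions for one user; each key maps to
    a distinct repair hint instead of a generic 'densify similar_to'."""
    fired = []
    if n_strong < K - cold_start_cap:
        fired.append("cold_start")    # hint: densify reach or raise COLD_START_CAP
    if n_subjects < ceil(K / max_per_subject):
        fired.append("subject_span")  # hint: lower MAX_PER_SUBJECT or SLATE_SIZE_K
    if topk_score_bound < explanation_floor:
        fired.append("explanation")   # hint: lower EXPLANATION_FLOOR
    return fired

# A user failing all three conditions vs. one passing cleanly.
print(guard_user(n_strong=1, n_subjects=1, topk_score_bound=40, K=4,
                 cold_start_cap=2, max_per_subject=2, explanation_floor=50))
print(guard_user(n_strong=3, n_subjects=3, topk_score_bound=100, K=4,
                 cold_start_cap=2, max_per_subject=2, explanation_floor=50))
```

Because each condition is only necessary, a clean guard does not prove feasibility; but any fired condition proves infeasibility, which is what makes the per-condition hint safe to emit.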
Data-domain validations:
- in_house must be in {0, 1}; values outside silently disqualify books from the originals pool
- book_similar (src_book_id, dst_book_id) must be unique; duplicates would inflate triangle counts
Doc tightening:
- New "How this template differs from other CSP templates" README section names
the three architectural choices that follow from encoding an ordered slate
(unified super-edge, K+1 sentinel, per-pair count caps + per-user existential hero pin)
- Sections reordered: differences before count-idioms note (architectural orientation
precedes the implementation caveat)
- Version-neutralized "paths-lib limitation" wording at 4 sites (was pinned to v1.1.0)
- Corrected via-author/via-subject path-count comments: counts are bag-style
(one per join row), not distinct
- Fixed objective wording so "lower slot indices = top of row = hero" lands
unambiguously
- Updated solve-time claim from ~1 minute to "a few seconds" for the bundled slice
- Quickstart now mentions re-running the solver after fetching a larger slice
Drop experiments/ directory (engineering scratch, was the only v1/*/experiments
across all templates; load-bearing insight retained in README).
Regenerated v1/README.md index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Singular form aligns with the other reasoning_types entries (Graph, Prescriptive, Predictive, Rules-based) and with the customer-facing class/method names (PathTraversal, model.path()). The plural-form KG-Paths tag stays, since it refers to graph paths as a data structure rather than to the reasoner.

Updates: front-matter reasoning_types, description string, README prose pillar headers, script docstring and pillar comment, v1 index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r-pillar framing

Per the customer-facing reasoner taxonomy, Path is taxonomically subordinate to Graph (treated as a subset, not a peer of Graph / Predictive / Prescriptive / Rules-based). Adjusted framing throughout so the paths library is positioned as a load-bearing technique rather than a separate reasoner pillar:
- Front-matter reasoning_types narrowed to [Graph, Prescriptive].
- Description reframed as "Graph + Prescriptive (CSP) recsys template ... bounded knowledge-graph walks via the paths library generate the candidate set".
- Pillar bullets in README + script docstring renamed: bounded KG walks (paths library, central) is the architectural centerpiece; the Graph reasoner and Prescriptive reasoner are the two reasoner pillars. Code section markers renumbered: Pillar 1 = Graph, Pillar 2 = Prescriptive.
- KG-Paths tag retained (it refers to graph paths as a data structure, not the reasoner taxonomy).
- v1 README index regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-internal 'first showcase' positioning from customer-facing prose

The "Removing X collapses Y" rhetorical construction in the README and docstring added no information beyond what the surrounding bullets already convey. Replaced it with direct descriptions of what the path walks and the Graph contribution each produce. Also dropped the "first showcase in v1" and "first ordered slate" phrasings from the README -- those are cart-positioning notes that belong in the PR description, not in customer-facing prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace "pillar" with "reasoner" or "stage" to match the customer- facing reasoner taxonomy and the language other multi-reasoner templates (e.g. telco_network_recovery) use. Remove the "Sibling CSP templates" comparison and the PyRel-roadmap reference -- both are internal positioning that doesn't help the reader. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mponent)

Same gating pattern used for the predictive (GNN) templates: the paths-library template stays in the private docs site while the underlying capability stabilises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cafzal (Collaborator) approved these changes on May 8, 2026 and left a comment:
Ship with nits. The pre-solve assertion (lines 694-846) is the best in v1 -- per-IC necessary conditions with actionable per-user error lists. K+1 sentinel verified by construction (every count IC gates slot <= K; the objective uses (K+1-slot) so K+1 contributes 0 naturally). Aggregate densification (| 0) consistent. The fetch script is robust: atomic writes, JSON validation with cache invalidation, polite UA + retry, frozen REFERENCE_YEAR for determinism. The Customization section names real off-domain retargets (e-commerce, courses, news).

Issues
- IMPORTANT -- data/authors.csv carries Open Library noise that surfaces in user-visible output (lines 5 "TC", 12 "Aurora Irvine", 42 "Bible", 51 "Alex Goody", 59 "Booking", 4 "Les éditions du Rey" -- edition/publisher records mis-classified as authors). Author names appear in per-pick explanations. Suggest filtering in data/fetch_open_library_slice.py:301-323: drop authors with names < 3 chars, all-caps without punctuation, or matching a publisher denylist; fall back to dropping the work if its sole author is filtered.
- NIT -- data/subjects.csv has near-duplicates that dilute the diversity dial (rows 4/5/6/8 are flavors of "adventure"; row 11 is the literal Dewey "823/.8"). With MAX_PER_SUBJECT=2 these are seen as distinct subjects. Normalize in fetch_open_library_slice.py:404-411 (strip Dewey codes, collapse "adventure*" variants).
- NIT -- README.md:3 description is 447 chars / one stream-of-consciousness sentence with five em-dash clauses. Trim to ~150 chars, business framing first, then a colon to the technique tagline.
- NIT -- README.md:151-194 ("Expected output") mixes "what you'll see" with "how to scale up". Move the scaling guidance into the existing "Scaling the bundled data" section at line 422.
- NIT -- book_slate_recommendation.py:127-132 mixed CSV-var naming (users_csv, books_csv, then read_csv_data, ba_csv, bs_csv, bsim_csv). Make consistent.

py_compile and ruff check clean.
- Add publisher/imprint and Dewey-code filters in fetch_open_library_slice.py
so authors.csv excludes corporate / single-token noise (TC, Bible, Booking,
"Les éditions du Rey", ...) and subjects.csv collapses "adventure" variants
and drops Dewey codes ("823/.8"). Regenerate the bundled sm slice.
- Trim the front-matter description to a business framing + technique tagline
and move "scaling the bundled data" guidance out of the Quickstart's
expected-output step into the dedicated section.
- Rename CSV-load locals (read_csv_data/ba_csv/bs_csv/bsim_csv) to a
consistent <name>_csv pattern.
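The suggested author-noise cascade can be sketched in a few lines. Thresholds follow the review (names under 3 chars, bare all-caps tokens, publisher denylist); the denylist entries here are illustrative examples, not the shipped list.

```python
# Illustrative publisher/imprint denylist (assumed entries, lowercased).
PUBLISHER_DENYLIST = {"booking", "bible", "les éditions du rey"}

def is_noise_author(name: str) -> bool:
    """Cascade: length check, then bare all-caps token, then denylist."""
    stripped = name.strip()
    if len(stripped) < 3:                          # e.g. "TC"
        return True
    if stripped.isalpha() and stripped.isupper():  # bare all-caps token
        return True
    return stripped.lower() in PUBLISHER_DENYLIST  # corporate/imprint records

authors = ["TC", "Bible", "Booking", "Les éditions du Rey", "Ursula K. Le Guin"]
kept = [a for a in authors if not is_noise_author(a)]
print(kept)  # only the real author survives
```

A heuristic cascade like this will still miss person-like noise (the review's "Alex Goody" case), which is why the fallback of dropping a work whose sole author is filtered matters.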
- fetch_open_library_slice.py: replace the regex-chain normalization in _normalize_subject with an explicit canonical-tag map (drop / strip / merge phases). Variants like "action & adventure" now actually fold to "adventure" (the prior chain claimed to but did not). Collapse internal whitespace before lookup so multi-space variants normalize identically. Catch json.JSONDecodeError in _http_get_json so HTML rate-limit pages returned with HTTP 200 trigger the retry loop instead of escaping it.
- README.md: rewrite the front-matter description in plain language; correct the bundled-data counts to 59 books / 52 authors.
- Regenerate the bundled sm slice with the new normalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
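The canonical-tag-map shape can be sketched as follows. The map entries are illustrative assumptions (the real `_SUBJECT_CANONICAL_MAP` lives in the fetcher); the whitespace collapse before lookup is the fix described above.

```python
import re

# Illustrative phases: DROP kills non-genre tags (Dewey codes),
# MERGE folds variants onto one canonical tag. Entries are examples.
DROP = {"823/.8"}
MERGE = {
    "action & adventure": "adventure",
    "adventure stories":  "adventure",
    "adventure fiction":  "adventure",
}

def normalize_subject(raw: str):
    """Strip, lowercase, collapse internal whitespace, then apply
    drop/merge. Returns None for tags that should be discarded."""
    tag = re.sub(r"\s+", " ", raw.strip().lower())
    if tag in DROP:
        return None
    return MERGE.get(tag, tag)

print(normalize_subject("Action &  Adventure"))  # multi-space variant folds too
print(normalize_subject("823/.8"))               # Dewey code dropped
```

An explicit map is easy to audit against the emitted subjects.csv, unlike a regex chain whose effective behavior has to be traced pattern by pattern.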
- Module docstrings (book_slate_recommendation.py, fetch_open_library_slice.py): correct the bundled-data count to 59 books / 52 authors.
- _is_publisher_or_noise_author docstring: list the cascade in the same order the implementation runs (length -> denylist -> token-set).
- Runner docstring: clarify that the per-(user, candidate) explanation evidence feeds the Prescriptive reasoner; the Graph reasoner runs separately on the similarity graph for triangle_count.
- _SUBJECT_CANONICAL_MAP comment: record why the merge map is kept narrow -- aggressive genre-merging makes the shared-subject similarity graph dense enough that the MiniZinc CSP can't reach OPTIMAL within the time budget on the bundled slice.
- problem.solve: bump time_limit_sec from 60 to 180. The bundled instance solves to OPTIMAL well within 180s; the wider budget gives margin so the runner doesn't error out on slightly slower cloud queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t numbers

- README: update the two stale `time_limit_sec=60` references in the scaling and troubleshooting sections to match the runner's current `time_limit_sec=180`.
- fetch_open_library_slice.py: update the Usage block to say `~59 books` (the actual count after the publisher-noise filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p comment

Trim the _SUBJECT_CANONICAL_MAP block comment to a neutral one-liner that describes what the map does. The previous version named specific genre families and described iteration history that doesn't belong in a public-facing template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What this template adds
A Graph + Prescriptive (CSP) recsys template that picks K books per reader from a heterogeneous knowledge graph and orders them by slate position. Slot 1 is the hero (top of row, highest engagement); position-decay is the canonical recsys engagement model, so the order matters as much as the selection. Constraints: cardinality, slot uniqueness, already-read exclusion, author uniqueness, subject concentration cap, freshness floor, in-house exposure floor, cold-start cap, hero pin, and explanation-path floor (10 ICs). Objective maximizes `sum((K + 1 - slot) * path_count_total)`.

Three things make this template distinctive:
- `Item.connected_to.repeat(1, MAX_HOPS).all_paths()` (relationalai.semantics.std.paths) walks generate the Candidate concept and its per-(user, candidate) typed-evidence counts.
- `Graph.triangle_count()` over the book-similarity graph drives the slot-1 hero pin to a structurally-central pick.
- `Candidate.slot ∈ {1, ..., K, K+1}`, where K+1 is the unpicked sentinel, so the position weight `(K+1 - slot)` is 0 at unpicked and no auxiliary picked-indicator is needed. The same encoding handles cardinality, position decay, and the per-pick explanation weighting in one decision variable. (`all_different` would conflict with the shared K+1 sentinel.)

Modeling patterns this surfaces
- `Item` super-concept with typed sub-concepts (User/Book/Author/Subject), plus a single 2-arity `Item.connected_to` super-edge populated as the symmetric union of typed edges. The unified-edge layer is needed because a `path()` call walks one 2-arity relationship at a time.
- Direct shared-entity join counts (`path_count_via_author`, `path_count_via_subject`) sit alongside a true bounded-walk count (`path_count_via_kg_walk`). Three integer features, blended via `path_count_total` into both IC clauses and the objective.
- `count(...).per(c).where(...) | 0` densification so every Candidate has every typed-evidence property defined (otherwise sum-over-pick aggregates silently undercount). Arithmetic sum (a + s + w), not `sum(model.union(a, s, w))` -- union inside an aggregate body deduplicates on projected values.
- `(K+1-slot)` evaluates to 0 at unpicked, K at the hero slot, and decays monotonically.
- `count(distinct ...)` is rejected by the prescriptive rewriter today; the per-pair cap form compiles to MiniZinc GCC propagation.
- A Python-level pre-solve guard materialises the Candidate set, anti-joins against `User.read`, and refuses to solve if any user fails any of the per-IC feasibility necessary conditions. The error message lists affected users per shortfall plus a strategy block keyed by which condition fired -- sparse customer data hits a clear `ValueError` rather than a quiet INFEASIBLE solve.
- A `--size sm|md|lg` fetch script that caches under `data/_cache/`, atomic on write, JSON-validated on read, and process-pid-tagged so concurrent runs don't race.

Privacy
Marked `private: true` so it ships only on the private docs site for now -- same gating pattern used for the predictive (GNN) templates while the paths library matures.

Verification
- `relationalai==1.1.0`, MiniZinc backend: status OPTIMAL, objective 648, num_points 1; `problem.verify()` re-evaluates all 10 ICs in the returned solution clean.
- Data-domain validations (`in_house ∈ {0, 1}` domain check, non-negative `age_days`).

References
Eksombatchai et al., Pixie (WWW 2018); Wang et al., KGAT (KDD 2019); Wang et al., KPRN (AAAI 2019); Xian et al., PGPR (SIGIR 2019); Ying et al., PinSage (KDD 2018); Wang et al., K-RagRec (ACL 2025).