Enhanced Search Results by artiehumphreys · Pull Request #1246 · thecourseforum/theCourseForum2

artiehumphreys · 2026-04-13T15:12:11Z

Problem:

Search results are often polluted with irrelevant courses that either haven't been taught in a while (e.g. https://thecourseforum.com/search/?mode=courses&q=computer) or aren't popular among students. The problem is worst for broad searches such as "artificial intelligence", where the many related special topics courses don't appear in the results and are displaced by graduate law classes that only a small fraction of users will ever click.

What this PR solves:

- Weights search results by trigram similarity (as before), plus recency and number of reviews
- Makes the weights adjustable

Summary by CodeRabbit

New Features
- Search results now use a weighted ranking that combines relevance, review popularity (normalized, with an upper cap) and recency of offering to surface more relevant courses.
- Results still apply a minimum relevance threshold to filter out low-quality matches.
- Improved ordering produces more useful top results reflecting student interest and recent offerings.

coderabbitai · 2026-04-13T15:12:36Z

📝 Walkthrough


The fetch_courses() function now ranks search results using a weighted composite score combining similarity, review popularity, and recency, instead of ordering solely by similarity. New database annotations calculate review count, normalized popularity, and recency metrics, with results ordered by the composite score.
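To make the composite score concrete, here is a minimal pure-Python sketch of the idea, not the PR's ORM code. The weight values, the review cap, the semester numbers, and the `CourseHit` class are all illustrative assumptions. The recency window of 50 follows the `50.0` constant visible in the diff, assuming it is a semester-number gap.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not the PR's tuned values.
SIMILARITY_WEIGHT = 0.6
REVIEW_POPULARITY_WEIGHT = 0.25
RECENCY_WEIGHT = 0.15
REVIEW_CAP = 50        # review counts at or above this get full popularity
RECENCY_WINDOW = 50.0  # semester-number gap at which recency drops to 0


@dataclass
class CourseHit:
    """Hypothetical stand-in for an annotated search result row."""

    name: str
    max_similarity: float    # trigram similarity in [0, 1]
    review_count: int
    last_taught_number: int  # semester number of the most recent offering


def weighted_score(hit: CourseHit, latest_sem_number: int) -> float:
    """Combine relevance, capped review popularity, and recency."""
    review_popularity = min(hit.review_count, REVIEW_CAP) / REVIEW_CAP
    gap = latest_sem_number - hit.last_taught_number
    recency_score = max(0.0, 1.0 - gap / RECENCY_WINDOW)
    return (
        SIMILARITY_WEIGHT * hit.max_similarity
        + REVIEW_POPULARITY_WEIGHT * review_popularity
        + RECENCY_WEIGHT * recency_score
    )


hits = [
    CourseHit("stale but exact match", 0.90, 0, 1200),
    CourseHit("popular and recent", 0.80, 120, 1260),
]
# Highest composite score first.
ranked = sorted(hits, key=lambda h: weighted_score(h, 1260), reverse=True)
```

With these assumed weights, the recent, well-reviewed course outranks the closer textual match that has no reviews and has not been offered recently.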

Changes

Cohort / File(s)	Summary
Course Search Ranking Enhancement `tcf_website/views/catalog/search.py`	Modified `fetch_courses()` to implement a weighted composite score. Added ORM annotations for `review_count`, `review_popularity` (capped normalization), and `recency_score`, introduced `_SIMILARITY_WEIGHT`, `_REVIEW_POPULARITY_WEIGHT`, and `_RECENCY_WEIGHT` constants, and changed ordering from `max_similarity` to `-weighted_score`. Also updated the function docstring.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through rows of scores and names,
Weighed similarity, reviews, and recent aims.
A tiny rabbit tallying each course's light,
Balancing numbers by moon and byte.
Hop on — rankings now feel just right ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description addresses the Problem and Solution but significantly deviates from the provided template structure, missing key sections like GitHub Issues, Screenshots, and Testing.	Reformat the description to match the repository template by adding GitHub Issues addressed, Screenshots (Before/After), and Testing sections to provide complete context for reviewers.
Title check	❓ Inconclusive	The title 'Enhanced Search Results' is partially related to the changeset but lacks specificity about the core improvement (weighted ranking combining similarity, recency, and reviews).	Consider a more descriptive title such as 'Improve search ranking with weighted composite scoring' or 'Add recency and popularity weighting to course search results' to better convey the specific technical improvement.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tcf_website/views/catalog/search.py`:
- Around line 89-97: The query currently orders only by the rounded alias
weighted_score (Round(..., 2)), which causes many ties and unstable pagination;
keep the full-precision score expression (the unrounded
Value(_SIMILARITY_WEIGHT) * F("max_similarity") +
Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity") +
Value(_RECENCY_WEIGHT) * F("recency_score")) as an ordering key (e.g., alias it
like full_score) and use it first in order_by, still expose the rounded
weighted_score for display if needed, then add deterministic tie-breakers such
as a unique PK or created/updated timestamp (e.g., order_by("-full_score",
"-weighted_score", "pk")) so pagination is stable and reproducible.
- Around line 53-54: Semester.latest() may return None so dereferencing .number
causes a 500; update the logic around latest_sem_number in the search view to
first capture latest = Semester.latest() and guard it (e.g., if latest is None
set latest_sem_number = 0.0 or choose a sensible default/skipped recency weight)
before converting to float and using it in the weighted ordering; change the
code that currently calls float(Semester.latest().number) to use this guarded
variable (reference symbols: Semester.latest(), latest_sem_number).
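The `None` guard described in the second comment could look roughly like the sketch below. The `Semester` class here is a stand-in for the real model, and `latest_semester_number` and its `0.0` default are hypothetical; the actual view may prefer to skip the recency term entirely when no semester exists.

```python
from typing import Optional


class Semester:
    """Stand-in for the real model; only `number` matters here."""

    def __init__(self, number: int) -> None:
        self.number = number


def latest_semester_number(
    latest: Optional[Semester], default: float = 0.0
) -> float:
    """Return the latest semester number, or `default` when none exists."""
    if latest is None:
        # An empty semester table must not crash the search view.
        return default
    return float(latest.number)
```

In the view, the result of `Semester.latest()` would be passed in once and the returned float used in the recency annotation.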


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9bcb8763-4778-4634-bf46-429d986b4a7a

📥 Commits

Reviewing files that changed from the base of the PR and between 2fc0bf7 and 3104589.

📒 Files selected for processing (1)

tcf_website/views/catalog/search.py

coderabbitai · 2026-04-13T15:17:14Z

+            weighted_score=Round(
+                Value(_SIMILARITY_WEIGHT) * F("max_similarity")
+                + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
+                + Value(_RECENCY_WEIGHT) * F("recency_score"),
+                2,
+            ),
        )
        .filter(max_similarity__gte=_SIMILARITY_THRESHOLD)
-        .order_by("-max_similarity")
+        .order_by("-weighted_score")


⚠️ Potential issue | 🟠 Major

Use a stable sort key for paginated results.

weighted_score is rounded to 2 decimals and then used as the only ORDER BY. For broad queries, that will create a lot of ties, so page 1/page 2 can reshuffle or repeat courses between requests. Keep the full-precision score for ranking and add deterministic tie-breakers.

Suggested fix

         .annotate(
-            weighted_score=Round(
-                Value(_SIMILARITY_WEIGHT) * F("max_similarity")
-                + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
-                + Value(_RECENCY_WEIGHT) * F("recency_score"),
-                2,
-            ),
+            weighted_score=(
+                Value(_SIMILARITY_WEIGHT) * F("max_similarity")
+                + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
+                + Value(_RECENCY_WEIGHT) * F("recency_score")
+            ),
         )
         .filter(max_similarity__gte=_SIMILARITY_THRESHOLD)
-        .order_by("-weighted_score")
+        .order_by(
+            "-weighted_score",
+            "-max_similarity",
+            "-review_count",
+            "-semester_last_taught__number",
+            "pk",
+        )
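The pagination concern can be illustrated outside the ORM with a small pure-Python sketch. The rows, scores, and helper names below are made up for illustration. The in-memory sort only mimics the fact that a database may return tied rows in any order.

```python
# Hypothetical result rows: four scores collapse to 0.81 when rounded.
rows = [
    {"pk": 1, "score": 0.8112},
    {"pk": 2, "score": 0.8134},
    {"pk": 3, "score": 0.8101},
    {"pk": 4, "score": 0.4000},
    {"pk": 5, "score": 0.8134},
]


def rounded_key(row):
    # Rounded score only: tied rows keep whatever order they arrived in.
    return -round(row["score"], 2)


def order_key(row):
    # Full-precision score descending, then pk ascending as a tie-breaker.
    return (-row["score"], row["pk"])


def page(items, key, number, size=2):
    """Return the pks on the given 1-indexed page after sorting by `key`."""
    ordered = sorted(items, key=key)
    return [r["pk"] for r in ordered[(number - 1) * size : number * size]]


# Simulate the database handing back the same rows in a different order.
reversed_rows = list(reversed(rows))

unstable = page(rows, rounded_key, 1), page(reversed_rows, rounded_key, 1)
stable = page(rows, order_key, 1), page(reversed_rows, order_key, 1)
```

With the rounded key, page 1 depends on the arrival order of the tied rows. With the full-precision score plus `pk`, both orderings produce the same page.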


gyoge0 · 2026-04-13T15:18:13Z

+                    50.0
+                ),  # anything last taught 5 years ago or later has a score of 0
+                output_field=FloatField(),
+            ),


are we able to suppress the ruff line length here?

Jay-Lalwani · 2026-04-13T15:45:15Z

How does this affect performance?

review_count=Count("review", distinct=True) looks very expensive.

artiehumphreys · 2026-04-13T16:48:50Z

> How does this affect performance?
>
> review_count=Count("review", distinct=True) looks very expensive.

The effect on performance is minimal: each search already fetches review data (see the join on the review table in the picture below) through Course.with_stats(), which is called by browseable_course_queryset. Counting the reviews on top of that join is therefore mostly trivial. Below are the results of cold searches on both `dev` and `enhanced-search-results` for a simple query:

`enhanced-search-results`:

`dev`

Naturally, there is a little more overhead, but it's relatively small compared to the cost of computing trigram similarity for every course.

coderabbitai

🧹 Nitpick comments (1)

tcf_website/views/catalog/search.py (1)
64-65: Review count includes all reviews regardless of visibility.

Count("review", distinct=True) counts all reviews linked to each course, including any that may be hidden or flagged as toxic. If the intent is to reflect publicly visible engagement, you may want to filter to visible reviews only. However, if using total review count as a raw popularity signal is intentional, this is fine as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tcf_website/views/catalog/search.py` around lines 64 - 65, The current
annotation review_count = Count("review", distinct=True) tallies all linked
reviews including hidden/toxic ones; update the annotation in search.py to use a
filtered Count that only includes publicly visible reviews (e.g., use the review
model's visibility flag such as is_visible/is_public and exclude flagged/toxic
reviews) so review_count reflects visible engagement, or alternatively add a
separate visible_review_count if you need both raw and public counts; locate the
Count call that defines review_count and replace it with a filtered aggregation
referencing the review visibility field.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7fb0345b-ad26-4408-8886-8c37985f2795

📥 Commits

Reviewing files that changed from the base of the PR and between 3104589 and cb44485.

📒 Files selected for processing (1)

tcf_website/views/catalog/search.py

feat(search): search results weighted on recency and number of reviews

3104589

artiehumphreys requested review from Jay-Lalwani and gyoge0 April 13, 2026 15:12

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

gyoge0 previously approved these changes Apr 13, 2026

View reviewed changes

fix(search): improved weights -- less bias for popular courses

cb44485

artiehumphreys dismissed gyoge0’s stale review via cb44485 April 13, 2026 16:50

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

Merge branch 'dev' into enhanced-search-results

f65e30e

artiehumphreys wants to merge 3 commits into dev from enhanced-search-results


3 participants
