
Enhanced Search Results#1246

Open
artiehumphreys wants to merge 3 commits into dev from enhanced-search-results

@artiehumphreys (Member) commented Apr 13, 2026

Problem:

The search results are often polluted with irrelevant courses that either haven't been taught in a while (e.g. https://thecourseforum.com/search/?mode=courses&q=computer) or aren't popular among students. This is especially true for broad searches such as "artificial intelligence", where the many special topics courses on the subject don't appear in the results and many graduate law classes that only a small fraction of users will ever click show up instead.

What this PR solves:

  • Weights search results by trigram similarity (as before), recency, and number of reviews
  • Makes the weight of each signal adjustable
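The combined ranking can be sketched in plain Python, assuming each signal is already scaled to [0, 1]. The weights and the review cap below are hypothetical placeholders, not the actual values of _SIMILARITY_WEIGHT, _REVIEW_POPULARITY_WEIGHT, or _RECENCY_WEIGHT in search.py, and the real code computes the sum as Django ORM annotations rather than in Python.

```python
# Hypothetical weights and cap; the PR's real constants are not shown here.
SIMILARITY_WEIGHT = 0.6
REVIEW_POPULARITY_WEIGHT = 0.25
RECENCY_WEIGHT = 0.15
REVIEW_CAP = 50  # hypothetical: review counts at or above this saturate at 1.0


def review_popularity(review_count: int, cap: int = REVIEW_CAP) -> float:
    """Normalize a review count into [0, 1], capping it at `cap`."""
    return min(review_count, cap) / cap


def weighted_score(similarity: float, review_count: int, recency: float) -> float:
    """Combine trigram similarity, capped review popularity, and recency."""
    return (
        SIMILARITY_WEIGHT * similarity
        + REVIEW_POPULARITY_WEIGHT * review_popularity(review_count)
        + RECENCY_WEIGHT * recency
    )


# A popular, recently taught course can outrank a slightly closer text match
# that has few reviews and has not been offered in years.
popular_recent = weighted_score(similarity=0.55, review_count=120, recency=1.0)
obscure_stale = weighted_score(similarity=0.70, review_count=2, recency=0.0)
print(popular_recent > obscure_stale)  # True
```

With weights that sum to 1 and bounded inputs, the score also stays in [0, 1]. Capping the review count keeps a few very heavily reviewed courses from drowning out text relevance.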

Summary by CodeRabbit

  • New Features
    • Search results now use a weighted ranking that combines relevance, review popularity (normalized, with an upper cap) and recency of offering to surface more relevant courses.
    • Results still apply a minimum relevance threshold to filter out low-quality matches.
    • Improved ordering produces more useful top results reflecting student interest and recent offerings.

@coderabbitai Bot (Contributor) commented Apr 13, 2026

📝 Walkthrough

The fetch_courses() function now ranks search results using a weighted composite score combining similarity, review popularity, and recency, instead of ordering solely by similarity. New database annotations calculate review count, normalized popularity, and recency metrics, with results ordered by the composite score.

Changes

Cohort: Course Search Ranking Enhancement
File: tcf_website/views/catalog/search.py
Summary: Modified fetch_courses() to implement a weighted composite score. Added ORM annotations for review_count, review_popularity (capped normalization), and recency_score; introduced the _SIMILARITY_WEIGHT, _REVIEW_POPULARITY_WEIGHT, and _RECENCY_WEIGHT constants; and changed ordering from -max_similarity to -weighted_score. Also updated the function docstring.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through rows of scores and names,
Weighed similarity, reviews, and recent aims.
A tiny rabbit tallying each course's light,
Balancing numbers by moon and byte.
Hop on — rankings now feel just right ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Description check: ⚠️ Warning
Explanation: The description addresses the Problem and Solution but significantly deviates from the provided template structure, missing key sections like GitHub Issues, Screenshots, and Testing.
Resolution: Reformat the description to match the repository template by adding GitHub Issues addressed, Screenshots (Before/After), and Testing sections to provide complete context for reviewers.

Title check: ❓ Inconclusive
Explanation: The title 'Enhanced Search Results' is partially related to the changeset but lacks specificity about the core improvement (weighted ranking combining similarity, recency, and reviews).
Resolution: Consider a more descriptive title such as 'Improve search ranking with weighted composite scoring' or 'Add recency and popularity weighting to course search results' to better convey the specific technical improvement.

✅ Passed checks (1 passed)

Docstring Coverage: ✅ Passed
Explanation: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tcf_website/views/catalog/search.py`:
- Around lines 89-97: The query currently orders only by the rounded alias
weighted_score (Round(..., 2)), which causes many ties and unstable pagination;
keep the full-precision score expression (the unrounded
Value(_SIMILARITY_WEIGHT) * F("max_similarity") +
Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity") +
Value(_RECENCY_WEIGHT) * F("recency_score")) as an ordering key (e.g., alias it
like full_score) and use it first in order_by, still expose the rounded
weighted_score for display if needed, then add deterministic tie-breakers such
as a unique PK or created/updated timestamp (e.g., order_by("-full_score",
"-weighted_score", "pk")) so pagination is stable and reproducible.
- Around lines 53-54: Semester.latest() may return None so dereferencing .number
causes a 500; update the logic around latest_sem_number in the search view to
first capture latest = Semester.latest() and guard it (e.g., if latest is None
set latest_sem_number = 0.0 or choose a sensible default/skipped recency weight)
before converting to float and using it in the weighted ordering; change the
code that currently calls float(Semester.latest().number) to use this guarded
variable (reference symbols: Semester.latest(), latest_sem_number).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9bcb8763-4778-4634-bf46-429d986b4a7a

📥 Commits

Reviewing files that changed from the base of the PR and between 2fc0bf7 and 3104589.

📒 Files selected for processing (1)
  • tcf_website/views/catalog/search.py

Comment thread tcf_website/views/catalog/search.py
Comment on lines +89 to +97
+            weighted_score=Round(
+                Value(_SIMILARITY_WEIGHT) * F("max_similarity")
+                + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
+                + Value(_RECENCY_WEIGHT) * F("recency_score"),
+                2,
+            ),
         )
         .filter(max_similarity__gte=_SIMILARITY_THRESHOLD)
-        .order_by("-max_similarity")
+        .order_by("-weighted_score")
Contributor


⚠️ Potential issue | 🟠 Major

Use a stable sort key for paginated results.

weighted_score is rounded to 2 decimals and then used as the only ORDER BY. For broad queries, that will create a lot of ties, so page 1/page 2 can reshuffle or repeat courses between requests. Keep the full-precision score for ranking and add deterministic tie-breakers.

Suggested fix
         .annotate(
-            weighted_score=Round(
-                Value(_SIMILARITY_WEIGHT) * F("max_similarity")
-                + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
-                + Value(_RECENCY_WEIGHT) * F("recency_score"),
-                2,
-            ),
+            weighted_score=(
+                Value(_SIMILARITY_WEIGHT) * F("max_similarity")
+                + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
+                + Value(_RECENCY_WEIGHT) * F("recency_score")
+            ),
         )
         .filter(max_similarity__gte=_SIMILARITY_THRESHOLD)
-        .order_by("-weighted_score")
+        .order_by(
+            "-weighted_score",
+            "-max_similarity",
+            "-review_count",
+            "-semester_last_taught__number",
+            "pk",
+        )
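To see why ties matter, the sketch below uses made-up (pk, score) pairs. Rounding to two decimals collapses all of them into one value, and SQL databases, PostgreSQL included, do not guarantee any particular order among rows whose ORDER BY keys are equal, so separate LIMIT/OFFSET queries for page 1 and page 2 may disagree. Sorting on the full-precision score with pk as the last key, mirroring the suggested order_by, gives one reproducible order.

```python
# Hypothetical rows: (pk, full-precision weighted score).
rows = [
    (5, 0.8104),
    (2, 0.8130),
    (7, 0.8149),
    (3, 0.8051),
    (1, 0.8130),
]

# Rounding to 2 decimals maps every score to 0.81, so ordering by the rounded
# value alone leaves the database free to return these rows in any order.
rounded = {round(score, 2) for _, score in rows}
print(rounded)  # {0.81}

# Full-precision score first (descending), then pk (ascending) as the final
# tie-breaker: pks 1 and 2 share a score, and pk decides between them.
stable = sorted(rows, key=lambda row: (-row[1], row[0]))
print([pk for pk, _ in stable])  # [7, 1, 2, 5, 3]
```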

@gyoge0 previously approved these changes Apr 13, 2026
50.0
), # anything last taught 5 years ago or earlier has a score of 0
output_field=FloatField(),
),
Collaborator


Are we able to suppress the Ruff line-length check here?
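One way to avoid a suppression entirely is to move the trailing comment onto its own line; if it has to stay inline, Ruff honors a per-line `# noqa: E501` (E501 is its line-too-long rule). The sketch below restates the recency calculation in plain Python to show both placements. It assumes, from the 50.0 divisor and the comment in the excerpt above, that semester numbers advance by 10 per year so that 50 units is about five years; recency_score is a hypothetical name, not the ORM expression in the PR.

```python
def recency_score(latest_number: float, last_taught_number: float) -> float:
    """Linear recency: 1.0 for the latest semester, falling to 0.0 at 50+ units back."""
    # Anything last taught 5 or more years ago scores 0 (assuming semester
    # numbers advance by 10 per year, so 50 units is about five years).
    return max(0.0, 1.0 - (latest_number - last_taught_number) / 50.0)


# If the explanatory comment must stay on the same line, a per-line
# suppression can be appended after it instead:
#     ...) / 50.0),  # 5+ years ago scores 0  # noqa: E501
```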

@Jay-Lalwani (Collaborator) commented Apr 13, 2026

How does this affect performance?

review_count=Count("review", distinct=True) looks very expensive.

@artiehumphreys (Member, Author)
> How does this affect performance?
>
> review_count=Count("review", distinct=True) looks very expensive.

The effect on performance is minimal, since each search already fetches review data (the join on reviews in the pictures below) in every call to Course.with_stats(), which browseable_course_queryset calls. Fetching the number of reviews is therefore mostly trivial. Below are results of cold searches for a simple query on both dev and `enhanced-search-results`:

enhanced-search-results:

[Screenshot 2026-04-13 at 12:37:27 PM]

dev:

[Screenshot 2026-04-13 at 12:36:55 PM]

Naturally, there is a little more overhead, but it's small relative to the cost of computing trigram similarity for every course.
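The argument can be illustrated with an in-memory SQLite database. The course and review tables and the average-rating aggregate are hypothetical stand-ins, not the real schema or Course.with_stats(): once reviews are already joined for one aggregate, COUNT(DISTINCT ...) is just another aggregate over the same joined rows rather than an extra join or subquery. This shows query shape only, not timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES course(id),
        rating REAL
    );
    INSERT INTO course VALUES (1, 'Intro to AI'), (2, 'Special Topics');
    INSERT INTO review VALUES (1, 1, 4.0), (2, 1, 5.0), (3, 1, 3.0);
    """
)

# The stats query already LEFT JOINs reviews to compute an average rating;
# the review count is one more aggregate over those same joined rows.
rows = conn.execute(
    """
    SELECT c.id, AVG(r.rating) AS avg_rating, COUNT(DISTINCT r.id) AS review_count
    FROM course c
    LEFT OUTER JOIN review r ON r.course_id = c.id
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()
print(rows)  # [(1, 4.0, 3), (2, None, 0)]
```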

@coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
tcf_website/views/catalog/search.py (1)

64-65: Review count includes all reviews regardless of visibility.

Count("review", distinct=True) counts all reviews linked to each course, including any that may be hidden or flagged as toxic. If the intent is to reflect publicly visible engagement, you may want to filter to visible reviews only. However, if using total review count as a raw popularity signal is intentional, this is fine as-is.
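If visible-only counting is wanted, Django's Count accepts a filter argument, e.g. Count("review", filter=Q(...), distinct=True). The SQLite sketch below shows equivalent SQL that computes both the raw and the visible count in one pass; the hidden column is a hypothetical stand-in for whatever visibility flag the Review model actually has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (id INTEGER PRIMARY KEY);
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES course(id),
        hidden INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO course VALUES (1);
    INSERT INTO review VALUES (1, 1, 0), (2, 1, 1), (3, 1, 0);
    """
)

# CASE yields NULL for hidden reviews, and COUNT skips NULLs, so the second
# column counts only visible reviews while the first counts all of them.
row = conn.execute(
    """
    SELECT COUNT(DISTINCT r.id) AS review_count,
           COUNT(DISTINCT CASE WHEN r.hidden = 0 THEN r.id END) AS visible_review_count
    FROM course c
    LEFT OUTER JOIN review r ON r.course_id = c.id
    GROUP BY c.id
    """
).fetchone()
print(row)  # (3, 2)
```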

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tcf_website/views/catalog/search.py` around lines 64 - 65, The current
annotation review_count = Count("review", distinct=True) tallies all linked
reviews including hidden/toxic ones; update the annotation in search.py to use a
filtered Count that only includes publicly visible reviews (e.g., use the review
model's visibility flag such as is_visible/is_public and exclude flagged/toxic
reviews) so review_count reflects visible engagement, or alternatively add a
separate visible_review_count if you need both raw and public counts; locate the
Count call that defines review_count and replace it with a filtered aggregation
referencing the review visibility field.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7fb0345b-ad26-4408-8886-8c37985f2795

📥 Commits

Reviewing files that changed from the base of the PR and between 3104589 and cb44485.

📒 Files selected for processing (1)
  • tcf_website/views/catalog/search.py


Labels: None yet

Projects: None yet


3 participants