Enhanced Search Results #1246
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tcf_website/views/catalog/search.py`:
- Around line 89-97: The query currently orders only by the rounded alias
weighted_score (Round(..., 2)), which causes many ties and unstable pagination;
keep the full-precision score expression (the unrounded
Value(_SIMILARITY_WEIGHT) * F("max_similarity") +
Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity") +
Value(_RECENCY_WEIGHT) * F("recency_score")) as an ordering key (e.g., alias it
like full_score) and use it first in order_by, still expose the rounded
weighted_score for display if needed, then add deterministic tie-breakers such
as a unique PK or created/updated timestamp (e.g., order_by("-full_score",
"-weighted_score", "pk")) so pagination is stable and reproducible.
- Around line 53-54: Semester.latest() may return None so dereferencing .number
causes a 500; update the logic around latest_sem_number in the search view to
first capture latest = Semester.latest() and guard it (e.g., if latest is None
set latest_sem_number = 0.0 or choose a sensible default/skipped recency weight)
before converting to float and using it in the weighted ordering; change the
code that currently calls float(Semester.latest().number) to use this guarded
variable (reference symbols: Semester.latest(), latest_sem_number).
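The guard described in the second finding could look roughly like the sketch below. It is not the project's code: the `Semester` dataclass is a hypothetical stand-in for the real model, `latest_semester_number` is an illustrative helper name, and the `0.0` default is just one of the options the comment mentions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Semester:
    """Hypothetical stand-in for the project's Semester model."""

    number: int


def latest_semester_number(
    get_latest: Callable[[], Optional[Semester]],
    default: float = 0.0,
) -> float:
    """Return the latest semester number as a float.

    Falls back to `default` when no semester exists, instead of
    dereferencing None and raising an AttributeError (a 500 in a view).
    """
    latest = get_latest()
    if latest is None:
        return default
    return float(latest.number)
```

In the view, `get_latest` would be `Semester.latest`, and the returned value would replace the direct `float(Semester.latest().number)` call.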
weighted_score=Round(
Value(_SIMILARITY_WEIGHT) * F("max_similarity")
+ Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
+ Value(_RECENCY_WEIGHT) * F("recency_score"),
2,
),
)
.filter(max_similarity__gte=_SIMILARITY_THRESHOLD)
.order_by("-max_similarity")
.order_by("-weighted_score")
Use a stable sort key for paginated results.
weighted_score is rounded to 2 decimals and then used as the only ORDER BY. For broad queries, that will create a lot of ties, so page 1/page 2 can reshuffle or repeat courses between requests. Keep the full-precision score for ranking and add deterministic tie-breakers.
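To illustrate the tie problem in plain Python (no database involved), the sketch below uses made-up scores and a hypothetical `Course` tuple. Rounding to 2 decimals collapses three distinct scores into one sort key, so the final order depends on whatever order the rows happen to arrive in. A key built from the full-precision score plus the primary key does not have that problem.

```python
from typing import NamedTuple


class Course(NamedTuple):
    pk: int
    score: float  # full-precision weighted score (illustrative values)


def rounded_key(course: Course) -> float:
    """Sort key that mirrors ordering by Round(score, 2) descending."""
    return -round(course.score, 2)


def stable_key(course: Course) -> tuple:
    """Sort key using the unrounded score, with pk as a unique tie-breaker."""
    return (-course.score, course.pk)


def count_ties(items, key) -> int:
    """Number of items that share a sort key with an earlier item."""
    keys = [key(item) for item in items]
    return len(keys) - len(set(keys))


courses = [
    Course(1, 0.8349),
    Course(2, 0.8312),
    Course(3, 0.8301),
    Course(4, 0.7),
]

# Same rows, different arrival order, as a database may return them.
shuffled = list(reversed(courses))

by_rounded = [c.pk for c in sorted(shuffled, key=rounded_key)]
by_stable = [c.pk for c in sorted(shuffled, key=stable_key)]
```

With the rounded key, `by_rounded` follows the arrival order of the tied rows. With the stable key, `by_stable` is the same for any arrival order, which is what paginated results need.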
Suggested fix
.annotate(
- weighted_score=Round(
- Value(_SIMILARITY_WEIGHT) * F("max_similarity")
- + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
- + Value(_RECENCY_WEIGHT) * F("recency_score"),
- 2,
- ),
+ weighted_score=(
+ Value(_SIMILARITY_WEIGHT) * F("max_similarity")
+ + Value(_REVIEW_POPULARITY_WEIGHT) * F("review_popularity")
+ + Value(_RECENCY_WEIGHT) * F("recency_score")
+ ),
)
.filter(max_similarity__gte=_SIMILARITY_THRESHOLD)
- .order_by("-weighted_score")
+ .order_by(
+ "-weighted_score",
+ "-max_similarity",
+ "-review_count",
+ "-semester_last_taught__number",
+ "pk",
+ )
50.0
), # anything last taught 5 years ago or later has a score of 0
output_field=FloatField(),
),
Are we able to suppress the ruff line length here?
How does this affect performance? review_count=Count("review", distinct=True) looks very expensive.
🧹 Nitpick comments (1)
tcf_website/views/catalog/search.py (1)
64-65: Review count includes all reviews regardless of visibility.
Count("review", distinct=True) counts all reviews linked to each course, including any that may be hidden or flagged as toxic. If the intent is to reflect publicly visible engagement, you may want to filter to visible reviews only. However, if using total review count as a raw popularity signal is intentional, this is fine as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tcf_website/views/catalog/search.py` around lines 64 - 65, The current annotation review_count = Count("review", distinct=True) tallies all linked reviews including hidden/toxic ones; update the annotation in search.py to use a filtered Count that only includes publicly visible reviews (e.g., use the review model's visibility flag such as is_visible/is_public and exclude flagged/toxic reviews) so review_count reflects visible engagement, or alternatively add a separate visible_review_count if you need both raw and public counts; locate the Count call that defines review_count and replace it with a filtered aggregation referencing the review visibility field.
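The difference between a raw and a visibility-filtered review count can be sketched in plain Python. The `Review` dataclass and its `hidden` flag are assumptions for illustration, since the actual visibility field on the project's review model is not shown here. In the Django ORM, the equivalent would be passing a `Q` condition to the `filter` argument of `Count`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Review:
    id: int
    course_id: int
    hidden: bool = False  # hypothetical visibility flag


def review_counts(
    reviews: Iterable[Review], visible_only: bool = False
) -> Dict[int, int]:
    """Count distinct reviews per course, optionally skipping hidden ones."""
    seen_ids = set()
    counts: Dict[int, int] = {}
    for review in reviews:
        if visible_only and review.hidden:
            continue
        if review.id in seen_ids:
            # Mirrors distinct=True: a review duplicated by a join counts once.
            continue
        seen_ids.add(review.id)
        counts[review.course_id] = counts.get(review.course_id, 0) + 1
    return counts


reviews = [
    Review(1, course_id=10),
    Review(2, course_id=10, hidden=True),
    Review(3, course_id=20),
    Review(1, course_id=10),  # duplicate row, as a join might produce
]

raw = review_counts(reviews)
visible = review_counts(reviews, visible_only=True)
```

Keeping both counts, as the comment suggests, would leave the raw popularity signal available while ranking on visible engagement.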


Problem:
Search results are often polluted with irrelevant courses that either haven't been taught in a while (e.g. https://thecourseforum.com/search/?mode=courses&q=computer) or aren't popular among students. This is especially noticeable for broad searches such as "artificial intelligence": the many special topics courses on the subject don't appear in the results, which instead show graduate law classes that only a small fraction of users will ever click.
What this PR solves:
Summary by CodeRabbit