perf: fix N+1 tag queries in activities and dashboard APIs#31
perf: fix N+1 tag queries in activities and dashboard APIs#31ankushchk wants to merge 2 commits intoalphaonelabs:mainfrom
Conversation
Made-with: Cursor
WalkthroughConsolidates tag retrieval in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/worker.py (1)
665-673:⚠️ Potential issue | 🟠 MajorBug: Tag filter returns only the filtered tag instead of all tags for matching activities.
When filtering by tag using
WHERE at2.tag_id=?, the join rows are filtered beforeGROUP_CONCATaggregates. This means an activity with tags["Python", "Data Science"]filtered by "Python" will returntags: ["Python"]rather than both tags.To preserve all tags while filtering by a specific tag, consider using a subquery for the filter condition:
🐛 Proposed fix using EXISTS subquery
if tag: tag_row = await env.DB.prepare( "SELECT id FROM tags WHERE name=?" ).bind(tag).first() if not tag_row: return json_resp({"activities": []}) res = await env.DB.prepare( - base_q + " WHERE at2.tag_id=?" + group_order + base_q + " WHERE EXISTS (SELECT 1 FROM activity_tags at_filter" + " WHERE at_filter.activity_id=a.id AND at_filter.tag_id=?)" + group_order ).bind(tag_row.id).all()This approach filters activities that have the specified tag while still allowing
GROUP_CONCATto aggregate all tags for each matching activity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/worker.py` around lines 665 - 673, The tag filter currently applies "WHERE at2.tag_id=?" directly (see tag_row and the query built from base_q), which restricts rows before GROUP_CONCAT and loses other tags; modify the query to filter activities by existence of the tag instead (e.g., replace "WHERE at2.tag_id=?" with a subquery/EXISTS that checks activity_id IN (SELECT activity_id FROM tags_activities WHERE tag_id=?) or use WHERE EXISTS (SELECT 1 FROM tags_activities at_filter WHERE at_filter.activity_id=activities.id AND at_filter.tag_id=?)), keep using tag_row.id for the bind value, and leave the GROUP_CONCAT/join that produces tags unchanged so all tags for matching activities are aggregated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/worker.py`:
- Line 711: The current split on commas will break tag names containing commas;
update the SQL that produces tag_names (the GROUP_CONCAT call used to build
row.tag_names) to use a unique separator unlikely to appear in tags (e.g.,
GROUP_CONCAT(t.name, '|||') AS tag_names) and change the Python handling where
row.tag_names is parsed (the place building "tags": row.tag_names.split(",") if
row.tag_names else []) to split on that same unique separator (e.g.,
split("|||")) so tag names containing commas remain intact; ensure both the SQL
alias producing row.tag_names and the parsing logic reference the exact same
separator string.
---
Outside diff comments:
In `@src/worker.py`:
- Around line 665-673: The tag filter currently applies "WHERE at2.tag_id=?"
directly (see tag_row and the query built from base_q), which restricts rows
before GROUP_CONCAT and loses other tags; modify the query to filter activities
by existence of the tag instead (e.g., replace "WHERE at2.tag_id=?" with a
subquery/EXISTS that checks activity_id IN (SELECT activity_id FROM
tags_activities WHERE tag_id=?) or use WHERE EXISTS (SELECT 1 FROM
tags_activities at_filter WHERE at_filter.activity_id=activities.id AND
at_filter.tag_id=?)), keep using tag_row.id for the bind value, and leave the
GROUP_CONCAT/join that produces tags unchanged so all tags for matching
activities are aggregated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a2b6602a-9520-4ad9-a7f9-0a4523360bec
📒 Files selected for processing (1)
src/worker.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/worker.py (2)
1021-1045:⚠️ Potential issue | 🔴 CriticalJoined dashboard still references a removed query result.
Line 1045 reads from
t_res, butapi_dashboardnever defines that variable. Any user with at least one joined activity will raiseNameErrorwhile serializingjoined_activities. Read fromr.tag_nameshere instead. Please add a regression test for a user with a non-empty joined list.💡 Proposed fix
- "tags": [t.name for t in (t_res.results or [])], + "tags": r.tag_names.split(",") if r.tag_names else [],If you switch away from comma-delimited tags, keep this parser aligned with the query separator.
As per coding guidelines, "Verify tests cover the key logic paths."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/worker.py` around lines 1021 - 1045, The joined-activities serialization is referencing an undefined variable t_res causing NameError; update the loop that builds joined (after res2 from env.DB.prepare) to read tag names from r.tag_names (parse the GROUP_CONCAT string, e.g. split by ',' or the actual separator used in the SQL) instead of using t_res, keep using decrypt_aes(r.host_name_enc, enc) for host_name, and ensure the parser matches the query separator; then add a regression test in the api_dashboard test suite that creates a user with at least one joined activity having tags to assert joined includes the parsed tags.
751-766:⚠️ Potential issue | 🔴 CriticalTag filter now strips the other tags from multi-tag activities.
Line 765 filters the same join rows that Lines 751-754 feed into
GROUP_CONCAT, so?tag=Pythonwill return only the matched tag for a multi-tag activity instead of all of its tags. Filter withEXISTSor a secondactivity_tagsalias and leave the aggregation join untouched. Please add a regression test for a multi-tag activity under a tag filter.As per coding guidelines, "Verify tests cover the key logic paths."💡 Proposed fix
- res = await env.DB.prepare( - base_q + " WHERE at2.tag_id=?" + group_order - ).bind(tag_row.id).all() + res = await env.DB.prepare( + base_q + + " WHERE EXISTS (" + + "SELECT 1 FROM activity_tags at_filter " + + "WHERE at_filter.activity_id=a.id AND at_filter.tag_id=?" + + ")" + + group_order + ).bind(tag_row.id).all()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/worker.py` around lines 751 - 766, The tag filter currently applies a WHERE on the same joined activity_tags alias used for the GROUP_CONCAT aggregation (base_q + " WHERE at2.tag_id=?"), which strips other tags from multi-tag activities; change the query to leave the aggregation join untouched and filter activities by using an EXISTS subquery or a second activity_tags alias (e.g., use EXISTS(SELECT 1 FROM activity_tags at_filter WHERE at_filter.activity_id=a.id AND at_filter.tag_id=?) or join activity_tags at_filter for the filter) while keeping the original at2 join used by base_q, update the code paths around tag_row and the env.DB.prepare call to bind the tag id to the filter clause instead, and add a regression test that creates an activity with multiple tags and asserts that querying with ?tag=OneTag returns the activity with all its tags present in the tag_names GROUP_CONCAT result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/worker.py`:
- Around line 1021-1045: The joined-activities serialization is referencing an
undefined variable t_res causing NameError; update the loop that builds joined
(after res2 from env.DB.prepare) to read tag names from r.tag_names (parse the
GROUP_CONCAT string, e.g. split by ',' or the actual separator used in the SQL)
instead of using t_res, keep using decrypt_aes(r.host_name_enc, enc) for
host_name, and ensure the parser matches the query separator; then add a
regression test in the api_dashboard test suite that creates a user with at
least one joined activity having tags to assert joined includes the parsed tags.
- Around line 751-766: The tag filter currently applies a WHERE on the same
joined activity_tags alias used for the GROUP_CONCAT aggregation (base_q + "
WHERE at2.tag_id=?"), which strips other tags from multi-tag activities; change
the query to leave the aggregation join untouched and filter activities by using
an EXISTS subquery or a second activity_tags alias (e.g., use EXISTS(SELECT 1
FROM activity_tags at_filter WHERE at_filter.activity_id=a.id AND
at_filter.tag_id=?) or join activity_tags at_filter for the filter) while
keeping the original at2 join used by base_q, update the code paths around
tag_row and the env.DB.prepare call to bind the tag id to the filter clause
instead, and add a regression test that creates an activity with multiple tags
and asserts that querying with ?tag=OneTag returns the activity with all its
tags present in the tag_names GROUP_CONCAT result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5bb1ca8c-6990-411a-bca1-f6aad4acacae
📒 Files selected for processing (1)
src/worker.py
Changes
The activities listing and dashboard APIs had an N+1 query problem. For every activity returned by the main query, a separate SQL query was fired just to fetch that activity's tags. So if there were 50 activities, the API would run 51 database queries (1 for activities + 50 for tags). On the dashboard it was even worse since it does this twice, once for hosted and once for joined activities.
Fixed this by using SQLite's
GROUP_CONCATto pull tags directly in the main query. The tags come back as a single comma-separated string per activity, which gets split into a list in Python. No extra queries needed.Before: each activity page load hit the database N+1 times (1 main query + 1 per activity for tags)
After: always just 1 query for activities listing, 2 for dashboard (hosted + joined)
The fix touches three queries across two functions:
api_list_activities(the homepage feed) andapi_dashboard(the logged-in user's hosted and joined activities). The response format is unchanged so nothing breaks on the frontend.Testing
This PR fixes N+1 query inefficiencies in activity listing and dashboard endpoints by aggregating activity tags into the primary SQL queries.
Purpose
Key changes
Impact & notes