Skip to content

perf: fix N+1 tag queries in activities and dashboard APIs#31

Open
ankushchk wants to merge 2 commits intoalphaonelabs:mainfrom
ankushchk:fix/n-plus-1-tag-queries
Open

perf: fix N+1 tag queries in activities and dashboard APIs#31
ankushchk wants to merge 2 commits intoalphaonelabs:mainfrom
ankushchk:fix/n-plus-1-tag-queries

Conversation

@ankushchk
Copy link
Copy Markdown

@ankushchk ankushchk commented Mar 29, 2026

Changes

The activities listing and dashboard APIs had an N+1 query problem. For every activity returned by the main query, a separate SQL query was fired just to fetch that activity's tags. So if there were 50 activities, the API would run 51 database queries (1 for activities + 50 for tags). On the dashboard it was even worse since it does this twice, once for hosted and once for joined activities.

Fixed this by using SQLite's GROUP_CONCAT to pull tags directly in the main query. The tags come back as a single comma-separated string per activity, which gets split into a list in Python. No extra queries needed.

Before: each activity page load hit the database N+1 times (1 main query + 1 per activity for tags)
After: always just 1 query for activities listing, 2 for dashboard (hosted + joined)

The fix touches three queries across two functions: api_list_activities (the homepage feed) and api_dashboard (the logged-in user's hosted and joined activities). The response format is unchanged so nothing breaks on the frontend.

Testing

  • Verified all 6 seed activities return the correct tags
  • Checked multi-tag activities like "Data Science Workshop" correctly return both "Data Science" and "Python"
  • Tested all filter combinations (by type, format, tag, and search) to make sure they still work
  • Logged in as alice and confirmed the dashboard returns correct tags for hosted activities

This PR fixes N+1 query inefficiencies in activity listing and dashboard endpoints by aggregating activity tags into the primary SQL queries.

Purpose

  • Eliminate N+1 tag queries in api_list_activities (homepage feed) and api_dashboard (hosted + joined lists) to reduce DB round-trips and improve performance.

Key changes

  • Main queries updated to LEFT JOIN activity_tags and tags and use GROUP_CONCAT(t.name) AS tag_names with GROUP BY a.id to return comma-separated tag names per activity.
  • api_list_activities: tag filtering moved into the aggregated query; per-activity follow-up SELECTs to fetch tags removed. Python mapping now parses tags with row.tag_names.split(",") if present.
  • api_dashboard: hosted and joined activity queries use GROUP_CONCAT(...) and GROUP BY a.id so hosted and joined lists are fetched in two queries (hosted + joined) instead of N+1 per list. Python mapping for hosted uses r.tag_names.split(",") as tags.
  • Response shape unchanged: tags remain arrays in JSON.

Impact & notes

  • Query count reduced: activities listing → 1 query total; dashboard → 2 queries (hosted + joined).
  • Filters (type, format, tag, search) and multi-tag activities are preserved per tests included in the PR.
  • Minor bug observed in current api_dashboard joined-activities mapping: it references t_res (from another scope) when building joined activity tags instead of parsing r.tag_names, which will cause incorrect/jumbled tag output for joined_activities. This should be corrected to: "tags": r.tag_names.split(",") if r.tag_names else [].

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 29, 2026

Walkthrough

Consolidates tag retrieval in src/worker.py by adding LEFT JOINs and GROUP_CONCAT(t.name) to activity listing and dashboard SQL queries, removing per-activity tag queries and mapping tags from the aggregated tag_names field.

Changes

Cohort / File(s) Summary
SQL Query & Result Mapping Optimization
src/worker.py
Refactored api_list_activities and api_dashboard to include LEFT JOIN activity_tags and LEFT JOIN tags, use GROUP_CONCAT(t.name) AS tag_names and GROUP BY a.id, update tag filtering to operate on the aggregated query, and replace per-activity follow-up SELECTs with row.tag_names.split(",") if row.tag_names else [] mapping.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically summarizes the main change: fixing N+1 query performance issues in tag retrieval for activities and dashboard APIs, which directly matches the changeset's primary objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/worker.py (1)

665-673: ⚠️ Potential issue | 🟠 Major

Bug: Tag filter returns only the filtered tag instead of all tags for matching activities.

When filtering by tag using WHERE at2.tag_id=?, the join rows are filtered before GROUP_CONCAT aggregates. This means an activity with tags ["Python", "Data Science"] filtered by "Python" will return tags: ["Python"] rather than both tags.

To preserve all tags while filtering by a specific tag, consider using a subquery for the filter condition:

🐛 Proposed fix using EXISTS subquery
     if tag:
         tag_row = await env.DB.prepare(
             "SELECT id FROM tags WHERE name=?"
         ).bind(tag).first()
         if not tag_row:
             return json_resp({"activities": []})
         res = await env.DB.prepare(
-            base_q + " WHERE at2.tag_id=?" + group_order
+            base_q + " WHERE EXISTS (SELECT 1 FROM activity_tags at_filter"
+            " WHERE at_filter.activity_id=a.id AND at_filter.tag_id=?)" + group_order
         ).bind(tag_row.id).all()

This approach filters activities that have the specified tag while still allowing GROUP_CONCAT to aggregate all tags for each matching activity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/worker.py` around lines 665 - 673, The tag filter currently applies
"WHERE at2.tag_id=?" directly (see tag_row and the query built from base_q),
which restricts rows before GROUP_CONCAT and loses other tags; modify the query
to filter activities by existence of the tag instead (e.g., replace "WHERE
at2.tag_id=?" with a subquery/EXISTS that checks activity_id IN (SELECT
activity_id FROM tags_activities WHERE tag_id=?) or use WHERE EXISTS (SELECT 1
FROM tags_activities at_filter WHERE at_filter.activity_id=activities.id AND
at_filter.tag_id=?)), keep using tag_row.id for the bind value, and leave the
GROUP_CONCAT/join that produces tags unchanged so all tags for matching
activities are aggregated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/worker.py`:
- Line 711: The current split on commas will break tag names containing commas;
update the SQL that produces tag_names (the GROUP_CONCAT call used to build
row.tag_names) to use a unique separator unlikely to appear in tags (e.g.,
GROUP_CONCAT(t.name, '|||') AS tag_names) and change the Python handling where
row.tag_names is parsed (the place building "tags": row.tag_names.split(",") if
row.tag_names else []) to split on that same unique separator (e.g.,
split("|||")) so tag names containing commas remain intact; ensure both the SQL
alias producing row.tag_names and the parsing logic reference the exact same
separator string.

---

Outside diff comments:
In `@src/worker.py`:
- Around line 665-673: The tag filter currently applies "WHERE at2.tag_id=?"
directly (see tag_row and the query built from base_q), which restricts rows
before GROUP_CONCAT and loses other tags; modify the query to filter activities
by existence of the tag instead (e.g., replace "WHERE at2.tag_id=?" with a
subquery/EXISTS that checks activity_id IN (SELECT activity_id FROM
tags_activities WHERE tag_id=?) or use WHERE EXISTS (SELECT 1 FROM
tags_activities at_filter WHERE at_filter.activity_id=activities.id AND
at_filter.tag_id=?)), keep using tag_row.id for the bind value, and leave the
GROUP_CONCAT/join that produces tags unchanged so all tags for matching
activities are aggregated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a2b6602a-9520-4ad9-a7f9-0a4523360bec

📥 Commits

Reviewing files that changed from the base of the PR and between d6e1a9c and f869b62.

📒 Files selected for processing (1)
  • src/worker.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/worker.py (2)

1021-1045: ⚠️ Potential issue | 🔴 Critical

Joined dashboard still references a removed query result.

Line 1045 reads from t_res, but api_dashboard never defines that variable. Any user with at least one joined activity will raise NameError while serializing joined_activities. Read from r.tag_names here instead. Please add a regression test for a user with a non-empty joined list.

💡 Proposed fix
-            "tags":          [t.name for t in (t_res.results or [])],
+            "tags":          r.tag_names.split(",") if r.tag_names else [],

If you switch away from comma-delimited tags, keep this parser aligned with the query separator.
As per coding guidelines, "Verify tests cover the key logic paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/worker.py` around lines 1021 - 1045, The joined-activities serialization
is referencing an undefined variable t_res causing NameError; update the loop
that builds joined (after res2 from env.DB.prepare) to read tag names from
r.tag_names (parse the GROUP_CONCAT string, e.g. split by ',' or the actual
separator used in the SQL) instead of using t_res, keep using
decrypt_aes(r.host_name_enc, enc) for host_name, and ensure the parser matches
the query separator; then add a regression test in the api_dashboard test suite
that creates a user with at least one joined activity having tags to assert
joined includes the parsed tags.

751-766: ⚠️ Potential issue | 🔴 Critical

Tag filter now strips the other tags from multi-tag activities.

Line 765 filters the same join rows that Lines 751-754 feed into GROUP_CONCAT, so ?tag=Python will return only the matched tag for a multi-tag activity instead of all of its tags. Filter with EXISTS or a second activity_tags alias and leave the aggregation join untouched. Please add a regression test for a multi-tag activity under a tag filter.

💡 Proposed fix
-        res = await env.DB.prepare(
-            base_q + " WHERE at2.tag_id=?" + group_order
-        ).bind(tag_row.id).all()
+        res = await env.DB.prepare(
+            base_q
+            + " WHERE EXISTS ("
+            + "SELECT 1 FROM activity_tags at_filter "
+            + "WHERE at_filter.activity_id=a.id AND at_filter.tag_id=?"
+            + ")"
+            + group_order
+        ).bind(tag_row.id).all()
As per coding guidelines, "Verify tests cover the key logic paths."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/worker.py` around lines 751 - 766, The tag filter currently applies a
WHERE on the same joined activity_tags alias used for the GROUP_CONCAT
aggregation (base_q + " WHERE at2.tag_id=?"), which strips other tags from
multi-tag activities; change the query to leave the aggregation join untouched
and filter activities by using an EXISTS subquery or a second activity_tags
alias (e.g., use EXISTS(SELECT 1 FROM activity_tags at_filter WHERE
at_filter.activity_id=a.id AND at_filter.tag_id=?) or join activity_tags
at_filter for the filter) while keeping the original at2 join used by base_q,
update the code paths around tag_row and the env.DB.prepare call to bind the tag
id to the filter clause instead, and add a regression test that creates an
activity with multiple tags and asserts that querying with ?tag=OneTag returns
the activity with all its tags present in the tag_names GROUP_CONCAT result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/worker.py`:
- Around line 1021-1045: The joined-activities serialization is referencing an
undefined variable t_res causing NameError; update the loop that builds joined
(after res2 from env.DB.prepare) to read tag names from r.tag_names (parse the
GROUP_CONCAT string, e.g. split by ',' or the actual separator used in the SQL)
instead of using t_res, keep using decrypt_aes(r.host_name_enc, enc) for
host_name, and ensure the parser matches the query separator; then add a
regression test in the api_dashboard test suite that creates a user with at
least one joined activity having tags to assert joined includes the parsed tags.
- Around line 751-766: The tag filter currently applies a WHERE on the same
joined activity_tags alias used for the GROUP_CONCAT aggregation (base_q + "
WHERE at2.tag_id=?"), which strips other tags from multi-tag activities; change
the query to leave the aggregation join untouched and filter activities by using
an EXISTS subquery or a second activity_tags alias (e.g., use EXISTS(SELECT 1
FROM activity_tags at_filter WHERE at_filter.activity_id=a.id AND
at_filter.tag_id=?) or join activity_tags at_filter for the filter) while
keeping the original at2 join used by base_q, update the code paths around
tag_row and the env.DB.prepare call to bind the tag id to the filter clause
instead, and add a regression test that creates an activity with multiple tags
and asserts that querying with ?tag=OneTag returns the activity with all its
tags present in the tag_names GROUP_CONCAT result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5bb1ca8c-6990-411a-bca1-f6aad4acacae

📥 Commits

Reviewing files that changed from the base of the PR and between f869b62 and 8bdec94.

📒 Files selected for processing (1)
  • src/worker.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant