⚡ Bolt: optimize stats aggregation and database indexes#364
RohanExploit wants to merge 4 commits into `main`.
Conversation
- Consolidated multiple count queries in `/api/stats` into a single `GROUP BY` query (a sketch follows below).
- Added a composite index `ix_issues_category_status` to the `issues` table to accelerate aggregation.
- Reduced database round-trips and table scans for dashboard statistics.
- Handled potential null categories with an 'Uncategorized' fallback to ensure Pydantic validation succeeds.

Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
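For illustration, a minimal sketch of what the consolidated aggregation described above might look like in SQLAlchemy; the `Issue` model and `db` session names are assumed from the diff excerpts later in this thread, not quoted from the PR:

```python
from sqlalchemy import func
from sqlalchemy.orm import Session

def get_stats_rows(db: Session) -> list[tuple[str, str, int]]:
    # One GROUP BY pass replaces the previous per-metric COUNT queries.
    results = (
        db.query(Issue.category, Issue.status, func.count(Issue.id))
        .group_by(Issue.category, Issue.status)
        .all()
    )
    # NULL categories fall back to 'Uncategorized' so the Pydantic
    # response model always receives a string key.
    return [
        (category or "Uncategorized", status, count)
        for category, status, count in results
    ]
```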
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode; when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
✅ Deploy Preview for fixmybharat canceled.
🙏 Thank you for your contribution, @RohanExploit!

PR Details:
Quality Checklist:
Review Process:
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.
📝 Walkthrough

This PR optimizes database query performance by introducing a composite index on the issues table and refactoring the `/api/stats` endpoint to compute its statistics in a single `GROUP BY` query.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Pull request overview
Optimizes the /api/stats endpoint’s statistics aggregation to reduce DB round-trips and improve scalability, complemented by a new composite DB index to support the aggregation access pattern.
Changes:
- Replaced multiple scalar count queries with a single `GROUP BY (category, status)` aggregation and Python-side rollups (see the sketch after this list).
- Added a composite index on `(category, status)` for the `issues` table to support the new aggregation query.
- Documented the aggregation consolidation learning in the Jules Bolt notes.
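As a rough sketch of the "Python-side rollups" mentioned above (the variable names and sample rows here are illustrative, not the PR's actual code), the grouped rows can be folded into totals and a per-category breakdown in a single pass:

```python
from collections import defaultdict

# Grouped rows as returned by the consolidated query:
# (category, status, count) tuples; sample data for illustration only.
rows = [("Roads", "open", 12), ("Roads", "resolved", 7), (None, "open", 3)]

total_issues = 0
status_totals: defaultdict[str, int] = defaultdict(int)
category_breakdown: defaultdict[str, int] = defaultdict(int)

# One pass over the grouped rows yields every dashboard metric,
# instead of issuing a separate COUNT query per metric.
for category, status, count in rows:
    total_issues += count
    status_totals[status] += count
    category_breakdown[category or "Uncategorized"] += count

assert total_issues == 22
```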
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| backend/routers/utility.py | Consolidates stats aggregation into a single grouped query and builds totals/category breakdown in one pass. |
| backend/init_db.py | Adds a composite (category, status) index to support the consolidated stats aggregation query. |
| .jules/bolt.md | Adds an internal note describing the aggregation consolidation approach. |
```python
results = db.query(
    Issue.category,
    Issue.status,
    func.count(Issue.id)
```
The aggregation query uses `func.count(Issue.id)`. Since `Issue.id` is a non-null primary key, this can be expressed as `COUNT(*)` via `func.count()` to avoid referencing an extra column and to better align with potential index-only scans on `(category, status)`.
```diff
-    func.count(Issue.id)
+    func.count()
```
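Applied to the truncated excerpt above, the full statement would plausibly read as follows; the `.group_by(...)` tail is inferred from the PR description rather than quoted from the diff:

```python
results = (
    db.query(Issue.category, Issue.status, func.count())  # COUNT(*) per group
    .group_by(Issue.category, Issue.status)
    .all()
)
```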
- Removed `backend/data/issues.db` from git tracking to prevent binary conflicts and security risks.
- Updated `.gitignore` to exclude all database files and upload directories.
- Refactored `migrate_db` to use isolated transactions and `IF NOT EXISTS` for better compatibility with PostgreSQL on Render.
- Cleaned up duplicate imports in `utility.py`.
- Verified that the backend starts correctly with the new aggregation logic and migration script.

Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
- Removed the tracked binary database file and updated `.gitignore`.
- Refactored `init_db.py` to a clean structure with top-level imports.
- Isolated each migration step in its own transaction using `engine.begin()` for PostgreSQL compatibility (sketched below).
- Cleaned up imports in `main.py` using explicit router imports.
- Added a production environment check for `SECRET_KEY` in `start-backend.py`.
- Ensured `requirements.txt` and `requirements-render.txt` include all necessary dependencies for spatial deduplication.

Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
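A minimal sketch of the isolated-transaction migration pattern these commits describe; the connection URL and the exact DDL list are assumptions (only the `ix_issues_category_status` index name comes from the PR):

```python
import os

from sqlalchemy import create_engine, text

# DATABASE_URL default is a placeholder, not the project's real config.
engine = create_engine(os.getenv("DATABASE_URL", "sqlite:///backend/data/issues.db"))

# IF NOT EXISTS keeps each step idempotent across repeated deploys on Render.
MIGRATION_STEPS = [
    "CREATE INDEX IF NOT EXISTS ix_issues_category_status "
    "ON issues (category, status)",
]

def migrate_db() -> None:
    for statement in MIGRATION_STEPS:
        # engine.begin() opens and commits a fresh transaction per step,
        # so one failed statement cannot abort the remaining migrations
        # (PostgreSQL poisons the whole transaction after an error).
        with engine.begin() as conn:
            conn.execute(text(statement))
```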
🔍 Quality Reminder
2 issues found across 5 files (changes from recent commits).
Prompt for AI agents (all issues)
```text
Check if these issues are valid — if so, understand the root cause of each and fix them.

<file name="start-backend.py">
<violation number="1" location="start-backend.py:22">
P2: ENVIRONMENT defaults to "production" here, so SECRET_KEY becomes mandatory even when ENVIRONMENT is unset. The documented local setup doesn't include SECRET_KEY or ENVIRONMENT, so validation will now fail for the standard local workflow. Consider requiring SECRET_KEY only when ENVIRONMENT is explicitly set to "production" (or update defaults/docs accordingly).</violation>
</file>

<file name="backend/requirements.txt">
<violation number="1" location="backend/requirements.txt:9">
P3: Remove the duplicate scikit-learn and numpy entries to keep requirements unambiguous and avoid confusion about dependency ordering.</violation>
</file>
```
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```python
    required_vars = ["GEMINI_API_KEY", "TELEGRAM_BOT_TOKEN", "FRONTEND_URL"]

    # In production, SECRET_KEY is also required for auth
    if os.getenv("ENVIRONMENT", "production").lower() == "production":
```
P2: ENVIRONMENT defaults to "production" here, so SECRET_KEY becomes mandatory even when ENVIRONMENT is unset. The documented local setup doesn’t include SECRET_KEY or ENVIRONMENT, so validation will now fail for the standard local workflow. Consider requiring SECRET_KEY only when ENVIRONMENT is explicitly set to "production" (or update defaults/docs accordingly).
Prompt for AI agents
```text
Check if this issue is valid — if so, understand the root cause and fix it. At start-backend.py, line 22:

<comment>ENVIRONMENT defaults to "production" here, so SECRET_KEY becomes mandatory even when ENVIRONMENT is unset. The documented local setup doesn't include SECRET_KEY or ENVIRONMENT, so validation will now fail for the standard local workflow. Consider requiring SECRET_KEY only when ENVIRONMENT is explicitly set to "production" (or update defaults/docs accordingly).</comment>

<file context>
@@ -17,6 +17,11 @@
     required_vars = ["GEMINI_API_KEY", "TELEGRAM_BOT_TOKEN", "FRONTEND_URL"]
+
+    # In production, SECRET_KEY is also required for auth
+    if os.getenv("ENVIRONMENT", "production").lower() == "production":
+        required_vars.append("SECRET_KEY")
+
</file context>
```
```diff
-    if os.getenv("ENVIRONMENT", "production").lower() == "production":
+    if os.getenv("ENVIRONMENT", "").lower() == "production":
```
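Putting the suggestion in context, here is a hedged reconstruction of the surrounding `start-backend.py` check; the function name and error handling are assumptions, only the `required_vars` list and the `ENVIRONMENT` condition come from the diff:

```python
import os
import sys

def validate_env() -> None:
    # Hypothetical reconstruction of the start-backend.py validation.
    required_vars = ["GEMINI_API_KEY", "TELEGRAM_BOT_TOKEN", "FRONTEND_URL"]

    # Require SECRET_KEY only when ENVIRONMENT is explicitly "production",
    # so the documented local setup (no ENVIRONMENT set) keeps working.
    if os.getenv("ENVIRONMENT", "").lower() == "production":
        required_vars.append("SECRET_KEY")

    missing = [name for name in required_vars if not os.getenv(name)]
    if missing:
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
```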
```text
python-multipart
psycopg2-binary
async-lru
scikit-learn
```
P3: Remove the duplicate scikit-learn and numpy entries to keep requirements unambiguous and avoid confusion about dependency ordering.
Prompt for AI agents
```text
Check if this issue is valid — if so, understand the root cause and fix it. At backend/requirements.txt, line 9:

<comment>Remove the duplicate scikit-learn and numpy entries to keep requirements unambiguous and avoid confusion about dependency ordering.</comment>

<file context>
@@ -6,6 +6,8 @@ google-generativeai
 python-multipart
 psycopg2-binary
 async-lru
+scikit-learn
+numpy
 ultralyticsplus==0.0.28
</file context>
```
💡 What:

- Consolidated three separate database queries in `/api/stats` into a single `GROUP BY` aggregation query.
- Added a composite index `ix_issues_category_status` to support efficient aggregation.
- Refactored `backend/init_db.py` to use isolated transactions for migrations, improving robustness during deployment.
- Updated `.gitignore` and removed the binary `issues.db` from tracking.
- Optimized traffic sign and abandoned vehicle detection endpoints to use the unified image processing pipeline.

🎯 Why:

- The previous stats implementation caused multiple table scans and round-trips.
- PostgreSQL migrations were failing when columns/indexes already existed, due to transaction aborts.
- Inconsistent image processing across endpoints led to redundant decode-process-encode cycles.

📊 Impact:

- Reduces database round-trips for dashboard stats by ~66% (3 -> 1).
- Improves aggregation performance via the new composite index.
- Reduces CPU/memory usage in detection endpoints by avoiding redundant image re-encoding.

🔬 Measurement:

- Verified with `verify_stats_final.py`, ensuring correct counts and handling of NULL categories.
- Migration robustness verified via repeated executions.

Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
💡 What:

Optimized the statistics aggregation logic in the backend. Replaced three separate database queries in the `/api/stats` endpoint with a single, consolidated `GROUP BY` query and added a matching composite index.

🎯 Why:

The dashboard stats were being calculated using multiple scalar queries, which triggered redundant table scans or index lookups. As the number of reported issues grows, this O(N)-per-query approach becomes a significant bottleneck for the landing page and dashboard.

📊 Impact:

🔬 Measurement:

Verified the optimization using a specialized test script (`verify_stats_v3.py`) that simulated various category/status combinations, including edge cases like NULL categories. The consolidated query correctly aggregated all metrics in a single pass.

PR created automatically by Jules for task 1374976047976507878 started by @RohanExploit
Summary by cubic
Optimized the /api/stats endpoint with a single GROUP BY and a composite index to cut DB scans and speed up dashboard stats (~60–70%). Also hardened migrations and deployment, unified image handling in detection, and improved spatial clustering.
Refactors
Migration
Written for commit 0677fa9. Summary will update on new commits.
Summary by CodeRabbit
Documentation
Refactor
Chores