Skip to content

Conversation

@Devasy
Copy link
Owner

@Devasy Devasy commented Dec 7, 2025

Enhance performance by optimizing database queries for enriching member data, reducing the number of queries from multiple to a single batch fetch. This change improves efficiency and ensures faster data retrieval for user details.

Summary by CodeRabbit

  • Bug Fixes

    • Resolved duplicate email conflicts to ensure account consistency and safe cleanup options.
  • Performance

    • Consolidated friend balance computations into a single aggregated query with batched enrichment for faster, fewer round-trips.
    • Batch-enriched group member lookups to reduce per-member queries.
  • Chores

    • Added database index migration with verification and safe fallback handling.
  • Tests

    • Updated and expanded tests to cover aggregated balances, enrichment fallbacks, and error paths.

✏️ Tip: You can customize this high-level summary in your review settings.

@netlify
Copy link

netlify bot commented Dec 7, 2025

Deploy Preview for split-but-wiser canceled.

Name Link
🔨 Latest commit 790ae8a
🔍 Latest deploy log https://app.netlify.com/projects/split-but-wiser/deploys/69354a0cdb7e31000853de82

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 7, 2025

Walkthrough

Replaced per-item DB lookups with aggregated MongoDB pipelines and batch fetches in expenses and groups services; added two async DB migrations for indexes and duplicate-email remediation; updated tests to reflect the aggregation-based friends-balance flow and added fallback/error-handling behavior.

Changes

Cohort / File(s) Summary
Expenses aggregation
backend/app/expenses/service.py
Rewrote get_friends_balance_summary to use a single MongoDB aggregation to compute per-friend totals and per-group breakdowns, filter near-zero balances, batch-fetch friend details, build friend result objects (including rounded balances and breakdowns), compute aggregated summary totals, and add error handling and empty-result fallbacks.
Group member enrichment
backend/app/groups/service.py
Reworked _enrich_members_with_user_details to batch-fetch users with one $in query, build a lookup map, enrich members from that map, preserve members without userId, and added a _create_fallback_member(member: dict) -> dict helper for consistent fallback enrichment.
Migrations
backend/migrations/001_create_indexes.py, backend/migrations/002_fix_duplicate_emails.py
Added 001_create_indexes.py (async create_indexes() and drop_indexes_if_needed) to create/verify indexes across multiple collections with error handling and logging; added 002_fix_duplicate_emails.py with find_duplicate_emails, fix_duplicates_auto, fix_duplicates_interactive, and main() to detect and optionally resolve duplicate users.email entries (dry-run/interactive/auto modes).
Tests
backend/tests/expenses/test_expense_service.py, backend/tests/groups/test_enrich_members.py
Updated expense service tests to expect a single aggregated aggregate() call and added cases for aggregation/user-fetch errors, zero-balance filtering, and negative balances; added comprehensive tests for _enrich_members_with_user_details covering batch-fetching, fallback creation, invalid IDs, missing users, and DB errors.

Possibly related PRs

Suggested reviewers

  • vrajpatelll

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: optimizing database queries for friends balance summary and enriching member data with batch fetches instead of per-item lookups.
Docstring Coverage ✅ Passed Docstring coverage is 96.67% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Devasy Devasy self-assigned this Dec 7, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8a6d92 and 83c4d1b.

📒 Files selected for processing (4)
  • backend/app/expenses/service.py (1 hunks)
  • backend/app/groups/service.py (1 hunks)
  • backend/migrations/001_create_indexes.py (1 hunks)
  • backend/migrations/002_fix_duplicate_emails.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
backend/app/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

backend/app/**/*.py: Backend API routes should be added to the appropriate service router file in backend/app/ based on the feature area (auth, user, groups, expenses)
Use RESTful API design patterns with FastAPI in the backend
Use MongoDB for persistent data storage in the backend with a nonrelational schema design

Files:

  • backend/app/groups/service.py
  • backend/app/expenses/service.py
🧠 Learnings (1)
📚 Learning: 2025-11-27T18:33:22.596Z
Learnt from: CR
Repo: Devasy23/splitwiser PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T18:33:22.596Z
Learning: Applies to backend/app/**/*.py : Use MongoDB for persistent data storage in the backend with a nonrelational schema design

Applied to files:

  • backend/migrations/001_create_indexes.py
🧬 Code graph analysis (1)
backend/app/groups/service.py (2)
backend/app/auth/service.py (1)
  • get_db (77-81)
backend/app/user/service.py (1)
  • get_db (13-14)
🪛 GitHub Actions: Run Backend Tests & Analytics
backend/app/expenses/service.py

[error] 1096-1096: KeyError: 'totalBalance' when accessing result in get_friends_balance_summary. The function expects 'totalBalance' in the result but it is missing.

🪛 Ruff (0.14.7)
backend/app/groups/service.py

70-70: Do not catch blind exception: Exception

(BLE001)

backend/migrations/001_create_indexes.py

52-52: f-string without any placeholders

Remove extraneous f prefix

(F541)


55-55: f-string without any placeholders

Remove extraneous f prefix

(F541)


61-61: Do not catch blind exception: Exception

(BLE001)

backend/migrations/002_fix_duplicate_emails.py

68-68: f-string without any placeholders

Remove extraneous f prefix

(F541)


98-98: Use of datetime.datetime.min without timezone information

(DTZ901)


117-117: f-string without any placeholders

Remove extraneous f prefix

(F541)


119-119: f-string without any placeholders

Remove extraneous f prefix

(F541)


142-142: f-string without any placeholders

Remove extraneous f prefix

(F541)


264-264: Do not catch blind exception: Exception

(BLE001)

backend/app/expenses/service.py

959-959: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


960-960: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


1055-1055: Do not catch blind exception: Exception

(BLE001)


1073-1073: Unnecessary generator (rewrite as a set comprehension)

Rewrite as a set comprehension

(C401)


1086-1086: Do not catch blind exception: Exception

(BLE001)


1132-1132: datetime.datetime.utcnow() used

(DTZ003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: preview
🔇 Additional comments (11)
backend/app/groups/service.py (2)

27-52: LGTM on the batch ID collection logic.

The early return for empty members and the fallback path when no valid user IDs exist are well-handled. Good defensive coding with the InvalidId exception handling.


75-118: LGTM on the enrichment logic and fallback helper.

The lookup map approach provides O(1) access per member, and the fallback helper ensures consistent data structure when user lookup fails.

backend/migrations/002_fix_duplicate_emails.py (3)

29-80: LGTM on the duplicate detection logic.

The aggregation pipeline correctly identifies duplicate emails by grouping and filtering for count > 1. Good logging provides visibility into the duplicates found.


127-193: Interactive mode shares the same orphan data concern.

The same cascading deletion issue applies here. Consider consolidating the deletion logic into a shared helper that handles related data cleanup.


196-268: LGTM on the main function structure.

Proper argument parsing, confirmation prompt for destructive operations, and guaranteed client cleanup in the finally block. The broad exception catch at the top level is acceptable for migration scripts.

backend/migrations/001_create_indexes.py (2)

45-65: Good fallback handling for unique email index.

The fallback to a non-unique index when duplicates exist ensures the migration doesn't fail outright, while still providing query performance benefits. This pairs well with the new 002_fix_duplicate_emails.py migration.


282-285: LGTM on the migration entry point and structure.

The script provides comprehensive logging, verification, and proper resource cleanup. This follows good migration script patterns.

backend/app/expenses/service.py (4)

955-981: LGTM on early return handling.

The early exit when the user has no groups avoids unnecessary queries and returns a consistent zeroed structure. The optimization documentation is helpful.


993-1048: Aggregation pipeline structure is correct.

The four-stage pipeline efficiently calculates per-friend balances across groups. The $cond logic correctly determines balance direction based on payer/payee roles.


1050-1070: Good error handling for aggregation execution.

The try/except with fallback to empty results ensures the endpoint remains functional even if the aggregation fails. The early return for empty results avoids unnecessary processing.


1101-1116: LGTM on breakdown construction.

The safe access with .get("groups", []) and the balance threshold filtering are correctly implemented.

@codecov
Copy link

codecov bot commented Dec 7, 2025

Codecov Report

❌ Patch coverage is 98.41270% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.42%. Comparing base (c8a6d92) to head (790ae8a).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   72.79%   74.42%   +1.62%     
==========================================
  Files          17       17              
  Lines        1669     1693      +24     
  Branches      154      158       +4     
==========================================
+ Hits         1215     1260      +45     
+ Misses        400      381      -19     
+ Partials       54       52       -2     
Components Coverage Δ
Authentication System 75.45% <ø> (ø)
Expense Management 70.64% <97.50%> (+0.76%) ⬆️
Group Management 76.20% <100.00%> (+6.70%) ⬆️
User Management 97.16% <ø> (ø)
Backend Core 69.41% <ø> (ø)
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83c4d1b and 171147e.

📒 Files selected for processing (1)
  • backend/tests/expenses/test_expense_service.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
backend/tests/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Backend tests should be located in /backend/tests/ and run with pytest

Files:

  • backend/tests/expenses/test_expense_service.py
🧬 Code graph analysis (1)
backend/tests/expenses/test_expense_service.py (1)
backend/tests/conftest.py (1)
  • mock_db (65-96)
🪛 Ruff (0.14.7)
backend/tests/expenses/test_expense_service.py

1615-1615: Missing return type annotation for private function sync_mock_settlements_aggregate_cursor_factory

(ANN202)


1615-1615: Unused function argument: pipeline

(ARG001)


1615-1615: Missing type annotation for *args

(ANN002)


1615-1615: Unused function argument: args

(ARG001)


1615-1615: Missing type annotation for **kwargs

(ANN003)


1615-1615: Unused function argument: kwargs

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: preview
🔇 Additional comments (2)
backend/tests/expenses/test_expense_service.py (2)

1605-1643: LGTM! Test correctly reflects optimized aggregation logic.

The mock data structure has been appropriately updated to reflect the single aggregate call optimization. The comments clearly explain how the new approach consolidates all friends' balances into one result set, and the mock structure correctly represents the unified response format with _id (friend ID), totalBalance, and per-group groups array.


1734-1736: Excellent! Assertion correctly validates the optimization.

The updated assertion confirms that the settlement aggregation now uses a single database call instead of multiple calls, which is the core improvement described in the PR objectives. The comment clearly documents this optimization.

Devasy and others added 6 commits December 7, 2025 14:39
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 171147e and 790ae8a.

📒 Files selected for processing (4)
  • backend/app/expenses/service.py (2 hunks)
  • backend/app/groups/service.py (1 hunks)
  • backend/tests/expenses/test_expense_service.py (5 hunks)
  • backend/tests/groups/test_enrich_members.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

backend/app/**/*.py: Backend API routes should be added to the appropriate service router file in backend/app/ based on the feature area (auth, user, groups, expenses)
Use RESTful API design patterns with FastAPI in the backend
Use MongoDB for persistent data storage in the backend with a nonrelational schema design

Files:

  • backend/app/expenses/service.py
  • backend/app/groups/service.py
backend/tests/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Backend tests should be located in /backend/tests/ and run with pytest

Files:

  • backend/tests/expenses/test_expense_service.py
  • backend/tests/groups/test_enrich_members.py
🧬 Code graph analysis (2)
backend/app/groups/service.py (2)
backend/app/auth/service.py (1)
  • get_db (77-81)
backend/app/user/service.py (1)
  • get_db (13-14)
backend/tests/expenses/test_expense_service.py (1)
backend/app/expenses/service.py (1)
  • get_friends_balance_summary (955-1156)
🪛 Ruff (0.14.7)
backend/app/expenses/service.py

959-959: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


960-960: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


1055-1055: Do not catch blind exception: Exception

(BLE001)


1086-1086: Do not catch blind exception: Exception

(BLE001)

backend/app/groups/service.py

70-70: Do not catch blind exception: Exception

(BLE001)

backend/tests/expenses/test_expense_service.py

1617-1617: Dynamically typed expressions (typing.Any) are disallowed in _pipeline

(ANN401)


1617-1617: Dynamically typed expressions (typing.Any) are disallowed in *_args

(ANN401)


1617-1617: Dynamically typed expressions (typing.Any) are disallowed in **_kwargs

(ANN401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: preview
🔇 Additional comments (21)
backend/app/groups/service.py (4)

27-35: LGTM on the docstring and early return.

Good documentation with clear performance expectations (O(1) queries). The early return for empty members avoids unnecessary database calls.


40-52: LGTM on user ID extraction with proper error handling.

The code correctly handles invalid ObjectIds by logging warnings and collecting only valid IDs. Returning fallback members when no valid IDs exist is a reasonable defensive approach.


75-103: LGTM on the enrichment loop.

The batch lookup approach correctly enriches members from the pre-built map with appropriate fallbacks for missing users. The O(1) dictionary lookups per member provide good performance.


106-118: LGTM on the fallback member helper.

The helper correctly handles edge cases including short user IDs (length < 4) and missing userId fields. The conditional slicing user_id[-4:] if len(user_id) >= 4 else user_id is a nice defensive pattern.

backend/tests/expenses/test_expense_service.py (6)

1616-1647: LGTM on the mock factory function.

The function signature with underscore-prefixed unused parameters (_pipeline, _args, _kwargs) and return type annotation addresses the static analysis concerns from the previous review. The mock correctly returns aggregated results for all friends in a single response, matching the optimized implementation.


1737-1739: Good assertion verifying the batch optimization.

The test correctly validates that settlements.aggregate is called exactly once, confirming the optimization from N×M queries to a single aggregation pipeline.


2112-2145: Good test coverage for aggregation error handling.

This test validates that when the MongoDB aggregation fails, the service returns empty results with zeroed summary values rather than propagating the exception. This ensures resilience.


2147-2191: Good test for graceful degradation when user fetch fails.

The test correctly validates that balance data is still returned with "Unknown" user names when the batch user fetch fails. This matches the error handling in the service.


2193-2227: Good test for zero-balance filtering edge case.

This validates the aggregation pipeline's $match filter that excludes friends with near-zero balances (abs < 0.01).


2229-2276: Good test for negative balance calculation.

This test validates the totalYouOwe accumulation path when balances are negative, ensuring correct summary calculations for amounts the user owes.

backend/tests/groups/test_enrich_members.py (6)

17-71: LGTM on the success path test.

The test correctly validates batch enrichment by verifying:

  • All 3 members are enriched with correct user data
  • The query uses $in operator for batch fetching
  • Only one database call is made

72-82: LGTM on empty list test.

Good coverage of the early return path - verifies no database call is made for empty input.


84-108: LGTM on invalid ObjectId test.

Good coverage verifying logger warnings are called and fallback members are returned when all ObjectIds are invalid.


237-272: Excellent batch optimization verification test.

This test with 10 members explicitly validates the O(1) query optimization by asserting exactly one database call regardless of member count.


274-310: LGTM on fallback member helper tests.

Good coverage of edge cases: short user IDs, long user IDs (last 4 chars), and missing userId defaulting to "unknown".


199-235: Unable to verify: Test file and implementation not found in repository.

The test file backend/tests/groups/test_enrich_members.py and the _enrich_members_with_user_details implementation are not present in the repository. Without access to the actual code, the review comment's claims about implementation behavior and data loss cannot be verified.

backend/app/expenses/service.py (5)

967-981: LGTM on early exit for users with no groups.

Good defensive handling that returns a properly structured empty response when the user isn't in any groups.


993-1048: Well-designed aggregation pipeline.

The pipeline efficiently computes friend balances in a single query:

  1. Matches settlements in user's groups involving the user
  2. Groups by friend+group to compute per-group balances
  3. Regroups by friend for total balance with breakdown
  4. Filters out near-zero balances

The sign convention (positive = friend owes user, negative = user owes friend) is consistent throughout.


1079-1088: Good batch fetch with proper error handling.

The code correctly batch-fetches friend details and falls back to an empty map on failure, ensuring the method can still return partial results with "Unknown" user names.


1094-1096: Good defensive access pattern.

Using result.get("totalBalance", 0) addresses the previous review's KeyError concern and handles edge cases where the aggregation might not produce expected fields.


1145-1156: LGTM on the response structure.

The summary correctly aggregates totals and the response structure matches the documented return type. The rounding to 2 decimal places ensures clean financial values.

@Devasy Devasy merged commit a035ec0 into main Dec 7, 2025
38 checks passed
@Devasy Devasy deleted the feat/optimise-db-queries branch December 7, 2025 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants