4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -93,3 +93,7 @@
## 2026-05-20 - Joined Queries for Integrity Verification
**Learning:** Performing multiple sequential database queries to verify cryptographically chained records (e.g., fetching a record and then its associated token/metadata from another table) introduces unnecessary latency and increases database load.
**Action:** Consolidate associated data retrieval into a single SQL `JOIN` query within the verification hot-path. This reduces database round-trips and improves end-to-end latency for blockchain-style integrity checks.
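A minimal sketch of the round-trip difference described in this entry, using the standard-library `sqlite3` module. The `records` and `record_tokens` tables and their columns are hypothetical stand-ins, not the project's real schema.

```python
import sqlite3

# Hypothetical schema: a hash-chained record plus a token stored in a second table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        id INTEGER PRIMARY KEY, payload TEXT, prev_hash TEXT, record_hash TEXT
    );
    CREATE TABLE record_tokens (
        record_id INTEGER REFERENCES records(id), token TEXT
    );
    INSERT INTO records VALUES (1, 'first', '', 'h1'), (2, 'second', 'h1', 'h2');
    INSERT INTO record_tokens VALUES (1, 'tok-1'), (2, 'tok-2');
""")


def fetch_sequential(record_id):
    """Two queries: the record first, then its token."""
    record = conn.execute(
        "SELECT payload, prev_hash, record_hash FROM records WHERE id = ?",
        (record_id,),
    ).fetchone()
    token = conn.execute(
        "SELECT token FROM record_tokens WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    return record + token


def fetch_joined(record_id):
    """One query: the record and its token arrive in the same row."""
    return conn.execute(
        """
        SELECT r.payload, r.prev_hash, r.record_hash, t.token
        FROM records AS r
        JOIN record_tokens AS t ON t.record_id = r.id
        WHERE r.id = ?
        """,
        (record_id,),
    ).fetchone()
```

Both helpers return the same row; only the number of queries differs. An inner `JOIN` returns nothing for a record that has no token row, so a `LEFT JOIN` may be the safer choice if tokens are optional.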

## 2025-05-22 - Consolidated Aggregate Queries

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Incorrect year in the date header.

The date shows "2025-05-22" but should be "2026-05-22" (or a later date in 2026) to align with the PR timeline and chronological order of entries.

📅 Proposed fix
-## 2025-05-22 - Consolidated Aggregate Queries
+## 2026-05-22 - Consolidated Aggregate Queries
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-## 2025-05-22 - Consolidated Aggregate Queries
+## 2026-05-22 - Consolidated Aggregate Queries
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.jules/bolt.md at line 97, Update the date in the markdown header "##
2025-05-22 - Consolidated Aggregate Queries" to the correct year (e.g., change
2025-05-22 to 2026-05-22 or a later 2026 date) so the entry aligns with the PR
timeline and chronological order of entries.

**Learning:** Executing multiple separate aggregate queries (e.g., `count(distinct)`, `avg`, and `group_by` counts) for the same entity causes multiple database round-trips and redundant table scans.
**Action:** Consolidate multiple aggregates into a single SQLAlchemy query using `func.sum(case(...))` for categorical counts alongside other aggregates. This reduces network overhead and database load significantly in high-traffic statistics endpoints.
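As a rough illustration of what this consolidation amounts to at the SQL level, the sketch below runs one conditional-aggregation query through the standard-library `sqlite3` module. The `visits` table, its columns, and the `make_db` and `visit_stats` helpers are hypothetical and only loosely mirror the fields named in this PR.

```python
import sqlite3

# One statement returns every statistic; each SUM(CASE ...) replaces a GROUP BY bucket.
CONSOLIDATED_SQL = """
SELECT
    COUNT(id),
    COUNT(DISTINCT officer_email),
    AVG(distance),
    SUM(CASE WHEN verified_at IS NOT NULL THEN 1 ELSE 0 END),
    SUM(CASE WHEN within_geofence = 1 THEN 1 ELSE 0 END),
    SUM(CASE WHEN within_geofence = 0 THEN 1 ELSE 0 END)
FROM visits
"""


def make_db(rows):
    """Build an in-memory table of (officer_email, distance, verified_at, within_geofence)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (id INTEGER PRIMARY KEY, officer_email TEXT, "
        "distance REAL, verified_at TEXT, within_geofence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO visits (officer_email, distance, verified_at, within_geofence) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def visit_stats(conn):
    total, officers, avg_distance, verified, inside, outside = conn.execute(
        CONSOLIDATED_SQL
    ).fetchone()
    # SUM over zero rows yields NULL (None in Python), hence the `or 0` guards.
    return {
        "total_visits": total or 0,
        "unique_officers": officers or 0,
        "avg_distance": avg_distance,
        "verified_visits": int(verified or 0),
        "within_geofence_count": int(inside or 0),
        "outside_geofence_count": int(outside or 0),
    }


populated = make_db([
    ("a@example.com", 10.0, "2026-05-01", 1),
    ("a@example.com", 20.0, None, 0),
    ("b@example.com", 30.0, "2026-05-02", None),  # geofence status unknown
])
stats = visit_stats(populated)
empty_stats = visit_stats(make_db([]))
```

A row whose `within_geofence` is NULL falls into neither geofence bucket, because a NULL comparison never satisfies the `WHEN` branch.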
Comment on lines +97 to +99
7 changes: 1 addition & 6 deletions backend/rag_service.py
@@ -48,8 +48,6 @@ def _prepare_policies(self):
content = f"{title} {text}"
content_tokens = self._tokenize(content)

-content_tokens = self._tokenize(content)

self._prepared_policies.append({
'title_tokens': self._tokenize(title),
'content_tokens': content_tokens,
@@ -92,10 +90,7 @@ def retrieve(self, query: str, threshold: float = 0.05) -> Optional[str]:
policy_tokens = prepared['content_tokens']

-# Optimization 1: Fast early-exit for zero overlap
-if query_tokens.isdisjoint(policy_tokens):
-continue

# Optimized: Early exit using isdisjoint which is faster than computing intersection
+# isdisjoint() is O(K) where K is min(len(query), len(policy))
if query_tokens.isdisjoint(policy_tokens):
continue
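
For reference, a small standalone sketch of the early-exit pattern this hunk keeps. The `best_match` helper and its Jaccard-style score are hypothetical stand-ins; the diff does not show the scoring that `retrieve` actually uses.

```python
from typing import Dict, Optional, Set


def best_match(
    query_tokens: Set[str],
    policies: Dict[str, Set[str]],
    threshold: float = 0.05,
) -> Optional[str]:
    """Return the title of the best-overlapping policy, or None."""
    best_title, best_score = None, 0.0
    for title, policy_tokens in policies.items():
        # isdisjoint() stops at the first shared token and builds no new set,
        # so policies with zero overlap are skipped cheaply.
        if query_tokens.isdisjoint(policy_tokens):
            continue
        # Hypothetical score: shared tokens divided by all distinct tokens.
        score = len(query_tokens & policy_tokens) / len(query_tokens | policy_tokens)
        if score >= threshold and score > best_score:
            best_title, best_score = title, score
    return best_title


policies = {
    "leave": {"annual", "leave", "policy"},
    "travel": {"travel", "expense", "policy"},
}
```

An empty query set is disjoint from every policy, so the loop skips all of them and the function returns `None`.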

43 changes: 14 additions & 29 deletions backend/routers/field_officer.py
@@ -437,38 +437,23 @@ def get_visit_statistics(db: Session = Depends(get_db)):
if cached_json:
return Response(content=cached_json, media_type="application/json")

-# Optimized: Use a single aggregate query to fetch multiple statistics in one database roundtrip
-agg_stats = db.query(
+# Optimized: Use a single aggregate query to fetch all statistics in one database roundtrip
+# This reduces database overhead and network latency by avoiding multiple roundtrips and table scans.
+stats = db.query(
func.count(FieldOfficerVisit.id).label('total_visits'),
func.count(func.distinct(FieldOfficerVisit.officer_email)).label('unique_officers'),
-func.avg(FieldOfficerVisit.distance_from_site).label('avg_distance')
+func.avg(FieldOfficerVisit.distance_from_site).label('avg_distance'),
+func.sum(case([(FieldOfficerVisit.verified_at.isnot(None), 1)], else_=0)).label('verified_visits'),
+func.sum(case([(FieldOfficerVisit.within_geofence == True, 1)], else_=0)).label('within_geofence_count'),
+func.sum(case([(FieldOfficerVisit.within_geofence == False, 1)], else_=0)).label('outside_geofence_count')
Comment on lines +446 to +448

P1: The `case()` call style is wrong for SQLAlchemy 2.x. Passing the WHEN clauses as a list can raise `ArgumentError` and break the visit-stats endpoint. Pass each WHEN as a positional tuple instead.

Prompt for AI agents
Check if this issue is valid; if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 446:

<comment>`case()` call style wrong for SQLAlchemy 2.x. This can throw `ArgumentError` and break visit-stats endpoint. Pass each WHEN as positional tuple.</comment>

<file context>
@@ -437,38 +437,23 @@ def get_visit_statistics(db: Session = Depends(get_db)):
             func.count(func.distinct(FieldOfficerVisit.officer_email)).label('unique_officers'),
-            func.avg(FieldOfficerVisit.distance_from_site).label('avg_distance')
+            func.avg(FieldOfficerVisit.distance_from_site).label('avg_distance'),
+            func.sum(case([(FieldOfficerVisit.verified_at.isnot(None), 1)], else_=0)).label('verified_visits'),
+            func.sum(case([(FieldOfficerVisit.within_geofence == True, 1)], else_=0)).label('within_geofence_count'),
+            func.sum(case([(FieldOfficerVisit.within_geofence == False, 1)], else_=0)).label('outside_geofence_count')
</file context>
Suggested change
-func.sum(case([(FieldOfficerVisit.verified_at.isnot(None), 1)], else_=0)).label('verified_visits'),
-func.sum(case([(FieldOfficerVisit.within_geofence == True, 1)], else_=0)).label('within_geofence_count'),
-func.sum(case([(FieldOfficerVisit.within_geofence == False, 1)], else_=0)).label('outside_geofence_count')
+func.sum(case((FieldOfficerVisit.verified_at.isnot(None), 1), else_=0)).label('verified_visits'),
+func.sum(case((FieldOfficerVisit.within_geofence == True, 1), else_=0)).label('within_geofence_count'),
+func.sum(case((FieldOfficerVisit.within_geofence == False, 1), else_=0)).label('outside_geofence_count')
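
To make the suggested call style concrete, here is a self-contained sketch that runs the positional-tuple form of `case()` against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later is installed; the `Visit` model is a hypothetical, trimmed-down stand-in for `FieldOfficerVisit`, whose real definition is not shown in this PR.

```python
from datetime import datetime

from sqlalchemy import Boolean, Column, DateTime, Integer, case, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Visit(Base):
    """Hypothetical stand-in for FieldOfficerVisit with only the columns used here."""

    __tablename__ = "visits"
    id = Column(Integer, primary_key=True)
    verified_at = Column(DateTime, nullable=True)
    within_geofence = Column(Boolean, nullable=True)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Visit(verified_at=datetime(2026, 5, 1), within_geofence=True),
        Visit(verified_at=None, within_geofence=False),
        Visit(verified_at=datetime(2026, 5, 2), within_geofence=None),
    ])
    session.commit()

    # Each WHEN is a positional (condition, value) tuple, not an element of a list.
    stmt = select(
        func.count(Visit.id).label("total_visits"),
        func.sum(case((Visit.verified_at.isnot(None), 1), else_=0)).label("verified_visits"),
        func.sum(case((Visit.within_geofence == True, 1), else_=0)).label("within_geofence_count"),  # noqa: E712
        func.sum(case((Visit.within_geofence == False, 1), else_=0)).label("outside_geofence_count"),  # noqa: E712
    )
    row = session.execute(stmt).one()
    result = {
        "total_visits": row.total_visits or 0,
        "verified_visits": int(row.verified_visits or 0),
        "within_geofence_count": int(row.within_geofence_count or 0),
        "outside_geofence_count": int(row.outside_geofence_count or 0),
    }
```

The visit with an unknown geofence status is counted in the total but in neither geofence bucket, which matches the `is True` / `is False` branches of the loop this PR removes.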

).first()

-counts = db.query(
-FieldOfficerVisit.verified_at.isnot(None).label("is_verified"),
-FieldOfficerVisit.within_geofence,
-func.count(FieldOfficerVisit.id)
-).group_by(
-FieldOfficerVisit.verified_at.isnot(None),
-FieldOfficerVisit.within_geofence
-).all()
-
-total_visits = 0
-verified_visits = 0
-within_geofence_count = 0
-outside_geofence_count = 0
-
-for is_verified, within_geofence, count in counts:
-c = count or 0
-total_visits += c
-if is_verified:
-verified_visits += c
-if within_geofence is True:
-within_geofence_count += c
-elif within_geofence is False:
-outside_geofence_count += c

-unique_officers = agg_stats.unique_officers or 0
-average_distance = agg_stats.avg_distance
+total_visits = stats.total_visits or 0
+verified_visits = int(stats.verified_visits or 0)
+within_geofence_count = int(stats.within_geofence_count or 0)
+outside_geofence_count = int(stats.outside_geofence_count or 0)
+unique_officers = stats.unique_officers or 0
+average_distance = stats.avg_distance

# Round to 2 decimals if not None
if average_distance is not None: