4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -93,3 +93,7 @@
## 2026-05-20 - Joined Queries for Integrity Verification
**Learning:** Performing multiple sequential database queries to verify cryptographically chained records (e.g., fetching a record and then its associated token/metadata from another table) introduces unnecessary latency and increases database load.
**Action:** Consolidate associated data retrieval into a single SQL `JOIN` query within the verification hot-path. This reduces database round-trips and improves end-to-end latency for blockchain-style integrity checks.
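As a rough illustration of the consolidation described above, the sketch below contrasts two sequential lookups with a single `JOIN`. The schema (`records`, `record_tokens`) and the function names are hypothetical and not taken from the project; an in-memory SQLite database stands in for the real backend, so it shows only the query shape, not the latency gain a client-server database would see.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables are not shown in this diff.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, prev_hash TEXT);
    CREATE TABLE record_tokens (record_id INTEGER REFERENCES records(id), token TEXT);
    INSERT INTO records VALUES (1, 'first', NULL), (2, 'second', 'abc123');
    INSERT INTO record_tokens VALUES (1, 'tok-1'), (2, 'tok-2');
""")


def fetch_sequential(record_id):
    """Two round-trips: fetch the record, then its token."""
    record = conn.execute(
        "SELECT id, payload, prev_hash FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    if record is None:
        return None
    token = conn.execute(
        "SELECT token FROM record_tokens WHERE record_id = ?", (record_id,)
    ).fetchone()
    return (*record, token[0] if token else None)


def fetch_joined(record_id):
    """One round-trip: record and token fetched together with a LEFT JOIN."""
    return conn.execute(
        """
        SELECT r.id, r.payload, r.prev_hash, t.token
        FROM records AS r
        LEFT JOIN record_tokens AS t ON t.record_id = r.id
        WHERE r.id = ?
        """,
        (record_id,),
    ).fetchone()
```

Both functions return the same tuple for an existing record; the `LEFT JOIN` keeps the record visible even when no token row exists, which mirrors the `None` fallback in the sequential version.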

## 2025-05-15 - Tokenizer Implementation Performance
**Learning:** Benchmarking different Python string tokenization strategies in `CivicRAG` showed that `re.compile(r'[^a-z0-9\s]').sub('', text.lower()).split()` is ~35% faster than `re.findall(r'[a-z0-9]+', text.lower())` for standard civic policy descriptions. The overhead of creating many small strings in `findall` exceeded the cost of a single `sub` and `split`.
**Action:** Always benchmark specific string processing alternatives in hot paths; the most intuitive "optimized" regex approach isn't always the fastest in Python's implementation.
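A comparison along these lines can be reproduced with `timeit`. The sketch below is a minimal harness, not the benchmark used in `CivicRAG`; the function names and sample text are made up for illustration, and the measured ratio will vary with input and Python version. Note that the two strategies are not interchangeable on punctuation: stripping characters merges `city's` into `citys`, while `findall` splits it into `city` and `s`.

```python
import re
import timeit

# Patterns compiled once, mirroring the two strategies from the note above.
_STRIP = re.compile(r"[^a-z0-9\s]")  # characters deleted before splitting
_WORD = re.compile(r"[a-z0-9]+")     # runs of characters kept as tokens


def tokenize_sub_split(text):
    """Delete everything except a-z, 0-9 and whitespace, then split on whitespace."""
    return _STRIP.sub("", text.lower()).split()


def tokenize_findall(text):
    """Collect every run of a-z / 0-9 characters."""
    return _WORD.findall(text.lower())


def compare(text, number=200):
    """Return (sub_split_seconds, findall_seconds) for `number` calls each."""
    t_sub = timeit.timeit(lambda: tokenize_sub_split(text), number=number)
    t_find = timeit.timeit(lambda: tokenize_findall(text), number=number)
    return t_sub, t_find


if __name__ == "__main__":
    # Illustrative sample text, not real project data.
    sample = "The city's road-repair policy covers potholes, streetlights & drains. " * 50
    t_sub, t_find = compare(sample)
    print(f"sub+split: {t_sub:.4f}s  findall: {t_find:.4f}s")
```

Because the token sets differ on apostrophes and hyphens, it is worth checking retrieval quality as well as speed before swapping one tokenizer for the other.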
7 changes: 0 additions & 7 deletions backend/rag_service.py
@@ -48,8 +48,6 @@ def _prepare_policies(self):
             content = f"{title} {text}"
             content_tokens = self._tokenize(content)
 
-            content_tokens = self._tokenize(content)
-
             self._prepared_policies.append({
                 'title_tokens': self._tokenize(title),
                 'content_tokens': content_tokens,
@@ -84,7 +82,6 @@ def retrieve(self, query: str, threshold: float = 0.05) -> Optional[str]:
         if not len_query:
             return None
 
-        query_len = len(query_tokens)
         best_score = 0.0
         best_formatted = None

@@ -95,10 +92,6 @@ def retrieve(self, query: str, threshold: float = 0.05) -> Optional[str]:
             if query_tokens.isdisjoint(policy_tokens):
                 continue
 
-            # Optimized: Early exit using isdisjoint which is faster than computing intersection
-            if query_tokens.isdisjoint(policy_tokens):
-                continue
-
             # Jaccard Similarity
             # Optimization 2: Calculate intersection
             intersection_len = len(query_tokens.intersection(policy_tokens))
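For readers without the full file, the scoring loop in this hunk can be sketched standalone as below. This is a simplified approximation, not the code in `rag_service.py`: `jaccard` and `best_match` are hypothetical names, and computing the union size as `len(a) + len(b) - intersection` is an assumption, since the diff is truncated before that step.

```python
from typing import Iterable, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, with a cheap disjoint check first."""
    # isdisjoint can stop at the first shared element, so it avoids
    # building an intersection set when there is no overlap at all.
    if a.isdisjoint(b):
        return 0.0
    intersection_len = len(a & b)
    # Assumed union computation; it avoids materializing the union set.
    union_len = len(a) + len(b) - intersection_len
    return intersection_len / union_len


def best_match(
    query_tokens: Set[str],
    policies: Iterable[Tuple[str, Set[str]]],
    threshold: float = 0.05,
) -> Optional[str]:
    """Return the title of the highest-scoring policy, or None below the threshold."""
    best_score, best_title = 0.0, None
    for title, policy_tokens in policies:
        score = jaccard(query_tokens, policy_tokens)
        if score > best_score:
            best_score, best_title = score, title
    return best_title if best_score >= threshold else None
```

One `isdisjoint` check per policy is sufficient, which is why the duplicated check removed in this hunk changes no behavior. The disjoint check also covers two empty sets, so the division never sees a zero denominator.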