Skip to content

Commit f88903b

Browse files
Llm service integration (buerokratt#329)
* remove unwanted file * updated changes * fixed requested changes * fixed issue * service workflow implementation without calling service endpoints * fixed requested changes * fixed issues * protocol related requested changes * fixed requested changes * update time tracking * added time tracking and reloacate input guardrail before toolclassifiier * fixed issue * fixed issue * added hybrid search for the service detection * update tool classifier * fixing merge conflicts * fixed issue * optimize first user query response generation time * fixed pr reviewed issues * service integration * context based response generation flow * fixed pr review suggested issues * removed service project layer * fixed issues * delete unnessary files * added requested changes * fixed issue --------- Co-authored-by: Thiru Dinesh <56014038+Thirunayan22@users.noreply.github.com>
1 parent 4662c82 commit f88903b

11 files changed

Lines changed: 686 additions & 362 deletions
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
[
2+
{{#each data.botMessages}}
3+
{
4+
"chatId": "{{../data.chatId}}",
5+
"content": "{{filterControlCharacters result}}",
6+
"buttons": "[{{#each ../data.buttons}}{\"title\": \"{{#if (eq title true)}}Yes{{else if (eq title false)}}No{{else}}{{{title}}}{{/if}}\",\"payload\": \"{{{payload}}}\"}{{#unless @last}},{{/unless}}{{/each}}]",
7+
"authorTimestamp": "{{../data.authorTimestamp}}",
8+
"authorId": "{{../data.authorId}}",
9+
"authorFirstName": "{{../data.authorFirstName}}",
10+
"authorLastName": "{{../data.authorLastName}}",
11+
"created": "{{../data.created}}"
12+
}{{#unless @last}},{{/unless}}
13+
{{/each}}
14+
]

DSL/DMapper/rag-search/lib/helpers.js

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -168,6 +168,11 @@ export function getAgencyDataAvailable(agencyId) {
168168
return (combinedValue % 2) === 0;
169169
}
170170

171+
export function filterControlCharacters(str) {
172+
if (typeof str !== "string") return str;
173+
return str.replace(/[\x00-\x1F\x7F]/g, " ");
174+
}
175+
171176
export function json(context) {
172177
return JSON.stringify(context);
173178
}
@@ -269,3 +274,27 @@ export function filterDataByAgency(aggregatedData, startIndex, agencyId, pageSiz
269274
return JSON.stringify(result);
270275

271276
}
277+
278+
export function calculateDateDifference(value) {
279+
const { startDate, endDate, outputType } = value;
280+
const sDate = new Date(startDate);
281+
const eDate = new Date(endDate);
282+
const timeDifferenceInSeconds = (eDate.getTime() - sDate.getTime()) / 1000;
283+
284+
switch (outputType?.toLowerCase()) {
285+
case 'years':
286+
return eDate.getFullYear() - sDate.getFullYear();
287+
case 'months':
288+
return eDate.getMonth() - sDate.getMonth() +
289+
(12 * (eDate.getFullYear() - sDate.getFullYear()))
290+
case 'hours':
291+
return Math.round(Math.abs(eDate - sDate) / 36e5);
292+
case 'minutes':
293+
return Math.floor(timeDifferenceInSeconds / 60);
294+
case 'seconds':
295+
return timeDifferenceInSeconds;
296+
default:
297+
return Math.round(timeDifferenceInSeconds / (3600 * 24));
298+
}
299+
}
300+

constants.ini

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,4 +9,7 @@ RAG_SEARCH_CRON_MANAGER=http://cron-manager:9010
99
RAG_SEARCH_LLM_ORCHESTRATOR=http://llm-orchestration-service:8100/orchestrate
1010
RAG_SEARCH_PROMPT_REFRESH=http://llm-orchestration-service:8100/prompt-config/refresh
1111
DOMAIN=localhost
12-
DB_PASSWORD=dbadmin
12+
DB_PASSWORD=dbadmin
13+
RAG_SEARCH_RUUTER_PUBLIC_INTERNAL_SERVICE=http://ruuter-public:8086/services
14+
SERVICE_DMAPPER_HBS=http://data-mapper:3000/hbs/rag-search
15+
SERVICE_PROJECT_LAYER=services

docs/HYBRID_SEARCH_CLASSIFICATION.md

Lines changed: 44 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,8 @@ The system has two phases:
5353
| `src/intent_data_enrichment/main_enrichment.py` | Orchestrates per-example and summary point creation |
5454
| `src/intent_data_enrichment/qdrant_manager.py` | Qdrant collection management, upsert, and deletion |
5555
| `src/intent_data_enrichment/api_client.py` | LLM API calls (context generation, embeddings) |
56-
| `src/intent_data_enrichment/models.py` | `EnrichedService` data model |
56+
| `src/intent_data_enrichment/models.py` | `ServiceData`, `EnrichedService`, `EnrichmentResult` data models |
57+
| `src/intent_data_enrichment/constants.py` | `EnrichmentConstants` — API URLs, Qdrant config, vector sizes, LLM prompt template |
5758
| `src/tool_classifier/sparse_encoder.py` | BM25-style sparse vector computation |
5859

5960
### What Changed: Single Embedding → Per-Example Indexing
@@ -78,8 +79,8 @@ Service "Valuutakursid" → 4 Qdrant points
7879
dense: 3072-dim embedding of this exact text
7980
sparse: BM25 vector → {euro: 1.0, gbp: 1.0, kurss: 1.0, ...}
8081
81-
Point 3 (summary): "Valuutakursid - Kasutaja soovib infot..."
82-
dense: 3072-dim embedding of name + description + LLM context
82+
Point 3 (summary): "Service Name: Valuutakursid\nDescription: ...\nExample Queries: ...\nRequired Entities: ...\nEnriched Context: ..."
83+
dense: 3072-dim embedding of combined text
8384
sparse: BM25 vector of combined text
8485
```
8586

@@ -101,9 +102,12 @@ Service "Valuutakursid" → 4 Qdrant points
101102

102103
```python
103104
# sparse_encoder.py
105+
SPARSE_VOCAB_SIZE = 50_000
106+
104107
text = "Mis suhe on euro ja usd vahel"
105108
tokens = re.findall(r"\w+", text.lower()) # ["mis", "suhe", "on", "euro", ...]
106-
# Each token → hashed to index in [0, VOCAB_SIZE), value = term frequency
109+
# Each token → MD5 hash (first 4 bytes) to index in [0, SPARSE_VOCAB_SIZE), value = term frequency
110+
# Collisions are handled by summing values at the same index
107111
# Output: SparseVector(indices=[hash("mis"), hash("euro"), ...], values=[1.0, 1.0, ...])
108112
```
109113

@@ -146,7 +150,7 @@ service_enrichment.sh
146150
│ ├─ Generate dense embedding (text-embedding-3-large)
147151
│ └─ Generate sparse vector (BM25 term hashing)
148152
149-
├─ Step 3: Summary point (name + description + LLM context):
153+
├─ Step 3: Summary point (name + description + examples + entities + LLM context):
150154
│ ├─ Generate dense embedding
151155
│ └─ Generate sparse vector
152156
@@ -155,6 +159,17 @@ service_enrichment.sh
155159
└─ Step 5: Bulk upsert N+1 points to Qdrant
156160
```
157161

162+
### Summary Point Combined Text Format
163+
164+
The summary point embeds a structured concatenation:
165+
```
166+
Service Name: {name}
167+
Description: {description}
168+
Example Queries: {example1} | {example2} | ...
169+
Required Entities: {entity1}, {entity2}, ...
170+
Enriched Context: {LLM-generated context}
171+
```
172+
158173
### Service Deletion
159174

160175
When a service is deactivated, all its points are removed:
@@ -186,12 +201,12 @@ POST /collections/intent_collections/points/query
186201
{
187202
"query": [0.023, -0.041, ...], # 3072-dim dense vector
188203
"using": "dense",
189-
"limit": 6,
204+
"limit": 6, # DENSE_SEARCH_TOP_K * 2 (3 * 2 = 6, allows dedup)
190205
"with_payload": true
191206
}
192207
```
193208

194-
Results are deduplicated by `service_id` (best score per service).
209+
Results are deduplicated by `service_id` (best score per service), returning up to `DENSE_SEARCH_TOP_K` (3) unique services.
195210

196211
**Why not use RRF scores?**
197212
Qdrant's RRF uses `1/(1+rank)`, producing fixed scores (0.50, 0.33, 0.25) regardless of actual relevance. A perfect match and a random query both get 0.50 for rank 1. Cosine similarity reflects true semantic closeness.
@@ -203,6 +218,7 @@ Sparse prefetch is only included if the query produces a non-empty sparse vector
203218

204219
```python
205220
# classifier.py → _hybrid_search()
221+
# First checks collection exists and has data (points_count > 0)
206222
POST /collections/intent_collections/points/query
207223
{
208224
"prefetch": [
@@ -215,6 +231,10 @@ POST /collections/intent_collections/points/query
215231
}
216232
```
217233

234+
> **Note:** Prefetch limit is `HYBRID_SEARCH_TOP_K * 2` (5 * 2 = 10). The sparse prefetch is conditionally added only when `sparse_vector.is_empty()` is False.
235+
236+
Hybrid results are also deduplicated by `service_id` (best RRF score per service).
237+
218238
### Routing Decision
219239

220240
```
@@ -251,6 +271,7 @@ Dense: Valuutakursid (cosine=0.5511), gap=0.2371
251271
→ Runs intent detection + entity extraction on matched service only
252272
→ Entities: {currency_from: EUR, currency_to: THB}
253273
→ Validation: PASSED ✓
274+
→ Calls service endpoint → Returns response
254275
```
255276

256277
### Path 3: AMBIGUOUS Service Match → LLM Confirmation
@@ -285,17 +306,17 @@ SERVICE (Layer 1) → CONTEXT (Layer 2) → RAG (Layer 3) → OOD (Layer 4
285306
| Path | Intent Detection | Entity Extraction |
286307
|------|-----------------|-------------------|
287308
| HIGH-CONFIDENCE | On 1 service (matched) | Yes — from LLM output |
288-
| AMBIGUOUS | On 2-3 candidates | Yes — if LLM matches |
309+
| AMBIGUOUS | On top candidates (from `top_results`) | Yes — if LLM matches |
289310
| Non-service | Not run | Not run |
290311

291312
### Intent Detection Module (DSPy)
292313

293314
**File:** `src/tool_classifier/intent_detector.py`
294315

295-
The DSPy `IntentDetectionModule` receives:
316+
The DSPy `IntentDetectionModule` uses `dspy.Predict` (direct prediction) and receives:
296317
- User query
297-
- Candidate services (formatted as JSON)
298-
- Conversation history (last 3 turns)
318+
- Candidate services (formatted as JSON with service_id, name, description, required_entities, top 3 examples)
319+
- Conversation history (last 3 turns, formatted as `{authorRole}: {message}`)
299320

300321
It returns:
301322
```json
@@ -336,6 +357,18 @@ Entities dict → ordered array matching service schema:
336357
# Array: ["EUR", "THB"]
337358
```
338359

360+
### Service Endpoint Call
361+
362+
After entity validation and transformation, the workflow calls the Ruuter active service endpoint:
363+
364+
```python
365+
# Endpoint: {RUUTER_SERVICE_BASE_URL}/services/active/{clean_service_name}
366+
# Payload: {"chatId": "...", "authorId": "...", "input": ["EUR", "THB"]}
367+
# Response: {"response": [{"content": "..."}]} → extracts content string
368+
```
369+
370+
In streaming mode, the service content is wrapped as SSE events and streamed to the client.
371+
339372
---
340373

341374
## Thresholds & Configuration
@@ -387,7 +420,3 @@ Based on empirical testing with 42 Estonian queries (20 SERVICE, 22 RAG):
387420
- **Adding more services:** Score distributions improve naturally — service queries score higher, non-service score lower.
388421
- **Adding more examples per service:** Diverse phrasings expand the embedding coverage. Aim for 5-8 examples per service covering formal + informal + different word orders.
389422
- **Adjusting thresholds:** Monitor the logs (`Dense search: top=... cosine=...`) and adjust if real-world scores differ from test data.
390-
391-
### Current Limitations
392-
393-
- **Step 7 (Ruuter service call) is not yet implemented.** The service workflow currently returns a debug response with service metadata (endpoint URL, HTTP method, extracted entities) instead of calling the actual Ruuter service endpoint. See the `TODO: STEP 7` comments in `src/tool_classifier/workflows/service_workflow.py`.

0 commit comments

Comments
 (0)