174 changes: 139 additions & 35 deletions docs/API_TOOL_CALLING.md
@@ -15,7 +15,8 @@ loop collects all required parameters from the user before the API call is made.
| **Indexing pipeline** | Takes an endpoint definition → enriches it with LLM context → stores hybrid vectors in Qdrant | ✅ Complete |
| **Tool classifier** | At query time, routes to the best matching endpoint via hybrid search + LLM disambiguation | ✅ Complete |
| **Agentic loop** | Multi-turn parameter collection with session persistence, language-aware clarifying questions, param correction, continuation prompt, and intent-switch detection | ✅ Complete |
| **API caller** | Execute the collected params against the real API endpoint and format the response | 🔧 Planned (next task) |
| **API caller** | Execute collected params against the real API endpoint, with circuit-breaker protection and localized error handling | ✅ Complete |
| **Response formatter** | Convert raw API JSON into a natural-language answer via DSPy, streamed token-by-token to the GUI | ✅ Complete |

---

@@ -41,6 +42,12 @@ APIToolWorkflowExecutor (src/tool_classifier/workflows/api_tool_workflow.py)
AgenticLoop (src/tool_classifier/agentic_loop.py)
↓ session state
APIToolSessionStore (Redis, keyed by chat_id, 30-min TTL)
↓ all params collected
APICaller (src/tool_classifier/api_caller.py)
↓ raw JSON response
APIResponseFormatterModule (src/tool_classifier/api_response_formatter.py)
↓ SSE token stream
User (GUI)
```

---
@@ -241,10 +248,12 @@ loop can execute the API call without an additional database round-trip.
| Field | Type | Description |
|---|---|---|
| `name` | str | Parameter name |
| `type` | str | `string`, `date`, `integer`, `boolean`, `number` |
| `type` | str | `string`, `date`, `datetime`, `integer`, `boolean`, `number` |
| `required` | bool | Whether the caller must supply this param |
| `description` | str | Human-readable description |

> **`datetime` type:** normalised to `YYYY-MM-DDTHH:MM:SSZ` by `ParamExtractionModule._validate_param_type()`. Useful for APIs that require ISO 8601 datetime strings (e.g. electricity price endpoints).

---


@@ -391,29 +400,31 @@ Handles `WorkflowType.API_TOOL_CALLING` after `ToolClassifier.classify()` has se
- **Turn 1 (new session):** reads `context["matched_endpoint"]`, creates a new
`APIToolSession` in Redis, runs the first agentic loop turn.
- **Turn 2-N (resume):** loads the existing session from Redis, runs the next turn.
- **Fast path:** if the endpoint has no required params, immediately returns the
completed JSON without starting a session.
- **Completion:** when all params are collected, deletes the session and returns a
JSON response with `status=params_collected`.
- **Fast path:** if the endpoint has no required params, immediately calls the API
without starting a session.
- **Clarifying question:** when params are still missing, streams the LLM-generated
question token-by-token via SSE. Each token is one `format_sse` frame; the stream
ends with an `END` frame.
- **API call:** when all params are collected, calls `APICaller.call()` then streams
the natural-language answer from `APIResponseFormatterModule.stream_forward()`
token-by-token via SSE.
- **Max turns:** deletes the session and returns `None` to trigger RAG fallback.
- **Streaming:** wraps the short clarifying-question response in a single SSE frame
+ `END` marker.

**Completed response format:**
**Streaming architecture:**

Both clarifying questions and final responses are streamed token-by-token.
`_compute_loop_step()` is the single source of truth — it returns a `_LoopStep`
tagged as `"question"`, `"api_call"`, or `"fallback"`. `execute_streaming()` then
handles each case:

```json
{
"status": "params_collected",
"endpoint": { "name": "get_public_holidays" },
"collected_params": {
"countryIsoCode": "EE",
"validFrom": "2026-01-01",
"validTo": "2026-12-31"
}
}
```
"question" → iterate step.question_tokens (real DSPy tokens)
→ yield format_sse(chat_id, token) per token → yield END

The actual API call and response formatting are handled by the next planned task.
"api_call" → APICaller.call() [blocking HTTP]
→ async for token in APIResponseFormatterModule.stream_forward()
→ yield format_sse(chat_id, token) per token → yield END
```
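
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the executor's actual code: `LoopStep` stands in for `_LoopStep`, `fake_formatter_stream()` stands in for `APIResponseFormatterModule.stream_forward()`, and the `format_sse` payload shape is an assumption.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import AsyncIterator, List


def format_sse(chat_id: str, content: str) -> str:
    # Hypothetical stand-in: the real format_sse may use a different payload.
    return "data: " + json.dumps({"chatId": chat_id, "content": content}) + "\n\n"


@dataclass
class LoopStep:
    # Stand-in for _LoopStep with only the fields this sketch needs.
    kind: str  # "question", "api_call", or "fallback"
    question_tokens: List[str] = field(default_factory=list)


async def fake_formatter_stream() -> AsyncIterator[str]:
    # Placeholder for APIResponseFormatterModule.stream_forward().
    for token in ["Here ", "are ", "the ", "holidays."]:
        yield token


async def execute_streaming(chat_id: str, step: LoopStep) -> AsyncIterator[str]:
    if step.kind == "question":
        for token in step.question_tokens:
            yield format_sse(chat_id, token)
    elif step.kind == "api_call":
        # The real executor would run APICaller.call() here before streaming.
        async for token in fake_formatter_stream():
            yield format_sse(chat_id, token)
    else:
        # "fallback": emit no frames so the caller can hand over to RAG.
        return
    yield format_sse(chat_id, "END")


async def collect(chat_id: str, step: LoopStep) -> List[str]:
    return [frame async for frame in execute_streaming(chat_id, step)]
```

Calling `asyncio.run(collect("chat-1", LoopStep("question", ["Which ", "dates?"])))` yields one frame per token followed by a single `END` frame. How the real executor signals the fallback case to its caller is not shown in this document, so the empty stream here is only one possible convention.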

---

@@ -443,6 +454,7 @@ Stored in Redis keyed by `chat_id` with a **30-minute sliding TTL**.
| `max_turns` | int | Max turns before fallback (default: 5) |
| `awaiting_continuation` | bool | True when continuation prompt has been shown |
| `detected_language` | str | Language from first message (`en`, `et`, `ru`) — persisted so all clarifying questions use the same language |
| `original_query` | str | The user’s first message that triggered the session — preserved across turns so the response formatter always receives the full original intent, not just the last short follow-up (e.g. `"from 2026-04-01 to 2026-04-30"`) |

### Turn Flow

@@ -505,11 +517,6 @@ Localized continuation questions are defined in
[src/tool_classifier/constants.py](../src/tool_classifier/constants.py):
`CONTINUATION_QUESTION`, `CONTINUATION_QUESTION_ET`, `CONTINUATION_QUESTION_RU`.

**History isolation:**
On turn 0 (first turn of a new session), `conversation_history=[]` is passed to the
extractor regardless of what the API sends. This prevents parameter values from a
previous completed session from being re-used for the new request.

**Constants** (in `src/tool_classifier/constants.py`):

| Constant | Value | Description |
@@ -518,7 +525,94 @@ previous completed session from being re-used for the new request.

---

## Part 4 — Session Management & Intent Switch Detection
## Part 4 — API Caller & Response Formatter

### Component: `APICaller`

Defined in [src/tool_classifier/api_caller.py](../src/tool_classifier/api_caller.py).

Executes the external HTTP request once all required parameters have been collected
by the agentic loop.

**Supported methods:** `GET` (params → query string) and `POST` (params → JSON body).

**Timeout:** `API_CALL_TIMEOUT` seconds (from `constants.py`). Overridable per-call.

**Return type:** `APICallResult`

| Field | Type | Description |
|---|---|---|
| `success` | bool | `True` for 2xx responses |
| `status_code` | int | HTTP status code; `0` for network/timeout/circuit-breaker failures |
| `response_data` | Any | Parsed JSON on success; raw parsed error body on 4xx; empty string on all other failures |
| `error` | str \| None | Localized user-facing error message on failure; `None` on success |

**Error handling:**

| Failure type | `status_code` | `response_data` | `error` field |
|---|---|---|---|
| 4xx (client error, e.g. bad params) | actual code | Raw parsed body (preserved for agentic loop re-prompting) | Localized `CLIENT_ERROR_MESSAGES` |
| 5xx (server error) | actual code | `""` | Localized `SERVICE_UNAVAILABLE_MESSAGES` |
| Timeout | `0` | `""` | Localized `SERVICE_TIMEOUT_MESSAGES` |
| Network error | `0` | `""` | Localized `SERVICE_TIMEOUT_MESSAGES` |
| Redirect not followed | actual 3xx code | `""` | Localized `REDIRECT_NOT_FOLLOWED_MESSAGES` |
| Circuit breaker open | `0` | `""` | Localized `CIRCUIT_BREAKER_OPEN_MESSAGES` |

4xx responses do **not** trip the circuit breaker — they indicate bad input, not a
server outage. The agentic loop can re-prompt the user for corrected values.

**Language-aware errors:** all error messages are localized using `session.detected_language`
(`et`, `en`, `ru`). The message constants are defined in
[src/tool_classifier/constants.py](../src/tool_classifier/constants.py).
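
As a rough illustration of the error table above, the mapping from an HTTP status to an `APICallResult` might look like the sketch below. The message tables are placeholders (the real wording lives in `constants.py` and is not shown here). The helper names `classify_http_response` and `trips_circuit_breaker`, the English fallback for unknown languages, and the choice not to count redirects as breaker failures are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


def _placeholders(label: str) -> Dict[str, str]:
    # Placeholder texts only; the real localized strings are in constants.py.
    return {lang: f"[{label}:{lang}]" for lang in ("en", "et", "ru")}


CLIENT_ERROR_MESSAGES = _placeholders("client-error")
SERVICE_UNAVAILABLE_MESSAGES = _placeholders("service-unavailable")
REDIRECT_NOT_FOLLOWED_MESSAGES = _placeholders("redirect-not-followed")


@dataclass
class APICallResult:
    success: bool
    status_code: int
    response_data: Any
    error: Optional[str]


def _msg(table: Dict[str, str], language: str) -> str:
    # Assumption: unknown language codes fall back to English.
    return table.get(language, table["en"])


def classify_http_response(status_code: int, body: Any, language: str) -> APICallResult:
    if 200 <= status_code < 300:
        return APICallResult(True, status_code, body, None)
    if 300 <= status_code < 400:
        return APICallResult(False, status_code, "", _msg(REDIRECT_NOT_FOLLOWED_MESSAGES, language))
    if 400 <= status_code < 500:
        # The parsed body is kept so the agentic loop can re-prompt the user.
        return APICallResult(False, status_code, body, _msg(CLIENT_ERROR_MESSAGES, language))
    return APICallResult(False, status_code, "", _msg(SERVICE_UNAVAILABLE_MESSAGES, language))


def trips_circuit_breaker(result: APICallResult) -> bool:
    # Server errors (5xx) and network-level failures (status_code 0) count;
    # 4xx responses do not. Treating 3xx as non-tripping is an assumption.
    return not result.success and (result.status_code == 0 or result.status_code >= 500)
```

For example, `classify_http_response(422, {"detail": "bad date"}, "et")` keeps the error body and does not count against the breaker, while a `503` discards the body and does.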

---

### Component: `CircuitBreaker`

Part of `api_caller.py`. One breaker instance per URL, shared across requests for the
lifetime of the `APICaller` instance.

```
CLOSED → OPEN: after CIRCUIT_BREAKER_FAILURE_THRESHOLD consecutive server/network failures
OPEN → HALF_OPEN: after CIRCUIT_BREAKER_COOLDOWN_SECONDS
HALF_OPEN → CLOSED: on first successful probe call
HALF_OPEN → OPEN: on first failed probe call
```

When OPEN, `call()` returns immediately without making an HTTP request.

**Constants** (in `src/tool_classifier/constants.py`):

| Constant | Description |
|---|---|
| `CIRCUIT_BREAKER_FAILURE_THRESHOLD` | Consecutive failures before opening |
| `CIRCUIT_BREAKER_COOLDOWN_SECONDS` | Seconds to wait before probing |
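
The state transitions above can be sketched as a small class. This is an illustrative reconstruction under assumptions, not the code in `api_caller.py`: the method names, the injectable clock, and the constant values are hypothetical, and the sketch does not limit how many probe calls are let through while HALF_OPEN.

```python
import time
from typing import Callable

CIRCUIT_BREAKER_FAILURE_THRESHOLD = 3    # assumed value, not from the source
CIRCUIT_BREAKER_COOLDOWN_SECONDS = 30.0  # assumed value, not from the source


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = CIRCUIT_BREAKER_FAILURE_THRESHOLD,
        cooldown: float = CIRCUIT_BREAKER_COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self._clock() - self._opened_at >= self.cooldown:
                # Cooldown elapsed: let a probe call through.
                self.state = "HALF_OPEN"
                return True
            return False
        return True

    def record_success(self) -> None:
        # Any success resets the consecutive-failure count and closes the breaker.
        self._failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        # Call only for server/network failures; 4xx responses are not recorded.
        if self.state == "HALF_OPEN":
            self._open()
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._open()

    def _open(self) -> None:
        self.state = "OPEN"
        self._failures = 0
        self._opened_at = self._clock()
```

A caller would check `allow_request()` before each HTTP request and return the circuit-breaker error immediately when it is `False`. Passing a fake `clock` makes the cooldown testable without sleeping.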

---

### Component: `APIResponseFormatterModule`

Defined in [src/tool_classifier/api_response_formatter.py](../src/tool_classifier/api_response_formatter.py).

Converts the raw API JSON response into a natural-language answer using DSPy.
Supports both blocking (`forward`) and streaming (`stream_forward`) execution.

**DSPy Signature:** `APIResponseFormatterSignature`

| Input field | Description |
|---|---|
| `user_query` | The user's original question |
| `api_response` | Raw API JSON as a string (truncated to `_MAX_RESPONSE_BYTES` = 50 KB) |
| `endpoint_description` | Short description of what the endpoint does |
| `response_language` | `"English"`, `"Estonian"`, or `"Russian"` — derived from `detected_language` |

| Output field | Description |
|---|---|
| `formatted_answer` | Clean natural-language answer, no raw JSON or markdown headers |
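
One way the input fields above could be prepared before they reach the DSPy signature is sketched below. The helper names are hypothetical, 50 KB is assumed to mean 50 * 1024 bytes, and defaulting to English for an unrecognized language code is an assumption.

```python
import json
from typing import Any, Dict

_MAX_RESPONSE_BYTES = 50 * 1024  # assumes "50 KB" means 50 * 1024 bytes

_LANGUAGE_NAMES = {"en": "English", "et": "Estonian", "ru": "Russian"}


def serialize_api_response(data: Any, limit: int = _MAX_RESPONSE_BYTES) -> str:
    """Return the API payload as text, truncated to at most `limit` UTF-8 bytes."""
    text = data if isinstance(data, str) else json.dumps(data, ensure_ascii=False)
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded[:limit].decode("utf-8", errors="ignore")


def response_language(detected_language: str) -> str:
    # Assumption: unknown codes default to English.
    return _LANGUAGE_NAMES.get(detected_language, "English")


def build_formatter_inputs(
    user_query: str,
    api_response: Any,
    endpoint_description: str,
    detected_language: str,
) -> Dict[str, str]:
    return {
        "user_query": user_query,
        "api_response": serialize_api_response(api_response),
        "endpoint_description": endpoint_description,
        "response_language": response_language(detected_language),
    }
```

Truncating on bytes rather than characters keeps the limit meaningful for Estonian and Russian text, where many characters occupy more than one byte in UTF-8.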



## Part 5 — Session Management & Intent Switch Detection

### `APIToolSessionStore`

@@ -600,7 +694,7 @@ ToolClassifier.classify()
APIToolWorkflowExecutor._run()
├─ No existing session → create new APIToolSession (turn_count=0, language=en)
├─ No existing session → create new APIToolSession (turn_count=0, language=en, original_query="What are the public holidays in Estonia?")
└─ AgenticLoop.run_turn(turn_count=0, history=[])
├─ ParamExtractionModule: no params in "What are the public holidays in Estonia?"
│ but countryIsoCode=EE can be inferred → extracted
@@ -631,12 +725,22 @@ APIToolWorkflowExecutor._run()
Session DELETED from Redis
Bot: {"status": "params_collected", "endpoint": {"name": "get_public_holidays"}, "collected_params": {"countryIsoCode": "EE", "validFrom": "2026-01-01", "validTo": "2026-12-31"}}
APIToolWorkflowExecutor._stream_api_and_format()
├─ user_query = session.original_query → "What are the public holidays in Estonia?"
├─ APICaller.call(GET https://openholidaysapi.org/PublicHolidays, params={countryIsoCode,validFrom,validTo})
│ → status=200, response_data=[{"name": "New Year's Day", ...}, ...]
└─ APIResponseFormatterModule.stream_forward(user_query, api_response, description, language="en")
→ DSPy StreamResponse tokens yielded one by one
→ format_sse(chat_id, "Here are the public holidays ") ...
→ format_sse(chat_id, "END")
Bot: "Here are the public holidays in Estonia for 2026:\n- New Year's Day (1 Jan)\n- ..." ← streamed token-by-token
```

---

## Part 5 — Integration Testing
## Part 6 — Integration Testing

### Test Script

@@ -655,12 +759,12 @@ uv run --no-project --with requests python tests/api_tool_eval/integration_test_

| # | Scenario | Turns | What it validates |
|---|---|---|---|
| 1 | Single-turn complete | 1 | Vehicle tax with plate number in first message → immediate completion |
| 2 | Multi-turn EN | 2 | Public holidays, country extracted turn 1, dates provided turn 2 |
| 3 | Multi-turn ET | 2 | School holidays in Estonian → language-aware classification |
| 4 | No-params fast path | 1 | Parliament votings endpoint has no required params → instant completion |
| 1 | Single-turn complete | 1 | Vehicle tax with plate number in first message → immediate API call + formatted response |
| 2 | Multi-turn EN | 2 | Public holidays, country extracted turn 1, dates provided turn 2 → API call + formatted response |
| 3 | Multi-turn ET | 2 | School holidays in Estonian → language-aware classification + Estonian response |
| 4 | No-params fast path | 1 | Parliament votings endpoint has no required params → immediate API call without session |
| 5 | Address search | 2 | Two-turn address lookup |
| 6 | Electricity prices | 2 | Datetime params across two turns |
| 6 | Electricity prices | 2 | `datetime` params across two turns |
| 7 | Session isolation | 2 | Two different chat IDs — no param leak between sessions |
| 8 | AWAITING_CONTINUATION → yes | 4+ | User says "yes" at continuation prompt → loop resumes |
| 8 | AWAITING_CONTINUATION → yes | 4+ | User says yes at continuation prompt → loop resumes → API call on completion |
| 9 | MAX_TURNS_REACHED | 5+ | User never provides params → falls back to RAG |
8 changes: 8 additions & 0 deletions src/models/session_models.py
@@ -49,3 +49,11 @@ class APIToolSession(BaseModel):
"even when follow-up messages are too short to reliably re-detect."
),
)
original_query: str = Field(
default="",
description=(
"The user's first message that triggered this session. "
"Preserved across turns so the response formatter always receives the "
"full original intent, not just the last short follow-up message."
),
)