feat: prevent overfitting via prompt changes and post-processing#127

Open
andrewklatzke wants to merge 5 commits into aklatzke/AIC-1795/optimize-method-ground-truth-path from aklatzke/AIC-2118/add-additional-validation-to-chaos-mode

Conversation


@andrewklatzke andrewklatzke commented Apr 7, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Describe the solution you've provided

Implements several measures to prevent "overfitting" of responses (the LLM tailoring its response to only one set of inputs and values):

  • "Chaos" mode now runs an additional validation loop after a successful result is reached. The number of validation iterations is determined by the size of the dataset provided; a smaller dataset results in fewer validation checks.
  • Updates the prompts with explicit instructions against overfitting to a single result.
  • Changes how variables are provided to the LLM; it was confusing the placeholder keys with the raw values being supplied.
  • Adds a post-processing step that transforms any raw values inserted into prompts back into their placeholders.
  • Adds a retry loop for failed variation generation (when the LLM responds with zero-length output). Also mitigates the failure by revising the instructions about tool-call behavior and removing the structured-output tool, relying on the LLM to return valid JSON directly.
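The scaling of the chaos-mode validation count could look roughly like the sketch below. The function name, constants, and divisor are illustrative assumptions, not the project's actual code; the review summary only states that 2–5 samples are used, scaled by pool size.

```python
# Hypothetical sketch of scaling post-pass validation checks with
# dataset size. Smaller datasets get fewer checks; the count is
# clamped to the 2-5 range mentioned in the review summary.
MIN_VALIDATION_SAMPLES = 2
MAX_VALIDATION_SAMPLES = 5

def validation_sample_count(dataset_size: int) -> int:
    """Return how many extra validation samples to run after a pass."""
    scaled = dataset_size // 4  # assumed scaling factor
    return max(MIN_VALIDATION_SAMPLES, min(MAX_VALIDATION_SAMPLES, scaled))
```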

Describe alternatives you've considered

This is an attempted fix for a reported overfitting problem.

Additional context

Output example after these changes:

You are the initial orchestrator for user questions regarding travel plans in a given location. Your role is to first fetch user preferences using the attached 'user-preferences-lookup' tool with the provided user ID: {{user_id}}. Based on the retrieved user preferences, including but not limited to the purpose of the trip (e.g., {{trip_purpose}}), accurately route the user's query to the correct sub-agent. Do not provide any direct answers yourself.

Routing criteria:
1. leisure-activity-agent: Handles questions about activities, events, or things to do in the area. For {{trip_purpose}} trips, only off-hour or leisure activities should be passed to this agent.
2. lodging-agent: Manages inquiries about accommodations, including hotels, Airbnbs, or other lodging options.
3. restaurant-agent: Handles questions about dining, restaurants, diets, or related topics.

Instructions:
- Always begin by fetching user preferences using the 'user-preferences-lookup' tool with the supplied user ID ({{user_id}}).
- If user preferences are unavailable or cannot be fetched, your response should be an automatic failure with no further processing.
- Utilize the fetched preferences to determine the trip purpose ({{trip_purpose}}) and any other relevant data.
- Based on the user's question context and preferences, pass the entire user query along with relevant preference details to the appropriate sub-agent.
- Explicitly mention the sub-agent you are handing off to in your response.
- Do not answer the user's question directly.
- If the user's input or preferences do not clearly map to any agent, respond with an automatic failure indicating missing or incomplete data.

This orchestration ensures that all queries are handled appropriately by the specialized sub-agents, maximizing relevance and user satisfaction.

The appropriate placeholders are now present, rather than values being hardcoded directly into the prompt.
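A deterministic restoration pass of this kind might be sketched as follows. This is a minimal illustration, not the project's actual restore_variable_placeholders: it uses plain substring replacement, and real-world handling of overlapping or ambiguous values would need more care.

```python
def restore_variable_placeholders(
    text: str, variables: dict[str, str]
) -> tuple[str, list[str]]:
    """Replace raw variable values the LLM leaked into a prompt with
    their {{key}} placeholders. Returns cleaned text plus warnings."""
    warnings: list[str] = []
    # Replace longer values first so a short value that happens to be a
    # substring of a longer one does not clobber the longer match.
    for key, value in sorted(variables.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue
        placeholder = "{{" + key + "}}"
        total_count = text.count(value)
        if total_count:
            text = text.replace(value, placeholder)
            warnings.append(
                f"Found raw value for '{key}' "
                f"— replaced {total_count} occurrence(s) with placeholder {placeholder}"
            )
    return text, warnings
```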


Note

Medium Risk
Medium risk because it changes the public callback contract (handle_agent_call/handle_judge_call now return OptimizationResponse) and alters optimization control flow by adding post-pass validation loops and retry behavior, which can affect integrations and run-time characteristics.

Overview
Adds a post-pass validation phase (“chaos mode”) that, after an initial passing iteration, reruns the agent on additional distinct sampled inputs/variables (2–5, scaled by pool size) before confirming success; failed validation rejects the candidate and continues with variation generation without consuming the attempt budget.

Refactors agent/judge callback plumbing to return a new OptimizationResponse (output + optional TokenUsage), records per-call durations, and persists generation_latency/token usage plus per-judge evaluation latencies/tokens in agent_optimization_result payloads.

Hardens variation generation and overfitting prevention by improving prompts (explicit placeholder key-vs-value guidance + overfitting warning section), broadening placeholder interpolation to support hyphenated keys, adding deterministic post-processing (restore_variable_placeholders) to revert leaked concrete values back to {{key}}, and retrying variation generation up to 3 times on empty/unparseable JSON while removing structured-output tool injection/handler routing.
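The broadened interpolation for hyphenated keys could plausibly be a small regex change. This sketch assumes a {{key}} placeholder syntax and leaves unknown keys untouched; the names are illustrative, not the project's actual implementation.

```python
import re

# [\w-] accepts word characters plus hyphens, so keys like
# {{user-preferences-lookup}} interpolate as well as {{user_id}}.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([\w-]+)\s*\}\}")

def interpolate(template: str, variables: dict[str, str]) -> str:
    """Substitute {{key}} placeholders; unknown keys are left intact."""
    return PLACEHOLDER_RE.sub(
        lambda m: variables.get(m.group(1), m.group(0)), template
    )
```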

Reviewed by Cursor Bugbot for commit 288336e. Bugbot is set up for automated code reviews on this repo.

@andrewklatzke andrewklatzke requested a review from a team as a code owner April 7, 2026 20:02
@andrewklatzke andrewklatzke requested a review from jsonbailey April 8, 2026 16:34
optimize_context, iteration
)
if all_valid:
return self._handle_success(last_ctx, iteration)

Success returns validation context with inflated iteration number

Medium Severity

When validation passes, _handle_success receives last_ctx — the last validation sample's context — instead of optimize_context (the original passing turn). The validation context's .iteration is set to iteration + i + 1 (a synthetic validation-internal number), so the returned result and the "success" status update carry an inflated iteration number. For example, if the main loop passes on attempt 1 with 2 validation samples, the result reports iteration=3 instead of 1. This propagates into on_passing_result, on_status_update, and the API persistence layer via _persist_and_forward, causing misleading iteration counts in the UI and stored records.
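A minimal, self-contained repro of the inflation described above (illustrative only; Ctx and run_validation stand in for the project's real context object and validation loop) shows how the returned context ends up numbered past the actual passing turn:

```python
from dataclasses import dataclass

@dataclass
class Ctx:
    iteration: int

def run_validation(optimize_context: Ctx, iteration: int, n_samples: int) -> Ctx:
    """Mimic the buggy path: each validation sample gets a synthetic
    context numbered iteration + i + 1, and the last one is returned."""
    last_ctx = optimize_context
    for i in range(n_samples):
        last_ctx = Ctx(iteration=iteration + i + 1)  # synthetic numbering
    return last_ctx  # buggy: should return optimize_context instead

passing = Ctx(iteration=1)
result = run_validation(passing, iteration=1, n_samples=2)
# result.iteration is now 3, though the agent passed on iteration 1
```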

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 3042984.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).



f"— replaced {total_count} occurrence(s) with placeholder {placeholder}"
)

return text, warnings

Sync callbacks crash in await_if_needed after return type change

High Severity

await_if_needed checks isinstance(result, str) to detect synchronous returns, but the callback signatures now return OptimizationResponse instead of str. When a synchronous (non-async) callback returns an OptimizationResponse, the isinstance check is False, so the code falls through to await result, which raises a TypeError because OptimizationResponse is not awaitable. All tests use AsyncMock so this path is untested.
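One hedged sketch of a fix is to test for awaitability instead of a concrete return type, which works whether a callback returns str, OptimizationResponse, or a coroutine (OptimizationResponse itself is not reproduced here):

```python
import asyncio
import inspect

async def await_if_needed(result):
    """Accept either a plain value or an awaitable from a callback.
    Unlike an isinstance(result, str) check, inspect.isawaitable
    handles any synchronous return type without raising TypeError."""
    if inspect.isawaitable(result):
        return await result
    return result
```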

Additional Locations (1)

