[7/8] Add Python SDK app-server integration harness#22014
Open
aibrahim-oai wants to merge 15 commits intocodex/python-sdk-approval-neverfrom
Open
[7/8] Add Python SDK app-server integration harness#22014aibrahim-oai wants to merge 15 commits intocodex/python-sdk-approval-neverfrom
aibrahim-oai wants to merge 15 commits intocodex/python-sdk-approval-neverfrom
Conversation
Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful. Co-authored-by: Codex <noreply@openai.com>
This was referenced May 10, 2026
Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists. Co-authored-by: Codex <noreply@openai.com>
Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing. Co-authored-by: Codex <noreply@openai.com>
Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals. Co-authored-by: Codex <noreply@openai.com>
Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path. Co-authored-by: Codex <noreply@openai.com>
Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server. Co-authored-by: Codex <noreply@openai.com>
Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers. Co-authored-by: Codex <noreply@openai.com>
Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload. Co-authored-by: Codex <noreply@openai.com>
Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary. Co-authored-by: Codex <noreply@openai.com>
Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs. Co-authored-by: Codex <noreply@openai.com>
Rename the split Python SDK app-server integration files and helper module to concise group names. Co-authored-by: Codex <noreply@openai.com>
Add focused integration coverage for thread listing, persisted history reads, async lifecycle wrappers, skill input injection, and run override/usage behavior through the pinned app-server test harness. Co-authored-by: Codex <noreply@openai.com>
Assert skill inputs as persisted structured history and keep run override coverage to the model request plus token usage, matching the public SDK behavior exercised by the harness. Co-authored-by: Codex <noreply@openai.com>
Remove the skill-input assertion from the app-server integration suite because the current runtime path does not expose that structured input at the model boundary or in read history. Co-authored-by: Codex <noreply@openai.com>
Create a repo skill inside the app-server harness workspace and assert that SkillInput resolves to an injected skill block at the model request boundary. Co-authored-by: Codex <noreply@openai.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The SDK had behavioral tests that replaced SDK client internals. Those tests could catch wrapper mistakes, but they did not prove the pinned app-server runtime, generated notification models, request routing, and sync/async public clients worked together.
This PR adds deterministic integration coverage that starts the pinned
codex app-serverprocess and mocks only the upstream Responses HTTP boundary.What
AppServerHarnessandMockResponsesServerhelpers for isolatedCODEX_HOME, mock-provider config, queued SSE responses, and captured/v1/responsesrequests.Thread.run,TurnHandle.stream, interleaved streams, approval-mode persistence, lifecycle helpers, final-answer phase handling, image inputs, loaded skill input injection, steering, interruption, listing, history reads, run overrides, and token usage mapping.Stack
[1/8]Pin Python SDK runtime dependency[2/8]Generate Python SDK types from pinned runtime[3/8]Run Python SDK tests in CI[4/8]Define Python SDK public API surface[5/8]Rename Python SDK package toopenai-codex[6/8]Add high-level Python SDK approval mode[7/8]Add Python SDK app-server integration harness[8/8]Add Python SDK Ruff formattingVerification
sdk/python/tests/test_app_server_*.pyandtest_real_app_server_integration.py.