[7/8] Add Python SDK app-server integration harness #22014

Open
aibrahim-oai wants to merge 15 commits into codex/python-sdk-approval-never from codex/python-sdk-mock-integration-tests


aibrahim-oai commented May 10, 2026

Why

The SDK had behavioral tests that replaced SDK client internals. Those tests could catch wrapper mistakes, but they did not prove that the pinned app-server runtime, generated notification models, request routing, and sync/async public clients worked together.

This PR adds deterministic integration coverage that starts the pinned codex app-server process and mocks only the upstream Responses HTTP boundary.

What

  • Add AppServerHarness and MockResponsesServer helpers for isolated CODEX_HOME, mock-provider config, queued SSE responses, and captured /v1/responses requests.
  • Add shared helpers for SSE construction, stream assertions, approval-policy inspection, and image fixtures.
  • Split integration coverage into focused modules for run behavior, inputs, streaming, turn controls, approvals, and thread lifecycle.
  • Cover sync and async Thread.run, TurnHandle.stream, interleaved streams, approval-mode persistence, lifecycle helpers, final-answer phase handling, image inputs, loaded skill input injection, steering, interruption, listing, history reads, run overrides, and token usage mapping.
  • Replace public-wrapper tests that duplicated integration-test behavior with lower-level client tests only where direct client behavior is the thing under test.
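
For readers unfamiliar with this style of harness, the queued-SSE and captured-request idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `MockResponses` class, its `enqueue` method, the `sse` helper, and the event shape are hypothetical names chosen for the sketch, and it assumes the client under test posts JSON to `/v1/responses` and reads a server-sent-events body.

```python
import json
import threading
import urllib.request
from collections import deque
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def sse(events):
    """Encode JSON-serializable events as one SSE body (hypothetical helper)."""
    chunks = [
        f"event: {event['type']}\ndata: {json.dumps(event)}\n\n" for event in events
    ]
    return "".join(chunks).encode("utf-8")


class MockResponses:
    """Hypothetical stand-in: queues SSE bodies, records POST /v1/responses."""

    def __init__(self):
        self.queued = deque()   # SSE bodies waiting to be served, in order
        self.requests = []      # decoded JSON payloads, in arrival order
        self._lock = threading.Lock()
        mock = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                payload = json.loads(self.rfile.read(length) or b"{}")
                with mock._lock:
                    if self.path == "/v1/responses" and mock.queued:
                        mock.requests.append(payload)
                        body = mock.queued.popleft()
                    else:
                        body = None
                if body is None:
                    # Unknown path or nothing queued: fail loudly.
                    self.send_error(404)
                    return
                self.send_response(200)
                self.send_header("Content-Type", "text/event-stream")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                pass  # keep test output quiet

        # Port 0 lets the OS pick a free port, so parallel tests do not collide.
        self._server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
        self.url = f"http://127.0.0.1:{self._server.server_port}"
        self._thread = threading.Thread(
            target=self._server.serve_forever, daemon=True
        )

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._server.shutdown()
        self._server.server_close()

    def enqueue(self, events):
        with self._lock:
            self.queued.append(sse(events))
```

In a test, the mock's `url` would be handed to whatever provider configuration the process under test reads, one response would be enqueued per expected model call, and assertions would then run against `requests`. Returning an error when the queue is empty is a deliberate choice in this sketch: an unexpected extra model request fails the test instead of hanging it.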

Stack

  1. [1/8] Pin Python SDK runtime dependency #21891
  2. [2/8] Generate Python SDK types from pinned runtime #21893
  3. [3/8] Run Python SDK tests in CI #21895
  4. [4/8] Define Python SDK public API surface #21896
  5. [5/8] Rename Python SDK package to openai-codex #21905
  6. [6/8] Add high-level Python SDK approval mode #21910
  7. This PR: [7/8] Add Python SDK app-server integration harness
  8. [8/8] Add Python SDK Ruff formatting #22021

Verification

  • Added pinned app-server integration tests under sdk/python/tests/test_app_server_*.py and test_real_app_server_integration.py.

Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful.

Co-authored-by: Codex <noreply@openai.com>
aibrahim-oai and others added 14 commits May 10, 2026 13:37
Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists.

Co-authored-by: Codex <noreply@openai.com>
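
As a rough illustration of "accept either ordering for concurrent mock Responses requests", an order-insensitive comparison could look like the sketch below. The helper name `assert_same_requests` and the `"prompt"` key are hypothetical and are not taken from this PR; the only assumption is that each captured request can be reduced to a hashable value.

```python
from collections import Counter


def assert_same_requests(captured, expected_prompts):
    """Compare captured prompts to expected ones, ignoring arrival order.

    Counter keeps multiplicity, so a duplicated or missing request still fails.
    """
    got = Counter(request["prompt"] for request in captured)
    want = Counter(expected_prompts)
    assert got == want, f"unexpected prompts: got={dict(got)} want={dict(want)}"
```

A multiset comparison is used here instead of a plain set so that two identical concurrent requests are not silently collapsed into one.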
Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing.

Co-authored-by: Codex <noreply@openai.com>
Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals.

Co-authored-by: Codex <noreply@openai.com>
Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path.

Co-authored-by: Codex <noreply@openai.com>
Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server.

Co-authored-by: Codex <noreply@openai.com>
Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers.

Co-authored-by: Codex <noreply@openai.com>
Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload.

Co-authored-by: Codex <noreply@openai.com>
Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary.

Co-authored-by: Codex <noreply@openai.com>
Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs.

Co-authored-by: Codex <noreply@openai.com>
Rename the split Python SDK app-server integration files and helper module to concise group names.

Co-authored-by: Codex <noreply@openai.com>
Add focused integration coverage for thread listing, persisted history reads, async lifecycle wrappers, skill input injection, and run override/usage behavior through the pinned app-server test harness.

Co-authored-by: Codex <noreply@openai.com>
Assert skill inputs as persisted structured history and keep run override coverage to the model request plus token usage, matching the public SDK behavior exercised by the harness.

Co-authored-by: Codex <noreply@openai.com>
Remove the skill-input assertion from the app-server integration suite because the current runtime path does not expose that structured input at the model boundary or in read history.

Co-authored-by: Codex <noreply@openai.com>
Create a repo skill inside the app-server harness workspace and assert that SkillInput resolves to an injected skill block at the model request boundary.

Co-authored-by: Codex <noreply@openai.com>
aibrahim-oai changed the title from "[7/7] Add Python SDK app-server integration harness" to "[7/8] Add Python SDK app-server integration harness" on May 10, 2026