[7/8] Add Python SDK app-server integration harness by aibrahim-oai · Pull Request #22014 · openai/codex

aibrahim-oai · 2026-05-10T10:33:24Z

Why

The SDK had behavioral tests that replaced SDK client internals. Those tests could catch wrapper mistakes, but they did not prove the pinned app-server runtime, generated notification models, request routing, and sync/async public clients worked together.

This PR adds deterministic integration coverage that starts the pinned codex app-server process and mocks only the upstream Responses HTTP boundary.
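To make "mocks only the upstream Responses HTTP boundary" concrete, here is a minimal sketch of a local server that serves queued SSE bodies and records each POST to /v1/responses. It is an illustration under stated assumptions, not the PR's code: `MockResponsesSketch`, `enqueue`, `captured`, and `post_json` are hypothetical names, and the real MockResponsesServer may differ.

```python
import http.client
import json
import threading
from collections import deque
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MockResponsesSketch:
    """Hypothetical stand-in for the PR's MockResponsesServer (not its real API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queued = deque()  # SSE bodies to serve, first in, first out
        self.captured = []      # parsed JSON bodies of every served request
        owner = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                raw = self.rfile.read(length)
                with owner._lock:
                    ready = self.path == "/v1/responses" and bool(owner._queued)
                    if ready:
                        owner.captured.append(json.loads(raw))
                        body = owner._queued.popleft().encode("utf-8")
                if not ready:
                    # Unknown path or nothing queued: fail loudly instead of hanging.
                    self.send_error(404, "no queued response for this request")
                    return
                self.send_response(200)
                self.send_header("Content-Type", "text/event-stream")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, format, *args):
                pass  # keep test output quiet

        # Port 0 lets the OS pick a free port, so parallel tests do not collide.
        self._server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
        self.port = self._server.server_port
        threading.Thread(target=self._server.serve_forever, daemon=True).start()

    def enqueue(self, sse_body):
        with self._lock:
            self._queued.append(sse_body)

    def close(self):
        self._server.shutdown()
        self._server.server_close()


def post_json(port, path, payload):
    """Send one JSON POST to the local mock and return (status, body text)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", path, body=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8")
    finally:
        conn.close()
```

In a real test the process under test would be pointed at `http://127.0.0.1:<port>` through its provider configuration; `post_json` only stands in for that client here.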

What

Add AppServerHarness and MockResponsesServer helpers for isolated CODEX_HOME, mock-provider config, queued SSE responses, and captured /v1/responses requests.
Add shared helpers for SSE construction, stream assertions, approval-policy inspection, and image fixtures.
Split integration coverage into focused modules for run behavior, inputs, streaming, turn controls, approvals, and thread lifecycle.
Cover sync and async Thread.run, TurnHandle.stream, interleaved streams, approval-mode persistence, lifecycle helpers, final-answer phase handling, image inputs, loaded skill input injection, steering, interruption, listing, history reads, run overrides, and token usage mapping.
Replace public-wrapper tests that duplicated integration-test behavior with lower-level client tests only where direct client behavior is the thing under test.
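The shared SSE construction and stream-assertion helpers listed above could look roughly like the sketch below. The function names are hypothetical and the payloads are heavily simplified; only the event type names `response.output_text.delta` and `response.completed` come from the public Responses streaming API, and the PR's own helpers may build richer events.

```python
import json


def sse_event(event_type, payload):
    """Encode one Server-Sent Event: an `event:` line, a `data:` line, a blank line."""
    data = json.dumps({"type": event_type, **payload})
    return f"event: {event_type}\ndata: {data}\n\n"


def sse_body(events):
    """Join (event_type, payload) pairs into one response body."""
    return "".join(sse_event(event_type, payload) for event_type, payload in events)


def parse_sse(body):
    """Decode a body built by sse_body back into its JSON payloads."""
    payloads = []
    for block in body.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                payloads.append(json.loads(line[len("data: "):]))
    return payloads


def streamed_text(body):
    """Concatenate the text deltas carried by a stream (simplified assertion helper)."""
    return "".join(
        p["delta"] for p in parse_sse(body) if p["type"] == "response.output_text.delta"
    )
```

Because `json.dumps` escapes newlines, each payload stays on a single `data:` line, which keeps the parser in this sketch trivial.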

Stack

[1/8] Pin Python SDK runtime dependency #21891
[2/8] Generate Python SDK types from pinned runtime #21893
[3/8] Run Python SDK tests in CI #21895
[4/8] Define Python SDK public API surface #21896
[5/8] Rename Python SDK package to openai-codex #21905
[6/8] Add high-level Python SDK approval mode #21910
This PR [7/8] Add Python SDK app-server integration harness
[8/8] Add Python SDK Ruff formatting #22021

Verification

Added pinned app-server integration tests under sdk/python/tests/test_app_server_*.py and test_real_app_server_integration.py.

Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful. Co-authored-by: Codex <noreply@openai.com>

Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists. Co-authored-by: Codex <noreply@openai.com>
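One way to "accept either ordering for concurrent mock Responses requests" is to compare captured and expected requests as multisets. The helper below is a hypothetical sketch of that idea, not the assertion used in the PR.

```python
import json


def canonical(payload):
    """Serialize a JSON-compatible payload deterministically so it can be sorted."""
    return json.dumps(payload, sort_keys=True)


def assert_same_requests_any_order(captured, expected):
    """Pass when both lists hold the same payloads, regardless of arrival order."""
    assert sorted(map(canonical, captured)) == sorted(map(canonical, expected))
```

Sorting canonical JSON strings keeps duplicates significant, so a request that arrives twice is not mistaken for one that arrived once.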

Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing. Co-authored-by: Codex <noreply@openai.com>
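Checking that "the user prompt appears as the final user input" amounts to inspecting the last user item of a captured request body. The sketch below assumes the body follows the public Responses API shape (an `input` list of items with a `role` and `input_text` content parts); `last_user_texts` is a hypothetical helper, and the pinned runtime may add other items around the prompt.

```python
def last_user_texts(request_body):
    """Return the text parts of the most recent user item in a captured request."""
    for item in reversed(request_body.get("input", [])):
        if item.get("role") != "user":
            continue
        content = item.get("content", [])
        if isinstance(content, str):  # the API also allows plain-string content
            return [content]
        return [part["text"] for part in content if part.get("type") == "input_text"]
    return []
```

A test can then assert that the prompt is a member of the returned list, which tolerates extra wrapper text or image parts in the same item.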

Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals. Co-authored-by: Codex <noreply@openai.com>

aibrahim-oai and others added 14 commits May 10, 2026 13:37

Fix app-server integration expectations

feffa48

Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path. Co-authored-by: Codex <noreply@openai.com>

Add more SDK app-server integration coverage

ad23385

Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server. Co-authored-by: Codex <noreply@openai.com>

Fix new SDK integration assertions

4e9b978

Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers. Co-authored-by: Codex <noreply@openai.com>

Align multimodal integration assertion

daf4694

Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload. Co-authored-by: Codex <noreply@openai.com>

Split pinned app-server integration tests by behavior

57edbbf

Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary. Co-authored-by: Codex <noreply@openai.com>

Materialize fork lifecycle integration test

280d690

Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs. Co-authored-by: Codex <noreply@openai.com>

Shorten app-server integration test names

b9cd273

Rename the split Python SDK app-server integration files and helper module to concise group names. Co-authored-by: Codex <noreply@openai.com>

Cover SDK app-server integration gaps

d77f543

Add focused integration coverage for thread listing, persisted history reads, async lifecycle wrappers, skill input injection, and run override/usage behavior through the pinned app-server test harness. Co-authored-by: Codex <noreply@openai.com>

Tighten SDK integration assertions

c3e22fe

Assert skill inputs as persisted structured history and keep run override coverage to the model request plus token usage, matching the public SDK behavior exercised by the harness. Co-authored-by: Codex <noreply@openai.com>

Drop unproven skill input integration case

f41f281

Remove the skill-input assertion from the app-server integration suite because the current runtime path does not expose that structured input at the model boundary or in read history. Co-authored-by: Codex <noreply@openai.com>

Assert loaded skill input injection

5c7b278

Create a repo skill inside the app-server harness workspace and assert that SkillInput resolves to an injected skill block at the model request boundary. Co-authored-by: Codex <noreply@openai.com>

aibrahim-oai mentioned this pull request May 10, 2026

[8/8] Add Python SDK Ruff formatting #22021

Open

aibrahim-oai changed the title ~~[7/7] Add Python SDK app-server integration harness~~ [7/8] Add Python SDK app-server integration harness May 10, 2026

aibrahim-oai wants to merge 15 commits into codex/python-sdk-approval-never from codex/python-sdk-mock-integration-tests
