Commit df55502

feat: add conformance workflow and compatibility fixes (#387)
1 parent 632c7a3 commit df55502

15 files changed

Lines changed: 1335 additions & 2 deletions

CONTRIBUTING.md

Lines changed: 8 additions & 0 deletions
@@ -55,6 +55,14 @@ bash -n scripts/doctor.sh
bash -n scripts/lint.sh
```

External interoperability experiments stay outside the default regression baseline. When you need to reproduce current official-tool behavior, run:

```bash
bash ./scripts/conformance.sh
```

Treat that output as investigation input. Do not fold it into `doctor.sh` or the default CI quality gate unless the repository explicitly decides to promote a specific experiment into a maintained policy.

If you change extension methods, extension metadata, or Agent Card/OpenAPI contract surfaces, also run:

```bash

docs/conformance-triage.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# External Conformance Triage

This document records the first local `./scripts/conformance.sh mandatory` run against the official `a2aproject/a2a-tck` using the repository's dummy-backed SUT.

## Standards Used For Triage

- `a2a-sdk==0.3.25` as installed in this repository:
  - `AgentCard` uses `additionalInterfaces`, not `supportedInterfaces`.
  - JSON-RPC request models use `message/send`, `tasks/get`, `tasks/cancel`, and `agent/getAuthenticatedExtendedCard`.
  - The installed SDK does not expose a JSON-RPC `ListTasks` request model.
- A2A v0.3.0 specification:
  - JSON-RPC methods use the `{category}/{action}` pattern such as `message/send` and `tasks/get`.
  - Transport declarations use `preferredTransport` plus `additionalInterfaces`.
  - The method mapping table lists `tasks/list` as gRPC/REST only.
- Repository compatibility policy:
  - `A2A-Version` negotiation supports both `0.3` and `1.0`.
  - Payloads still follow the shipped `0.3` SDK baseline.
  - `1.0` compatibility is currently documented as partial rather than complete.

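
The naming difference between the v0.3.0 JSON-RPC methods and the PascalCase names sent by the TCK can be illustrated with a small lookup. This is only a sketch: the pairing table and the helper names `to_v03_method` and `build_request` are assumptions made for illustration, and nothing here is taken from the repository or from `a2a-sdk`.

```python
from typing import Optional

# PascalCase names observed in the TCK run, paired with the v0.3.0
# JSON-RPC method names. The pairing is an illustrative assumption.
PASCAL_TO_V03 = {
    "SendMessage": "message/send",
    "GetTask": "tasks/get",
    "CancelTask": "tasks/cancel",
}


def to_v03_method(method: str) -> Optional[str]:
    """Return the v0.3.0 JSON-RPC name for a method, or None if there is none."""
    if method in PASCAL_TO_V03.values():
        return method  # already a {category}/{action} name
    return PASCAL_TO_V03.get(method)


def build_request(method: str, params: dict, request_id: int = 1) -> dict:
    """Build a plain JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
```

In this sketch `ListTasks` deliberately has no entry, mirroring the point above that the v0.3.0 mapping table lists `tasks/list` as gRPC/REST only.
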
## Classification Labels

- `TCK issue`: the failing expectation conflicts with `a2a-sdk==0.3.25` and the v0.3.0 baseline used by this repository.
- `TCK issue; also a repo v1.0 gap`: the exact failure is caused by a TCK mismatch, but the same area would still need extra work for stronger `1.0` compatibility.
- `TCK issue / local experiment artifact`: the failure comes from an aggressive heuristic or from local dummy-run characteristics and should not be treated as a runtime protocol bug.

## Per-Test Triage

- `tests/mandatory/authentication/test_auth_compliance_v030.py::test_security_scheme_structure_compliance`: `TCK issue`. The TCK expects each `securitySchemes` entry to be wrapped as `{httpAuthSecurityScheme: {...}}`, but `a2a-sdk==0.3.25` exposes the flattened OpenAPI-shaped object with fields like `type`, `scheme`, `description`, and `bearerFormat`.
- `tests/mandatory/authentication/test_auth_enforcement.py::test_authentication_scheme_consistency`: `TCK issue`. Same root cause as the previous test: the TCK validates a non-SDK wrapper shape instead of the installed SDK schema.
- `tests/mandatory/jsonrpc/test_a2a_error_codes_enhanced.py::test_push_notification_not_supported_error_32003_enhanced`: `TCK issue`. The failure is a TCK helper bug: `transport_create_task_push_notification_config()` is called with the wrong positional signature before the runtime behavior is even exercised.
- `tests/mandatory/jsonrpc/test_json_rpc_compliance.py::test_rejects_invalid_json_rpc_requests[invalid_request4--32602]`: `TCK issue`. The test sends JSON-RPC method `SendMessage`; under the v0.3.0 / SDK 0.3.25 baseline the correct method is `message/send`, so the runtime correctly returns `-32601` for an unknown method instead of `-32602`.
- `tests/mandatory/jsonrpc/test_json_rpc_compliance.py::test_rejects_invalid_params`: `TCK issue`. Same method-name mismatch as above; with the correct `message/send` method the runtime returns `-32602` for invalid parameters.
- `tests/mandatory/jsonrpc/test_protocol_violations.py::test_duplicate_request_ids`: `TCK issue`. The first request already fails because the TCK uses `SendMessage` instead of `message/send`, so the duplicate-ID assertion never reaches the actual duplicate-ID behavior.
- `tests/mandatory/protocol/test_a2a_v030_new_methods.py::TestMethodMappingCompliance::test_core_method_mapping_compliance`: `TCK issue; also a repo v1.0 gap`. The JSON-RPC client uses PascalCase methods (`SendMessage`, `GetTask`, `CancelTask`) that do not match the v0.3.0 JSON-RPC mapping, but the repository also does not currently provide PascalCase aliases even when `A2A-Version: 1.0` is negotiated.
- `tests/mandatory/protocol/test_message_send_method.py::test_message_send_valid_text`: `TCK issue; also a repo v1.0 gap`. The failing request uses `SendMessage` over JSON-RPC; the repository correctly supports `message/send` for the current SDK baseline, but not the PascalCase alias.
- `tests/mandatory/protocol/test_message_send_method.py::test_message_send_invalid_params`: `TCK issue; also a repo v1.0 gap`. Direct cause is the same PascalCase JSON-RPC method mismatch.
- `tests/mandatory/protocol/test_message_send_method.py::test_message_send_continue_task`: `TCK issue; also a repo v1.0 gap`. Direct cause is again `SendMessage` instead of `message/send`.
- `tests/mandatory/protocol/test_state_transitions.py::test_task_history_length`: `TCK issue; also a repo v1.0 gap`. Task creation fails only because the TCK uses `SendMessage` on JSON-RPC.
- `tests/mandatory/protocol/test_tasks_cancel_method.py::test_tasks_cancel_valid`: `TCK issue; also a repo v1.0 gap`. The fixture cannot create a task because the TCK uses `SendMessage`; the runtime's `tasks/cancel` behavior is not the direct failing cause in this run.
- `tests/mandatory/protocol/test_tasks_cancel_method.py::test_tasks_cancel_nonexistent`: `TCK issue; also a repo v1.0 gap`. The TCK calls JSON-RPC `CancelTask`; under the v0.3.0 baseline the method is `tasks/cancel`. With the correct method, the runtime returns `Task not found` / `-32001`.
- `tests/mandatory/protocol/test_tasks_get_method.py::test_tasks_get_valid`: `TCK issue; also a repo v1.0 gap`. The task-creation fixture fails first because the TCK uses `SendMessage`.
- `tests/mandatory/protocol/test_tasks_get_method.py::test_tasks_get_with_history_length`: `TCK issue; also a repo v1.0 gap`. Same fixture failure via `SendMessage`.
- `tests/mandatory/protocol/test_tasks_get_method.py::test_tasks_get_nonexistent`: `TCK issue; also a repo v1.0 gap`. The TCK calls JSON-RPC `GetTask`; under the v0.3.0 baseline the method is `tasks/get`. With the correct method, the runtime returns `Task not found` / `-32001`.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_all_tasks`: `TCK issue; also a repo v1.0 gap`. The test suite uses JSON-RPC `ListTasks`, which is outside the `a2a-sdk==0.3.25` JSON-RPC surface and outside the v0.3.0 JSON-RPC mapping.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_tasks_empty_when_none_exist`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_tasks_validates_required_fields`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_tasks_sorted_by_timestamp_descending`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_filter_by_context_id`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_filter_by_status`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_filter_by_last_updated_after`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_combined_filters`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_default_page_size`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_custom_page_size`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_page_token_navigation`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_last_page_detection`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_total_size_accuracy`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestHistoryLimiting::test_history_length_zero`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestHistoryLimiting::test_history_length_custom`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestHistoryLimiting::test_history_length_exceeds_actual`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestArtifactInclusion::test_artifacts_excluded_by_default`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestArtifactInclusion::test_artifacts_included_when_requested`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_invalid_page_token_error`: `TCK issue; also a repo v1.0 gap`. The assertion expects JSON-RPC param validation on `ListTasks`, but the direct failure is still that `ListTasks` is not a supported JSON-RPC method in the current SDK baseline.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_invalid_status_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_negative_page_size_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_zero_page_size_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_out_of_range_page_size_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_default_page_size_is_50`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_negative_history_length_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_invalid_timestamp_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch.
- `tests/mandatory/security/test_agent_card_security.py::test_public_agent_card_access_control`: `TCK issue`. The TCK requires `supportedInterfaces`, but `a2a-sdk==0.3.25` and the v0.3.0 specification use `additionalInterfaces`.
- `tests/mandatory/security/test_agent_card_security.py::test_sensitive_information_protection`: `TCK issue / local experiment artifact`. The failure is driven by heuristic keyword scanning (`token`, `private`, `127.0.0.1`, non-standard port) against a local dummy-backed run. That is not a reliable indicator of protocol non-compliance.
- `tests/mandatory/security/test_agent_card_security.py::test_security_scheme_consistency`: `TCK issue`. Same schema mismatch as the earlier authentication tests: the TCK expects wrapped security scheme objects instead of the installed SDK shape.
- `tests/mandatory/transport/test_multi_transport_equivalence.py::test_message_sending_equivalence`: `TCK issue; also a repo v1.0 gap`. The transport client uses JSON-RPC `SendMessage`; under the v0.3.0 baseline the method is `message/send`, but stronger `1.0` compatibility would still require additional alias handling.
- `tests/mandatory/transport/test_multi_transport_equivalence.py::test_concurrent_operation_equivalence`: `TCK issue; also a repo v1.0 gap`. Same direct cause as the previous test: the JSON-RPC client sends `SendMessage`.

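
Many entries above hinge on the order in which a JSON-RPC server rejects a request: an unknown method name yields `-32601` before any parameter validation can produce `-32602`, and a well-formed lookup of a missing task yields `-32001`. The toy dispatcher below sketches that ordering. It is not the repository's runtime or SDK code; the `dispatch` and `error` helpers, the in-memory `tasks` dict, and the parameter checks (a `message` key for `message/send`, a string `id` for the task methods) are assumptions for illustration.

```python
METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0: method does not exist
INVALID_PARAMS = -32602    # JSON-RPC 2.0: invalid method parameters
TASK_NOT_FOUND = -32001    # "Task not found" code cited in the triage

# v0.3.0 JSON-RPC method names referenced in this document.
SUPPORTED = {"message/send", "tasks/get", "tasks/cancel"}


def error(request_id, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def dispatch(request: dict, tasks: dict) -> dict:
    """Toy dispatcher: the method name is checked before the params."""
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params")

    # Unknown names such as "SendMessage" stop here with -32601.
    if method not in SUPPORTED:
        return error(request_id, METHOD_NOT_FOUND, "Method not found")

    # Parameter validation only runs for known methods.
    if not isinstance(params, dict):
        return error(request_id, INVALID_PARAMS, "Invalid params")

    if method == "message/send":
        if "message" not in params:  # assumed minimal requirement
            return error(request_id, INVALID_PARAMS, "Invalid params")
        return {"jsonrpc": "2.0", "id": request_id, "result": {"accepted": True}}

    # tasks/get and tasks/cancel both need an existing task id.
    task_id = params.get("id")
    if not isinstance(task_id, str):
        return error(request_id, INVALID_PARAMS, "Invalid params")
    if task_id not in tasks:
        return error(request_id, TASK_NOT_FOUND, "Task not found")
    return {"jsonrpc": "2.0", "id": request_id, "result": tasks[task_id]}
```

Under these assumptions, a `SendMessage` request with deliberately invalid parameters is rejected as an unknown method, which is the pattern behind the `-32601` versus `-32602` entries above.
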
## Adjacent Repository Gaps Found During Triage

These did not directly cause the exact failed node IDs above, but they are real repository-side gaps revealed during follow-up probes:

- `A2A-Version: 1.0` still returns `-32601` for JSON-RPC `SendMessage` and `GetExtendedAgentCard`. That means current `1.0` support is still limited to negotiation and error-shaping rather than full method-surface compatibility.
- `GET /v1/tasks` currently returns `500 NotImplementedError` in a local probe, even though the route exists and repository docs describe the SDK-owned REST surface as including task listing. That behavior should be treated as a repository issue independent from the TCK's incorrect JSON-RPC `ListTasks` expectation.

## Summary

For the exact 47 failed/error cases in the first mandatory run:

- No failure is a clean `a2a-sdk==0.3.25` / v0.3.0 conformance bug in the current runtime.
- Most failures come from TCK method/schema assumptions that do not match the shipped SDK baseline.
- Several failures also highlight future repository work if stronger `1.0` compatibility becomes a goal.

docs/conformance.md

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
# External Conformance Experiments

This repository keeps internal regression and external interoperability experiments separate on purpose.

## Scope

- `./scripts/doctor.sh` remains the primary internal regression entrypoint.
- `./scripts/conformance.sh` is a local/manual experiment entrypoint for official external tooling.
- External conformance output should be treated as investigation input, not as an automatic merge gate.

## Current Experiment Shape

The default `./scripts/conformance.sh` workflow does the following:

1. Sync the repository environment unless explicitly skipped.
2. Cache or refresh the official `a2aproject/a2a-tck` checkout.
3. Start a local dummy-backed `opencode-a2a` runtime unless `CONFORMANCE_SUT_URL` points to an existing SUT.
4. Run the requested TCK category, defaulting to `mandatory`.
5. Preserve raw logs and machine-readable reports under `run/conformance/<timestamp>/`.

The default local SUT uses the repository test double `DummyChatOpencodeUpstreamClient`. That keeps the experiment reproducible without requiring a live OpenCode upstream.

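
The five steps above can be sketched as a small planning function. This does not reproduce `conformance.sh`, whose contents are not shown here: the function name `plan_run`, the `skip_sync` parameter, the step labels, and the timestamp format are all hypothetical. Only `CONFORMANCE_SUT_URL`, the `mandatory` default, and the `run/conformance/<timestamp>/` layout come from the description above.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Mapping, Optional


def plan_run(env: Mapping[str, str],
             category: Optional[str] = None,
             skip_sync: bool = False,            # hypothetical switch
             now: Optional[datetime] = None) -> dict:
    """Describe what a conformance run would do, without executing anything."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")       # timestamp format is assumed
    sut_url = env.get("CONFORMANCE_SUT_URL")

    steps = []
    if not skip_sync:
        steps.append("sync-environment")         # step 1
    steps.append("refresh-tck-checkout")         # step 2
    if not sut_url:
        steps.append("start-local-dummy-sut")    # step 3, only without an external SUT
    steps.append("run-tck")                      # step 4
    steps.append("preserve-artifacts")           # step 5

    return {
        "category": category or "mandatory",
        "sut_url": sut_url,
        "start_local_sut": not sut_url,
        "steps": steps,
        "output_dir": str(PurePosixPath("run/conformance") / stamp),
    }
```

For example, `plan_run({"CONFORMANCE_SUT_URL": "http://127.0.0.1:8000"}, "capabilities")` would leave out the local SUT step and select the `capabilities` category.
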
## Usage

Run the default mandatory experiment:

```bash
bash ./scripts/conformance.sh
```

Run a different TCK category:

```bash
bash ./scripts/conformance.sh capabilities
```

Target an already running runtime instead of the local dummy-backed SUT:

```bash
CONFORMANCE_SUT_URL=http://127.0.0.1:8000 \
A2A_AUTH_TYPE=bearer \
A2A_AUTH_TOKEN=dev-token \
bash ./scripts/conformance.sh mandatory
```

## Artifacts

Each run keeps the following artifacts in the selected output directory:

- `agent-card.json`: fetched public Agent Card
- `health.json`: fetched authenticated health payload when the local SUT is used
- `tck.log`: raw TCK console output
- `pytest-report.json`: pytest-json-report output emitted by the TCK runner
- `failed-tests.json`: compact list of failed/error node IDs for triage
- `metadata.json`: experiment metadata including local repo commit and cached TCK commit

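
As a rough illustration of how a compact failed/error list could be derived from `pytest-report.json`, the sketch below assumes the usual pytest-json-report layout: a top-level `tests` list whose entries carry `nodeid` and `outcome`. The exact schema of `failed-tests.json` is not shown in this document, so the helper names and the grouping by test file are illustrative assumptions rather than the script's actual behavior.

```python
import json
from collections import defaultdict
from pathlib import Path

# Outcomes treated as failures for triage purposes.
FAILING_OUTCOMES = {"failed", "error"}


def load_report(path) -> dict:
    """Read a pytest-json-report file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_node_ids(report: dict) -> list:
    """Collect node IDs whose outcome is 'failed' or 'error', in report order."""
    return [
        test["nodeid"]
        for test in report.get("tests", [])
        if test.get("outcome") in FAILING_OUTCOMES
    ]


def group_by_file(node_ids) -> dict:
    """Group node IDs by test file so shared root causes are easier to spot."""
    groups = defaultdict(list)
    for node_id in node_ids:
        # "path/to/test_x.py::Class::test_name" -> ("path/to/test_x.py", "Class::test_name")
        path, _, test_name = node_id.partition("::")
        groups[path].append(test_name)
    return dict(groups)
```

A typical use would be `group_by_file(failed_node_ids(load_report("run/conformance/<timestamp>/pytest-report.json")))`, with the placeholder replaced by a real run directory.
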
## Interpretation Guidance

When a TCK run fails, inspect the raw report before changing the runtime:

- Some failures may point to real runtime gaps.
- Some failures may come from TCK assumptions that do not match `a2a-sdk==0.3.25`.
- Some failures may come from A2A v0.3 versus v1.0 naming or schema drift.

The experiment is useful only if those categories stay separate during triage.

The current first-pass triage is recorded in [`./conformance-triage.md`](./conformance-triage.md).

scripts/README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ Executable scripts live in this directory. This file is the entry index for the
1212
## Other Scripts
1313

1414
- [`doctor.sh`](./doctor.sh): primary local development regression entrypoint (uv sync + lint + tests + coverage)
15+
- [`conformance.sh`](./conformance.sh): local/manual external A2A conformance experiment entrypoint; caches the official TCK, can launch a dummy-backed local SUT, and preserves raw artifacts under `run/conformance/`
1516
- [`dependency_health.sh`](./dependency_health.sh): development dependency review entrypoint (`sync`/`pip check` + outdated + dev audit), while blocking CI/publish audits focus on runtime dependencies
1617
- [`check_coverage.py`](./check_coverage.py): enforces the overall coverage floor and per-file minimums for critical modules
1718
- [`lint.sh`](./lint.sh): lint helper
@@ -20,3 +21,4 @@ Executable scripts live in this directory. This file is the entry index for the
2021
## Notes
2122

2223
- `doctor.sh` and `dependency_health.sh` intentionally remain separate entrypoints and share common prerequisites through [`health_common.sh`](./health_common.sh).
24+
- External conformance experiments remain intentionally separate from the default regression path. See [`../docs/conformance.md`](../docs/conformance.md).
