
Parametric LLM model tests#617

Open
datvo06 wants to merge 15 commits into master from dn-fully-parametric-model-test

Conversation


@datvo06 datvo06 commented Mar 22, 2026

closes #589

datvo06 added 12 commits March 22, 2026 14:50
… env var

Replace provider-specific environment variable checks (OPENAI_API_KEY,
ANTHROPIC_API_KEY) and skip markers (requires_openai, requires_anthropic)
with a single EFFECTFUL_LLM_MODEL environment variable that controls
which model is used for all LLM integration tests.

- Remove hardcoded model names from all test files in favor of LLM_MODEL
  read from EFFECTFUL_LLM_MODEL env var
- Replace requires_openai/requires_anthropic markers with requires_llm
- Remove model parametrization that was cross-provider; tests now use
  whichever model the env var specifies
- Use litellm.supports_vision() to conditionally skip vision tests
- Remove default model from LiteLLMProvider (make model required)
- Update CI workflow to pass EFFECTFUL_LLM_MODEL as a matrix parameter,
  making it easy to add parallel CI stages for different providers
- Rename/remove fixture files to match updated test names
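The matrix wiring described above might look roughly like this in the GitHub Actions workflow; the job name, install command, and model identifiers below are illustrative assumptions, not the actual workflow contents.

```yaml
# Hypothetical excerpt of the CI workflow; model names are examples only.
jobs:
  llm-tests:
    strategy:
      matrix:
        llm-model: ["gpt-4o-mini", "claude-3-5-haiku-20241022"]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[test]"
      - run: pytest tests/
        env:
          EFFECTFUL_LLM_MODEL: ${{ matrix.llm-model }}
```

Adding a new provider stage then only requires appending a model name to the matrix.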

Closes #589
…ename test

- Move LLM_MODEL and requires_llm definitions to tests/conftest.py;
  all four LLM test files now import from there
- Fix import ordering in test_handlers_llm_encoding.py (stdlib before local)
- Rename test_agent_tool_names_are_openai_compatible_integration to
  test_agent_tool_names_are_valid_integration since it's no longer
  OpenAI-specific
- Restore LiteLLMProvider default model via env var fallback so existing
  callers (template.py, test_handlers_llm_template.py, notebook) are not
  broken: model=os.environ.get("EFFECTFUL_LLM_MODEL", "gpt-4o")
- Move LLM_MODEL/requires_llm from conftest.py to tests/_llm_helpers.py
  since conftest.py should not be imported directly
- Fix import placement in test_handlers_llm_provider.py (was between
  module-level constants instead of grouped with imports)
- Remove redundant @requires_llm on vision tests where the skipif
  condition already covers the not-LLM_MODEL case
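A minimal sketch of what `tests/_llm_helpers.py` might contain after these changes; the exact contents are an assumption based on the commit description above.

```python
import os

import pytest

# Hypothetical sketch of tests/_llm_helpers.py as described above.
# Live LLM tests run only when EFFECTFUL_LLM_MODEL names a model.
LLM_MODEL = os.environ.get("EFFECTFUL_LLM_MODEL")

requires_llm = pytest.mark.skipif(
    not LLM_MODEL,
    reason="set EFFECTFUL_LLM_MODEL to run live LLM tests",
)
```

Test files can then decorate live tests with `@requires_llm` and pass `LLM_MODEL` wherever a model name is needed, with no provider-specific logic.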
The env var belongs in test infrastructure, not the library API.
LiteLLMProvider should have a clean, explicit default.
Remove provider-specific environment variable checks and skip markers
from LLM tests in favor of a single EFFECTFUL_LLM_MODEL env var.

- Add LLM_MODEL and requires_llm to tests/conftest.py; LLM_MODEL
  defaults to "gpt-4o-mini" and is overridable via EFFECTFUL_LLM_MODEL
- Live tests (tool calling, encoding, agent tool names) use LLM_MODEL
  and are gated by requires_llm which checks for any provider API key
- Integration tests use make_provider() which returns a live
  LiteLLMProvider when API keys are available, else falls back to
  ReplayLiteLLMProvider for offline replay from fixtures
- Replay-only tests (simple_prompt_multiple_models, cross_endpoint,
  caching) keep hardcoded model names and always run since they never
  call the API
- Update CI workflow to pass EFFECTFUL_LLM_MODEL as a matrix parameter
  for easy parallel stages with different providers
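The `make_provider()` fallback can be sketched as below. The provider classes here are stand-ins (the real `LiteLLMProvider` and `ReplayLiteLLMProvider` live in the library), and the API-key detection logic is an assumption.

```python
import os


class LiteLLMProvider:
    """Stand-in for the real live provider (name from the PR)."""
    def __init__(self, model: str):
        self.model = model


class ReplayLiteLLMProvider:
    """Stand-in that replays recorded responses from fixture files."""
    def __init__(self, fixture_dir: str):
        self.fixture_dir = fixture_dir


def make_provider():
    # Use a live provider only when both a model and an API key are
    # present; otherwise fall back to offline replay so tests always run.
    model = os.environ.get("EFFECTFUL_LLM_MODEL")
    has_key = any(
        os.environ.get(k) for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
    )
    if model and has_key:
        return LiteLLMProvider(model=model)
    return ReplayLiteLLMProvider(fixture_dir="tests/fixtures")
```

This keeps integration tests green offline while exercising the real API whenever CI supplies credentials.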

Closes #589

datvo06 commented Mar 23, 2026

Seems like this refactoring uncovered that we never actually tested encoding integration against a live API after adding fixtures. Tuple encoding is failing again. Will try to fix.

Three fixes in encoding.py:

1. TupleEncodable.encode() now returns a TupleItems model instance
   (not a raw tuple), and deserialize() returns the model directly.
   This fixes pydantic validation in litellm integration tests for
   NamedTuple and fixed-tuple types.

2. Add _TupleSafeJsonSchema that overrides pydantic's tuple_schema()
   to produce object schemas (item_0, item_1 properties) instead of
   prefixItems arrays. Applied via _BoxEncoding.model_json_schema()
   so dataclasses containing tuple fields produce OpenAI-compatible
   schemas.

3. SequenceEncodable.encode() returns a list (not tuple) to preserve
   encode idempotency — nested_type on a list dispatches to the
   sequence encoder, avoiding a mismatch with TupleEncodable.

Also adds test_handlers_llm_encoding.py back to CI workflow.
@datvo06 datvo06 marked this pull request as draft March 23, 2026 15:14
datvo06 added 2 commits March 23, 2026 11:17
Revert encoding.py to master and remove encoding tests from CI
workflow. Tuple schema fixes will be in a dedicated PR.
@datvo06 datvo06 force-pushed the dn-fully-parametric-model-test branch from be46f6a to 7aad873 on March 23, 2026 at 17:22
@datvo06 datvo06 marked this pull request as ready for review March 23, 2026 17:24
@datvo06 datvo06 requested a review from eb8680 March 23, 2026 17:24


Development

Successfully merging this pull request may close these issues.

Tests and examples for handlers.llm should be fully parametric in model name
