
Parametric LLM model tests#617

Open
datvo06 wants to merge 15 commits into master from dn-fully-parametric-model-test

Conversation


@datvo06 datvo06 commented Mar 22, 2026

closes #589

datvo06 added 12 commits March 22, 2026 14:50
… env var

Replace provider-specific environment variable checks (OPENAI_API_KEY,
ANTHROPIC_API_KEY) and skip markers (requires_openai, requires_anthropic)
with a single EFFECTFUL_LLM_MODEL environment variable that controls
which model is used for all LLM integration tests.

- Remove hardcoded model names from all test files in favor of LLM_MODEL
  read from EFFECTFUL_LLM_MODEL env var
- Replace requires_openai/requires_anthropic markers with requires_llm
- Remove model parametrization that was cross-provider; tests now use
  whichever model the env var specifies
- Use litellm.supports_vision() to conditionally skip vision tests
- Remove default model from LiteLLMProvider (make model required)
- Update CI workflow to pass EFFECTFUL_LLM_MODEL as a matrix parameter,
  making it easy to add parallel CI stages for different providers
- Rename/remove fixture files to match updated test names
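The matrix wiring described above might look roughly like this in the GitHub Actions workflow; the job name, install command, and model identifiers below are illustrative assumptions, not the actual workflow contents.

```yaml
# Hypothetical excerpt of the CI workflow; model names are examples only.
jobs:
  llm-tests:
    strategy:
      matrix:
        llm-model: ["gpt-4o-mini", "claude-3-5-haiku-20241022"]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[test]"
      - run: pytest tests/
        env:
          EFFECTFUL_LLM_MODEL: ${{ matrix.llm-model }}
```

Adding a new provider stage then only requires appending a model name to the matrix.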

Closes #589
…ename test

- Move LLM_MODEL and requires_llm definitions to tests/conftest.py;
  all four LLM test files now import from there
- Fix import ordering in test_handlers_llm_encoding.py (stdlib before local)
- Rename test_agent_tool_names_are_openai_compatible_integration to
  test_agent_tool_names_are_valid_integration since it's no longer
  OpenAI-specific
- Restore LiteLLMProvider default model via env var fallback so existing
  callers (template.py, test_handlers_llm_template.py, notebook) are not
  broken: model=os.environ.get("EFFECTFUL_LLM_MODEL", "gpt-4o")
- Move LLM_MODEL/requires_llm from conftest.py to tests/_llm_helpers.py
  since conftest.py should not be imported directly
- Fix import placement in test_handlers_llm_provider.py (was between
  module-level constants instead of grouped with imports)
- Remove redundant @requires_llm on vision tests where the skipif
  condition already covers the not-LLM_MODEL case
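A minimal sketch of what `tests/_llm_helpers.py` might contain after these changes; the exact contents are an assumption based on the commit description above.

```python
import os

import pytest

# Hypothetical sketch of tests/_llm_helpers.py as described above.
# Live LLM tests run only when EFFECTFUL_LLM_MODEL names a model.
LLM_MODEL = os.environ.get("EFFECTFUL_LLM_MODEL")

requires_llm = pytest.mark.skipif(
    not LLM_MODEL,
    reason="set EFFECTFUL_LLM_MODEL to run live LLM tests",
)
```

Test files can then decorate live tests with `@requires_llm` and pass `LLM_MODEL` wherever a model name is needed, with no provider-specific logic.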
The env var belongs in test infrastructure, not the library API.
LiteLLMProvider should have a clean, explicit default.
Remove provider-specific environment variable checks and skip markers
from LLM tests in favor of a single EFFECTFUL_LLM_MODEL env var.

- Add LLM_MODEL and requires_llm to tests/conftest.py; LLM_MODEL
  defaults to "gpt-4o-mini" and is overridable via EFFECTFUL_LLM_MODEL
- Live tests (tool calling, encoding, agent tool names) use LLM_MODEL
  and are gated by requires_llm which checks for any provider API key
- Integration tests use make_provider() which returns a live
  LiteLLMProvider when API keys are available, else falls back to
  ReplayLiteLLMProvider for offline replay from fixtures
- Replay-only tests (simple_prompt_multiple_models, cross_endpoint,
  caching) keep hardcoded model names and always run since they never
  call the API
- Update CI workflow to pass EFFECTFUL_LLM_MODEL as a matrix parameter
  for easy parallel stages with different providers
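The `make_provider()` fallback can be sketched as below. The provider classes here are stand-ins (the real `LiteLLMProvider` and `ReplayLiteLLMProvider` live in the library), and the API-key detection logic is an assumption.

```python
import os


class LiteLLMProvider:
    """Stand-in for the real live provider (name from the PR)."""
    def __init__(self, model: str):
        self.model = model


class ReplayLiteLLMProvider:
    """Stand-in that replays recorded responses from fixture files."""
    def __init__(self, fixture_dir: str):
        self.fixture_dir = fixture_dir


def make_provider():
    # Use a live provider only when both a model and an API key are
    # present; otherwise fall back to offline replay so tests always run.
    model = os.environ.get("EFFECTFUL_LLM_MODEL")
    has_key = any(
        os.environ.get(k) for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
    )
    if model and has_key:
        return LiteLLMProvider(model=model)
    return ReplayLiteLLMProvider(fixture_dir="tests/fixtures")
```

This keeps integration tests green offline while exercising the real API whenever CI supplies credentials.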

Closes #589

datvo06 commented Mar 23, 2026

Seems like this refactoring uncovered that we never actually tested encoding integration against a live API after adding fixtures. Tuple encoding is failing again. Will try to fix.

Three fixes in encoding.py:

1. TupleEncodable.encode() now returns a TupleItems model instance
   (not a raw tuple), and deserialize() returns the model directly.
   This fixes pydantic validation in litellm integration tests for
   NamedTuple and fixed-tuple types.

2. Add _TupleSafeJsonSchema that overrides pydantic's tuple_schema()
   to produce object schemas (item_0, item_1 properties) instead of
   prefixItems arrays. Applied via _BoxEncoding.model_json_schema()
   so dataclasses containing tuple fields produce OpenAI-compatible
   schemas.

3. SequenceEncodable.encode() returns a list (not tuple) to preserve
   encode idempotency — nested_type on a list dispatches to the
   sequence encoder, avoiding a mismatch with TupleEncodable.

Also adds test_handlers_llm_encoding.py back to CI workflow.
@datvo06 datvo06 marked this pull request as draft March 23, 2026 15:14
datvo06 added 2 commits March 23, 2026 11:17
Revert encoding.py to master and remove encoding tests from CI
workflow. Tuple schema fixes will be in a dedicated PR.
@datvo06 datvo06 force-pushed the dn-fully-parametric-model-test branch from be46f6a to 7aad873 on March 23, 2026 at 17:22
@datvo06 datvo06 marked this pull request as ready for review March 23, 2026 17:24
@datvo06 datvo06 requested a review from eb8680 March 23, 2026 17:24


Development

Successfully merging this pull request may close these issues.

Tests and examples for handlers.llm should be fully parametric in model name
