
feat: make transformers and the vLLM client optional dependencies (#31) #70

Draft
hallerite wants to merge 4 commits into main from optional-transformers


@hallerite hallerite commented May 27, 2026

Closes #31.

Why

transformers is a heavy dependency, and downstreams that keep training deps lightweight (e.g. TorchTitan/TorchTune, which load tokenizers via tokenizers) shouldn't have to pull it in just to use a renderer. This makes the heavy/engine-specific pieces opt-in, so the base install is lightweight and text-only renderers work with a bring-your-own tokenizer.

What changed

Tokenizer + Processor protocols (renderers/base.py) — structural types replacing transformers.PreTrainedTokenizer in every renderer's tokenizer/processor annotations. The module-level from transformers... import PreTrainedTokenizer is gone from all 13 renderer modules, so import renderers.<model> no longer drags in transformers.
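To illustrate the structural-typing idea, here is a minimal sketch of a protocol-typed tokenizer parameter. `TokenizerLike`, `ByteTokenizer`, and `render_prompt` are hypothetical names for illustration only; the real `Tokenizer` protocol in `renderers/base.py` has more members and may be shaped differently.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TokenizerLike(Protocol):
    """Hypothetical stand-in for a structural tokenizer type (subset of members)."""

    name_or_path: str
    eos_token_id: int

    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...


class ByteTokenizer:
    """Toy bring-your-own tokenizer: one token per UTF-8 byte.

    It does not inherit from TokenizerLike; it only has the right shape.
    """

    name_or_path = "toy/byte-tokenizer"
    eos_token_id = 0

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8")


def render_prompt(tokenizer: TokenizerLike, text: str) -> list[int]:
    # Annotating against the protocol means this module never imports transformers.
    return tokenizer.encode(text)
```

Because the check is structural, any object exposing these attributes and methods type-checks, whether it comes from `transformers`, a `tokenizers` wrapper, or a test double.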

transformers + fastokens move to the [transformers] extra. They are needed only by the convenience helpers (load_tokenizer, create_renderer*), the offset-attribution fallback in attribute_text_segments, and the VLM renderers (image processors). When transformers is missing, _require_transformers() raises a clear pip install 'renderers[transformers]' error on those lazy paths.
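A lazy-import guard of this kind can be sketched as follows. This is an assumption about the general pattern, not the PR's actual `_require_transformers()` body; `require_optional` and its parameters are hypothetical names.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "renderers") -> ModuleType:
    """Import module_name on first use, or raise an ImportError naming the extra.

    Hypothetical helper: the import happens at call time, so merely importing
    the module that defines this function never pulls in the heavy dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc
```

A helper such as `load_tokenizer` would call the guard inside its body rather than at module scope, which is what keeps the base import light.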

renderers.client moves to the [vllm] extra. The vLLM /inference/v1/generate client is the only thing that needs openai + httpx; it is no longer imported by renderers/__init__, so import renderers stays free of HTTP/engine deps. OverlongPromptError is now imported from renderers.client (no top-level re-export).

Result: the core deps of a base pip install renderers are just numpy, tiktoken, jinja2, openai-harmony, and prime-pydantic-config. The heavy bits are renderers[transformers] and renderers[vllm], which are composable.

Caveats (documented in the README)

A bring-your-own tokenizer must satisfy the Tokenizer protocol (encode/decode/convert_tokens_to_ids/apply_chat_template + name_or_path/unk_token_id/eos_token_id). Per-token training attribution additionally needs tokenizer(..., return_offsets_mapping=True); without it, attribution falls back to a vanilla HF tokenizer, which requires the [transformers] extra.

Tests

  • New tests/test_no_transformers.py: subprocess-blocks transformers/fastokens/openai/httpx, then asserts import renderers + a text renderer's render/parse work, that no blocked module leaks into sys.modules, and that load_tokenizer errors with the install hint.
  • Full suite green; ruff + ty clean.
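The subprocess-blocking approach can be sketched with the standard library alone. This is a minimal illustration of the technique, not the contents of `tests/test_no_transformers.py`; `CHILD_TEMPLATE` and `run_with_blocked_imports` are hypothetical names.

```python
import subprocess
import sys
import textwrap

BLOCKED = ("transformers", "fastokens", "openai", "httpx")

# Script run in a fresh interpreter: a meta path finder rejects blocked
# top-level packages before any real finder can locate them.
CHILD_TEMPLATE = """\
import importlib.abc
import sys

BLOCKED = {blocked!r}

class BlockFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname.split(".")[0] in BLOCKED:
            raise ImportError("blocked for test: " + fullname)
        return None  # defer to the normal finders

sys.meta_path.insert(0, BlockFinder())

{body}

leaked = sorted(m for m in sys.modules if m.split(".")[0] in BLOCKED)
assert not leaked, leaked
print("ok")
"""


def run_with_blocked_imports(
    body: str, blocked: tuple[str, ...] = BLOCKED
) -> subprocess.CompletedProcess:
    """Run `body` in a child process where the `blocked` packages cannot be imported."""
    script = CHILD_TEMPLATE.format(blocked=blocked, body=textwrap.dedent(body))
    return subprocess.run([sys.executable, "-c", script], capture_output=True, text=True)
```

A test would pass a body such as `import renderers` plus a render/parse round trip, then assert a zero return code. Running in a subprocess matters because the parent test process has usually imported the optional packages already.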

🤖 Generated with Claude Code


Note

Medium Risk
Breaking for consumers that imported OverlongPromptError from renderers or assumed transformers/openai on base install; behavior is documented and guarded with new boundary tests.

Overview
This PR makes the base renderers install lightweight by moving heavy deps behind optional extras and letting text renderers run with a bring-your-own tokenizer.

Packaging: transformers and fastokens are no longer core dependencies; they install via renderers[transformers] (used by load_tokenizer / create_renderer*, offset attribution fallback, and VLMs). openai and httpx move to renderers[vllm] for renderers.client. Dev deps mirror both extras so CI still exercises those paths.

Typing / imports: New Tokenizer, ChatTemplateTokenizer, and Processor protocols in renderers/base.py replace PreTrainedTokenizer on all renderer modules, so importing renderers no longer pulls in transformers. VLMs load AutoProcessor through _require_transformers(), which raises a clear install hint if the extra is missing.

Public API: OverlongPromptError is dropped from renderers top-level exports; use from renderers.client import OverlongPromptError. Tokenizer, ChatTemplateTokenizer, and Processor are exported from the package root.

Tests / docs: tests/test_no_transformers.py subprocess-blocks optional deps and checks text render/parse, no leaked imports, and load_tokenizer errors. README documents extras, protocol expectations, and attribution caveats.

Reviewed by Cursor Bugbot for commit 5062105. Bugbot is set up for automated code reviews on this repo.

Note

Make transformers and vLLM client optional dependencies

  • Moves transformers, fastokens, openai, and httpx out of base dependencies into optional extras: renderers[transformers] and renderers[vllm].
  • Introduces Tokenizer, ChatTemplateTokenizer, and Processor protocols in renderers/base.py so renderer classes no longer import from transformers at module scope.
  • Adds _require_transformers() helper that raises a clear ImportError pointing to the renderers[transformers] extra when transformers is not installed, replacing generic import failures.
  • OverlongPromptError is no longer exported from the top-level renderers package.
  • Adds tests/test_no_transformers.py to verify text renderers work with a bring-your-own tokenizer and that missing transformers produces the correct error.

Macroscope summarized 5062105. (Automatic summaries will resume when PR exits draft mode or review begins).

hallerite and others added 2 commits May 27, 2026 17:09
`transformers` (+ `fastokens`) and the `openai`/`httpx`-based vLLM generate
client are no longer core dependencies. Text-only renderers now work with a
bring-your-own tokenizer and none of the heavy deps installed.

- Add `Tokenizer` + `Processor` structural protocols in `base.py`; type the
  renderer `tokenizer`/`processor` params against them instead of
  `transformers.PreTrainedTokenizer`, so importing a renderer no longer drags
  in `transformers`.
- Move `transformers` + `fastokens` to the `[transformers]` extra and
  `openai` + `httpx` to the `[vllm]` extra. `_require_transformers()` raises a
  clear "install renderers[transformers]" error on the lazy paths
  (`load_tokenizer`, offset attribution, VLM processors).
- `renderers.client` is opt-in: no longer imported by `renderers/__init__`,
  and `OverlongPromptError` moves with it (importable from `renderers.client`).
- Add `tests/test_no_transformers.py` proving text render/parse and
  `import renderers` work with `transformers`/`fastokens`/`openai`/`httpx`
  import-blocked, and that `load_tokenizer` errors clearly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
macroscopeapp Bot previously approved these changes May 27, 2026

macroscopeapp Bot commented May 27, 2026

Approvability

Verdict: Approved

This PR restructures transformers and vLLM client dependencies as optional extras through mechanical changes: replacing concrete type imports with Protocol-based structural types and adding lazy import wrappers with clear error messages. Runtime behavior is unchanged for users with full dependencies installed, and new tests validate the optional dependency boundary.


…e from Tokenizer

Brings in #68 (examples), #69 (harmony floor), #71 (qwen3.5 hard-coded
enable_thinking). The only qwen35.py conflict is resolved by keeping #71's
hard-coded `_ENABLE_THINKING_DEFAULTS` table (no `apply_chat_template`
probe) on top of #31's `Tokenizer`/`Processor` type hints.

Now that #71 removed the last hand-coded-renderer call to
`apply_chat_template`, drop it from the `Tokenizer` protocol so a plain
`tokenizers.Tokenizer` wrapper satisfies it. `apply_chat_template` moves to
a new `ChatTemplateTokenizer(Tokenizer, Protocol)` subtype, required only by
`DefaultRenderer` (the generic chat-template fallback).
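The split into a base protocol and a chat-template subtype can be sketched like this. The names `BaseTokenizer`, `ChatTokenizer`, `PlainTokenizer`, and `TemplatedTokenizer` are hypothetical, and the method signatures are assumptions rather than the PR's actual definitions.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BaseTokenizer(Protocol):
    """Minimal surface that every renderer would need (illustrative subset)."""

    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...


@runtime_checkable
class ChatTokenizer(BaseTokenizer, Protocol):
    """Subtype adding apply_chat_template; only a generic fallback would require it."""

    def apply_chat_template(self, messages: list[dict[str, Any]], **kwargs: Any) -> Any: ...


class PlainTokenizer:
    """Toy tokenizer with no chat template, like a thin tokenizers wrapper."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(chr(i) for i in ids)


class TemplatedTokenizer(PlainTokenizer):
    """Same toy tokenizer plus a trivial chat template."""

    def apply_chat_template(self, messages: list[dict[str, Any]], **kwargs: Any) -> str:
        return "".join(f"<{m['role']}>{m['content']}" for m in messages)
```

With this layout, hand-coded renderers accept the narrower base type, so a tokenizer without a chat template still satisfies them, while the fallback renderer asks for the wider subtype.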
macroscopeapp Bot previously approved these changes May 27, 2026
@hallerite hallerite marked this pull request as draft May 27, 2026 21:42
@hallerite (Member, Author)

I'm not really satisfied with how attribute_text_segments handles character-offset attribution, so I'm iterating further.

Development

Successfully merging this pull request may close these issues.

Is transformers necessary or tokenizers is enough?