Description
Summary
Currently, Transformers.js uses a global env object for configuration. I propose moving to a session-based approach where configuration is passed directly to pipeline instances, while maintaining backward compatibility with the existing env object as defaults.
Current Behavior
import { env, pipeline } from "@huggingface/transformers";
env.remoteHost = "https://myhost.com";
const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
// All pipelines now use myhost.com
All pipelines share the same global configuration through the `env` object.
Proposed Behavior
import { pipeline } from "@huggingface/transformers";
const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment', {
  remoteHost: "https://myhost.com",
});
// Only this pipeline uses myhost.com
Configuration is passed per-pipeline, allowing different instances to use different settings.
Motivation
- Client-side context: The `env` concept feels server-oriented, where configuration is typically set once at startup. On the client, where you have long-running processes and lots of user interaction, a session-based approach makes more sense.
- Better encapsulation: Different pipeline instances might need different configurations (e.g., loading some models from a local cache, others from a CDN, others from a private host).
- Industry standard: Other similar libraries follow this pattern:
  - MediaPipe.js
  - TensorFlow.js
  - WebLLM

  They all use session/instance-based configuration rather than global state.
- Semantic clarity: An "environment" (`env`) should represent static values based on the environment an app runs in (e.g., `IS_BROWSER_ENV`, `IS_FS_AVAILABLE`). Most of the current `env` properties are actually runtime configuration options, not environment detection.
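To make the last point concrete, here is a minimal sketch of the split I have in mind. The detection expressions and the `IS_NODE_ENV` flag are my own illustration and are not copied from the library; only `IS_BROWSER_ENV` and `remoteHost` are names taken from above.

```javascript
// Static facts about the host: detected once, then never reassigned.
// (Illustrative detection logic; the library may detect these differently.)
const environment = Object.freeze({
  IS_BROWSER_ENV:
    typeof window !== "undefined" && typeof window.document !== "undefined",
  IS_NODE_ENV:
    typeof process !== "undefined" && process?.release?.name === "node",
});

// Runtime choices made by the application: these can differ per pipeline.
const cdnConfig = { remoteHost: "https://myhost.com" };
const privateConfig = { remoteHost: "https://mysecondhost.com" };

console.log(environment, cdnConfig, privateConfig);
```

The first object describes where the code runs, the other two describe what the app wants, and only the latter would need to be passed to `pipeline()`.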
Backward Compatibility
The existing env object would serve as the default configuration:
import { env, pipeline } from "@huggingface/transformers";
env.remoteHost = "https://myhost.com";
// Uses env.remoteHost as default
const pipe1 = await pipeline("text-generation", "gpt2");
// Overrides the default for this instance
const pipe2 = await pipeline("text-generation", "gpt2", {
  remoteHost: "https://mysecondhost.com",
});
This would maintain full backward compatibility while enabling the new pattern.
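One possible way to resolve the effective configuration is sketched below. `resolveConfig` is a hypothetical helper, not an existing Transformers.js function, and the `env` object here is a plain stand-in for the real one so that the snippet runs on its own.

```javascript
// Stand-in for the library's global `env` object (not the real one).
const env = { remoteHost: "https://myhost.com" };

// Hypothetical helper: per-pipeline options win over the global defaults.
function resolveConfig(defaults, overrides = {}) {
  // Drop undefined values so `{ remoteHost: undefined }` does not mask a default.
  const defined = Object.fromEntries(
    Object.entries(overrides).filter(([, value]) => value !== undefined),
  );
  // Return a frozen copy so later changes to `env` do not leak into this instance.
  return Object.freeze({ ...defaults, ...defined });
}

const config1 = resolveConfig(env);
const config2 = resolveConfig(env, { remoteHost: "https://mysecondhost.com" });

console.log(config1.remoteHost); // https://myhost.com
console.log(config2.remoteHost); // https://mysecondhost.com
```

Taking a snapshot at pipeline creation is a design choice in this sketch. Reading `env` lazily on every request would be closer to today's behavior, so that trade-off is probably worth discussing.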
Implementation Considerations
I recognize this is a significant change. However, with v4 on the horizon, it might be a good opportunity to introduce this improvement. The migration path would be smooth since:
- Existing code continues to work unchanged
- Users can gradually adopt the new pattern
- A clear migration guide can be provided
Questions
- Would this fit into the v4 roadmap?
- Are there use cases where the global `env` pattern is preferable?
- Should we consider deprecation warnings for certain `env` properties in v4, with removal in v5?
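For the deprecation question, a rough sketch of what a warning could look like, assuming the `env` object were wrapped in a `Proxy`. `withDeprecationWarnings` and the list of deprecated keys are hypothetical, and the `env` object below is a stand-in rather than the library's.

```javascript
// Hypothetical list of `env` properties that would move to per-pipeline options.
const DEPRECATED_KEYS = new Set(["remoteHost"]);

// Hypothetical wrapper: warns the first time a deprecated property is assigned.
function withDeprecationWarnings(target, warn = console.warn) {
  const warned = new Set(); // tracks which properties have already warned
  return new Proxy(target, {
    set(obj, key, value) {
      if (DEPRECATED_KEYS.has(key) && !warned.has(key)) {
        warned.add(key);
        warn(`env.${String(key)} is deprecated; pass it as a pipeline option instead.`);
      }
      obj[key] = value; // the assignment still takes effect
      return true;
    },
  });
}

// Stand-in for the global `env` object.
const env = withDeprecationWarnings({ remoteHost: "https://example.com" });
env.remoteHost = "https://myhost.com"; // warns once
env.remoteHost = "https://mysecondhost.com"; // no second warning
```

Existing code would keep working in v4 with this approach, and users would get a pointer to the new pattern before any removal in v5.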