[RFC] Move from global env to session-based configuration #1479

@nico-martin

Summary

Currently, Transformers.js uses a global env object for configuration. I propose moving to a session-based approach where configuration is passed directly to pipeline instances. The existing env object would stay in place as the source of default values, which keeps the change backward compatible.

Current Behavior

import { env, pipeline } from "@huggingface/transformers";

env.remoteHost = "https://myhost.com";

const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
// All pipelines now use myhost.com

All pipelines share the same global configuration through the env object.

Proposed Behavior

import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment', {
  remoteHost: "https://myhost.com",
});
// Only this pipeline uses myhost.com

Configuration is passed per-pipeline, allowing different instances to use different settings.

Motivation

  1. Client-side context: The env concept feels server-oriented, where configuration is typically set once at startup. On the client, where you have long-running processes and lots of user interaction, a session-based approach makes more sense.

  2. Better encapsulation: Different pipeline instances might need different configurations (e.g., loading some models from a local cache, others from a CDN, others from a private host).

  3. Industry standard: Other similar libraries follow this pattern:

    • MediaPipe.js
    • TensorFlow.js
    • WebLLM

    They all use session/instance-based configuration rather than global state.

  4. Semantic clarity: An "environment" (env) should represent static values based on the environment an app runs in (e.g., IS_BROWSER_ENV, IS_FS_AVAILABLE). Most of the current env properties are actually runtime configuration options, not environment detection.

Backward Compatibility

The existing env object would serve as the default configuration:

import { env, pipeline } from "@huggingface/transformers";

env.remoteHost = "https://myhost.com";

// Uses env.remoteHost as default
const pipe1 = await pipeline("text-generation", "gpt2");

// Overrides the default for this instance
const pipe2 = await pipeline("text-generation", "gpt2", {
  remoteHost: "https://mysecondhost.com",
});

This would maintain full backward compatibility while enabling the new pattern.
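
To make the fallback concrete, here is a minimal sketch of how per-pipeline options could be resolved against env. It is not the library's implementation: resolveConfig is a hypothetical helper, and the env object below is a local stand-in so the snippet runs on its own.

```javascript
// Local stand-in for the library's global `env` object (assumption, not the real export).
const env = {
  remoteHost: "https://huggingface.co/",
  allowLocalModels: false,
};

// Hypothetical helper: compute the effective configuration for one pipeline.
// Per-instance options win; keys that are missing or undefined fall back to the defaults.
function resolveConfig(defaults, overrides = {}) {
  const resolved = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    if (value !== undefined) resolved[key] = value;
  }
  // Freeze the result so one pipeline's configuration cannot be mutated later.
  return Object.freeze(resolved);
}

// Existing code keeps working: the global still sets the default.
env.remoteHost = "https://myhost.com";

const config1 = resolveConfig(env);
const config2 = resolveConfig(env, { remoteHost: "https://mysecondhost.com" });

console.log(config1.remoteHost);
console.log(config2.remoteHost);
```

Note that this sketch copies the defaults when the configuration is resolved, so later changes to env would not reach pipelines that already exist. Whether defaults should be captured at creation time or read lazily is a design choice the real implementation would have to make.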

Implementation Considerations

I recognize this is a significant change. However, with v4 on the horizon, it might be a good opportunity to introduce this improvement. The migration path would be smooth since:

  1. Existing code continues to work unchanged
  2. Users can gradually adopt the new pattern
  3. A clear migration guide can be provided

Questions

  • Would this fit into the v4 roadmap?
  • Are there use cases where the global env pattern is preferable?
  • Should we consider deprecation warnings for certain env properties in v4, with removal in v5?
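
On the deprecation question, one possible mechanism is to wrap env in a Proxy that warns the first time a deprecated property is assigned. This is only a sketch under assumptions: withDeprecationWarnings, the choice of deprecated keys, and the warning text are all hypothetical, and the wrapped object is a local stand-in rather than the library's env.

```javascript
// Hypothetical helper: wrap a config object so that writes to deprecated
// keys emit a warning once per key, while still applying the assignment.
function withDeprecationWarnings(target, deprecatedKeys, warn = console.warn) {
  const warned = new Set();
  return new Proxy(target, {
    set(obj, key, value) {
      if (deprecatedKeys.has(key) && !warned.has(key)) {
        warned.add(key);
        warn(`env.${String(key)} is deprecated; pass it as a pipeline option instead.`);
      }
      // Apply the write so existing code keeps its current behavior.
      return Reflect.set(obj, key, value);
    },
  });
}

// Local stand-in for `env`; which keys would be deprecated is an assumption.
const legacyEnv = withDeprecationWarnings(
  { remoteHost: "https://huggingface.co/", allowLocalModels: false },
  new Set(["remoteHost"]),
);

legacyEnv.remoteHost = "https://myhost.com"; // warns the first time
legacyEnv.remoteHost = "https://mysecondhost.com"; // no second warning
legacyEnv.allowLocalModels = true; // not in the deprecated set, no warning
```

Reads are left untouched, so code that only consumes env values would see no change.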
