Skip to content

LocalPipelineSession Mutates sagemaker_client by Injecting _pipelines Attribute #5604

@Mattral

Description

@Mattral

Bug: LocalPipelineSession Mutates sagemaker_client by Injecting _pipelines Attribute

PySDK Version

  • PySDK V3 (3.x)
  • PySDK V2 (2.x)

Describe the bug

LocalPipelineSession dynamically adds a _pipelines attribute to the sagemaker_client object:

if not hasattr(self.sagemaker_client, "_pipelines"):
    self.sagemaker_client._pipelines = {}

This modifies the client instance by attaching internal state that is unrelated to the client’s intended responsibility.

Because sagemaker_client is typically expected to behave like a boto-style service client, mutating it in this way introduces hidden state and side effects.


Why this is problematic

This approach can cause several issues:

  • Breaks encapsulation
    Pipeline state is stored on the service client rather than on the session or pipeline objects that own the state.

  • Unexpected side effects
    If multiple LocalPipelineSession instances share the same client object, they will also share the same _pipelines registry.

  • Potential attribute collision
    _pipelines could conflict with future attributes added to the client implementation.

  • Harder debugging and maintenance
    Attaching internal state to an external object makes the code harder to reason about and maintain.

In general, AWS SDK components avoid mutating externally provided clients and instead maintain internal state within the session or service wrapper.


To reproduce

Example demonstrating shared pipeline state when using the same client object:

from sagemaker.workflow.pipeline_context import LocalPipelineSession

session1 = LocalPipelineSession()
session2 = LocalPipelineSession()

client = session1.sagemaker_client

# session2 reuses the same client
session2.sagemaker_client = client

session1.sagemaker_client._pipelines["pipelineA"] = "A"

print(session2.sagemaker_client._pipelines)

Output:

{'pipelineA': 'A'}

Both sessions share the same _pipelines state because it was attached to the client object.


Expected behavior

Pipeline state should be owned by the session or pipeline classes, not by the sagemaker_client.

A safer design would store pipelines inside the session instance:

class LocalPipelineSession(LocalSession):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._local_pipelines = {}

This ensures:

  • Pipeline state is isolated per session
  • No mutation of external client objects
  • Clear ownership of internal state

System information

  • SageMaker Python SDK version: main branch
  • Python version: 3.9+
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions