-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
Bug: LocalPipelineSession Mutates sagemaker_client by Injecting _pipelines Attribute
PySDK Version
- PySDK V3 (3.x)
- PySDK V2 (2.x)
Describe the bug
LocalPipelineSession dynamically adds a _pipelines attribute to the sagemaker_client object:
if not hasattr(self.sagemaker_client, "_pipelines"):
self.sagemaker_client._pipelines = {}This modifies the client instance by attaching internal state that is unrelated to the client’s intended responsibility.
Because sagemaker_client is typically expected to behave like a boto-style service client, mutating it in this way introduces hidden state and side effects.
Why this is problematic
This approach can cause several issues:
-
Breaks encapsulation
Pipeline state is stored on the service client rather than on the session or pipeline objects that own the state. -
Unexpected side effects
If multipleLocalPipelineSessioninstances share the same client object, they will also share the same_pipelinesregistry. -
Potential attribute collision
_pipelinescould conflict with future attributes added to the client implementation. -
Harder debugging and maintenance
Attaching internal state to an external object makes the code harder to reason about and maintain.
In general, AWS SDK components avoid mutating externally provided clients and instead maintain internal state within the session or service wrapper.
To reproduce
Example demonstrating shared pipeline state when using the same client object:
from sagemaker.workflow.pipeline_context import LocalPipelineSession
session1 = LocalPipelineSession()
session2 = LocalPipelineSession()
client = session1.sagemaker_client
# session2 reuses the same client
session2.sagemaker_client = client
session1.sagemaker_client._pipelines["pipelineA"] = "A"
print(session2.sagemaker_client._pipelines)Output:
{'pipelineA': 'A'}
Both sessions share the same _pipelines state because it was attached to the client object.
Expected behavior
Pipeline state should be owned by the session or pipeline classes, not by the sagemaker_client.
A safer design would store pipelines inside the session instance:
class LocalPipelineSession(LocalSession):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._local_pipelines = {}This ensures:
- Pipeline state is isolated per session
- No mutation of external client objects
- Clear ownership of internal state
System information
- SageMaker Python SDK version: main branch
- Python version: 3.9+
- CPU or GPU: CPU
- Custom Docker image (Y/N): N