fix: validate region parameter before URL interpolation to prevent SSRF#5819
fix: validate region parameter before URL interpolation to prevent SSRF#5819lucasjia-aws wants to merge 3 commits intoaws:masterfrom
Conversation
2544331 to
524545d
Compare
- Replace invalid test region "testregion" with valid AWS region "us-west-2" in profiler_app tests and tensorboard tests - Add comprehensive region_validation test suite covering all known AWS regions - Add tests for SSRF payload rejection and malformed region string handling - Add tests for endpoint URL validation against AWS domains - Ensure region validation regex accepts all legitimate AWS regions and rejects malicious inputs
| ) | ||
| version_config = version_config.get(py_version) or version_config | ||
| registry = _registry_from_region(region, version_config["registries"]) | ||
| validate_region(region) |
There was a problem hiding this comment.
Why is validate region scattered all over the place? Ideally it should only be checked before a request is sent ? Somewhere through the sagemaker client or in sagemaker core?
There was a problem hiding this comment.
There's no single chokepoint where all region-to-URL paths converge. Many of these URLs never go through a SageMaker client at all — telemetry uses raw requests.get(), Studio/Console URLs are returned to users as browser links, ECR image URIs are just strings passed to Docker/SageMaker APIs, and STS endpoints are passed as endpoint_url to boto3 (which doesn't validate it points to AWS). Validating once in Session.init() would only cover paths that obtain region from the session, but not paths where region is extracted from untrusted ARN strings (e.g., _parse_job_arn()) or passed directly as a function parameter (e.g., image_uris.retrieve(region=...)). The validation is placed at URL construction sites because that's where the region value actually becomes dangerous, and it's the only approach that covers all paths. This is consistent with how CVE-2026-22611 was fixed in other AWS SDKs — validate at endpoint construction, not at a single entry point.
There was a problem hiding this comment.
Can we add region validation to the Pydantic middleware or create a new middleware for sanitizing user inputs.
A middleware could ensure the region validation happens every time without us having to add it explicitly for every call?
Or a new datatype AwsRegion that can be instantiated like a string but throws an exception if the regex fails? So all uses of region: Optional[str] would be replaced with region: Optional[AwsRegion]
Devs might skip/forget adding region check in the future as this is an ever changing codebase.
| ) | ||
| version_config = version_config.get(py_version) or version_config | ||
| registry = _registry_from_region(region, version_config["registries"]) | ||
| validate_region(region) |
There was a problem hiding this comment.
Can we add region validation to the Pydantic middleware or create a new middleware for sanitizing user inputs.
A middleware could ensure the region validation happens every time without us having to add it explicitly for every call?
Or a new datatype AwsRegion that can be instantiated like a string but throws an exception if the regex fails? So all uses of region: Optional[str] would be replaced with region: Optional[AwsRegion]
Devs might skip/forget adding region check in the future as this is an ever changing codebase.
|
|
||
| from sagemaker.core.region_validation import validate_region | ||
|
|
||
| validate_region(region) |
There was a problem hiding this comment.
Can we add validate region to _botocore_resolver() instead?
Move validate_region() into Session._initialize(), BaseInteractiveApp.__init__(), and DetailProfilerApp.__init__() so that region is validated automatically at object creation time, reducing the chance of future developers forgetting to add per-site checks. Remove 7 redundant per-site validate_region() calls where region already comes from a validated Session: - session_helper.py sts_regional_endpoint() - spark/processing.py - jumpstart/utils.py - telemetry_logging.py - serve/telemetry_logger.py - tensorboard.py - detail_profiler_app.py method-level call Retain 8 per-site calls where region bypasses Session (direct function params or ARN parsing): common_utils.py (2), image_retriever.py (4), image_retriever_utils.py (1), image_uris.py (2), metrics_visualizer.py (2).
Issue:
SDK constructs endpoint URLs by interpolating the region parameter directly into URL strings without validation. A malicious region value can cause requests to be redirected to non-AWS hosts.
Fix:
Added a centralized region_validation module with strict regex validation for region format, and applied
validate_region()at all affected endpoint URL construction sites acrosssagemaker-core,sagemaker-serve, andsagemaker-train.Added
test_region_validation.pywith parametrized tests covering all known AWS regions and invalid/malicious inputs