@aviruthen
Collaborator

Issue #, if available:

Description of changes:

Introducing aws_batch into PySDK V3, placed in sagemaker.train.
Supports queueing in front of training jobs with interfaces TrainingQueue and TrainingQueuedJob.
Includes unit tests, integration tests, and an example notebook.
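
For readers unfamiliar with the pattern, here is a minimal sketch of what "queueing in front of training jobs" could look like: jobs are submitted to a FIFO queue, and the caller polls a job until it reaches a terminal state. `SketchQueue`, `SketchQueuedJob`, `step`, and `wait_for` are hypothetical stand-ins written for illustration; they are not the `TrainingQueue` / `TrainingQueuedJob` interfaces from this PR, and the status strings are assumptions.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SketchQueuedJob:
    """Hypothetical stand-in for a queued job (not the SDK's TrainingQueuedJob)."""
    name: str
    status: str = "SUBMITTED"  # assumed status names, for illustration only


@dataclass
class SketchQueue:
    """Hypothetical stand-in for a job queue (not the SDK's TrainingQueue)."""
    pending: deque = field(default_factory=deque)

    def submit(self, name: str) -> SketchQueuedJob:
        # Jobs wait in FIFO order behind anything submitted earlier.
        job = SketchQueuedJob(name)
        self.pending.append(job)
        return job

    def step(self) -> None:
        # Simulate one scheduler tick: start the head job, or finish it.
        if not self.pending:
            return
        head = self.pending[0]
        if head.status == "SUBMITTED":
            head.status = "RUNNING"
        elif head.status == "RUNNING":
            head.status = "SUCCEEDED"
            self.pending.popleft()


def wait_for(job: SketchQueuedJob, queue: SketchQueue,
             poll_seconds: float = 0.0, max_polls: int = 100) -> str:
    """Poll until the job reaches a terminal state or the poll budget runs out."""
    for _ in range(max_polls):
        if job.status in ("SUCCEEDED", "FAILED"):
            return job.status
        queue.step()              # a real queue would advance on its own
        time.sleep(poll_seconds)  # back off between status checks
    raise TimeoutError(f"{job.name} did not finish within {max_polls} polls")


queue = SketchQueue()
first = queue.submit("job-a")
second = queue.submit("job-b")
final_status = wait_for(second, queue)
```

Because the sketch is strictly FIFO, `job-b` cannot start until `job-a` has finished, which is the behavior a queue in front of training jobs is meant to provide.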

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@aviruthen
Collaborator Author

Unit and integ tests are passing; I have to rerun them after the recent merge.
[Image: Screenshot 2025-12-12 at 11 16 58 AM]

"## Create Sample Resources\n",
"The diagram below shows the Batch resources we'll create for this example.\n",
"\n",
"![The Resources to Create](batch_getting_started_resources.png \"Example Job Queue and Service Environment Resources\")\n",
Collaborator

I think we're missing the png here from this PR

Collaborator Author

Good catch, added the png back

" time.sleep(5)\n",
"\n",
" # Print training job logs\n",
" # job.get_estimator().logs()\n",
Collaborator

We can remove this Estimator comment reference

Collaborator Author

Oops, good catch!

# Step 3: Create ModelTrainer
model_trainer = ModelTrainer(**init_params)

# Step 4: Set _latest_training_job (key insight!)
Collaborator

nit: key insight?

Collaborator Author

Sorry, this was a note for myself (the ModelTrainer parameter that I could attach the training job to)!

