Add aws batch #5409
| "## Create Sample Resources\n", | ||
| "The diagram below shows the Batch resources we'll create for this example.\n", | ||
| "\n", | ||
| "\n", |
I think we're missing the png here from this PR
Good catch, added the png back
| " time.sleep(5)\n", | ||
| "\n", | ||
| " # Print training job logs\n", | ||
| " # job.get_estimator().logs()\n", |
We can remove this Estimator comment reference
Oops good catch!
| # Step 3: Create ModelTrainer | ||
| model_trainer = ModelTrainer(**init_params) | ||
|
|
||
| # Step 4: Set _latest_training_job (key insight!) |
nit: key insight?
Sorry, this was a note for myself (the ModelTrainer parameter that I could attach the training job to)!

Issue #, if available:
Description of changes:
Introduces aws_batch into PySDK V3, placed in sagemaker.train.
Supports queueing in front of training jobs through the TrainingQueue and TrainingQueuedJob interfaces.
Includes unit tests, integration tests, and an example notebook.
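
For readers unfamiliar with the idea of queueing in front of training jobs, here is a minimal, self-contained sketch of the pattern: jobs are submitted to a queue, start only when capacity frees up, and the caller polls until a terminal status. Everything in it (`SketchQueue`, `SketchQueuedJob`, `wait_for`, the status strings, and the `tick` simulation step) is a hypothetical stand-in written for illustration. It does not use or mirror the actual TrainingQueue and TrainingQueuedJob signatures from this PR.

```python
import time
from collections import deque
from dataclasses import dataclass


# Hypothetical stand-in for a queued job handle; not the sagemaker.train API.
@dataclass
class SketchQueuedJob:
    name: str
    status: str = "SUBMITTED"


# Hypothetical stand-in for a job queue; not the sagemaker.train API.
class SketchQueue:
    """FIFO queue that runs at most `capacity` jobs at a time."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self._pending = deque()
        self._running = []

    def submit(self, name):
        """Enqueue a job and return a handle the caller can poll."""
        job = SketchQueuedJob(name)
        self._pending.append(job)
        return job

    def tick(self):
        """Simulate one scheduling step: finish running jobs, then start pending ones."""
        for job in self._running:
            job.status = "SUCCEEDED"
        self._running = []
        while self._pending and len(self._running) < self.capacity:
            job = self._pending.popleft()
            job.status = "RUNNING"
            self._running.append(job)


def wait_for(job, queue, poll_seconds=0.0, max_polls=100):
    """Poll until the job reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        if job.status in ("SUCCEEDED", "FAILED"):
            return job.status
        time.sleep(poll_seconds)
        queue.tick()  # in a real system the service advances on its own
    raise TimeoutError(f"{job.name} did not finish within {max_polls} polls")


queue = SketchQueue(capacity=1)
first = queue.submit("train-a")
second = queue.submit("train-b")
queue.tick()  # train-a starts; train-b stays queued behind it
final_status = wait_for(second, queue)
```

With a capacity of one, `train-b` cannot start until `train-a` has finished, which is the ordering guarantee a queue placed in front of training jobs is meant to provide.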
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.