Require --instance-type when specifying accelerator resources (#317)#393
Open
FarhanTejani wants to merge 1 commit intoaws:mainfrom
Open
Require --instance-type when specifying accelerator resources (#317)#393FarhanTejani wants to merge 1 commit intoaws:mainfrom
FarhanTejani wants to merge 1 commit intoaws:mainfrom
Conversation
| spec = template.get('spec', {}) | ||
| node_selector = spec.get('nodeSelector', {}) | ||
| instance_type = node_selector.get(INSTANCE_TYPE_LABEL) if node_selector else None | ||
| if not instance_type: |
Collaborator
There was a problem hiding this comment.
If instance_type is None, the method returns None before ever reaching the new check at line 152. Move your validation to earlier.
3 tasks
Collaborator
|
Can you reenable these unit tests as part of your fix: They were commented out previously for a merge conflict resolution. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What's changing and why?
Closes #317 (the
--node-countmutual exclusivity part was fixed in #383)When
--acceleratorsor--accelerators-limitis specified without--instance-type, the CLI silently producesnvidia.com/gpu: 0in the k8s resource spec instead of the user's requested value. This happens because_set_default_accelerators_valneeds the instance type to determine the accelerator key (nvidia.com/gpuvsaws.amazon.com/neuroncore), and when it can't, it silently returnsNonefor both values.This change adds an early validation that raises a clear error when accelerators are specified without
--instance-type, and removes the dead code path in_get_limitsthat hardcodednvidia.com/gpu: 0.Before/After UX
Before:
After:
How was this change tested?
Are unit tests added?
Updated 3 tests that asserted the old behavior (
nvidia.com/gpu: 0on CPU-only / invalid instances)Are integration tests added?
N/A
Reviewer Guidelines
One of the following must be true: