Add GPU support to kubernetes_scale on EKS Karpenter #6313
base: master
container/kubernetes_scale/aws-gpu-nodepool.yaml.j2 (new file):

@@ -0,0 +1,38 @@
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: {{ gpu_nodepool_name | default('gpu') }}
spec:
  disruption:
    consolidateAfter: {{ gpu_consolidate_after | default('1m') }}
    consolidationPolicy: {{ gpu_consolidation_policy | default('WhenEmptyOrUnderutilized') }}
  limits:
    cpu: {{ gpu_nodepool_cpu_limit | default(1000) }}
  template:
    metadata:
      labels:
        pkb_nodepool: {{ gpu_nodepool_label | default('gpu') }}
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: {{ karpenter_ec2nodeclass_name | default('default') }}
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: {{ gpu_arch | default(['amd64']) }}
        - key: kubernetes.io/os
          operator: In
          values: {{ gpu_os | default(['linux']) }}
        - key: karpenter.sh/capacity-type
          operator: In
          values: {{ gpu_capacity_types | default(['on-demand']) }}
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: {{ gpu_instance_categories | default(['g']) }}
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: {{ gpu_instance_families | default(['g6','g6e']) }}
      taints:
        - key: {{ gpu_taint_key | default('nvidia.com/gpu') }}
          effect: NoSchedule
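Since every variable in this template goes through a `default()` filter, the file renders cleanly even with no overrides. A quick offline sanity check — hypothetical, using plain jinja2 and PyYAML rather than PKB's own manifest helpers, and assuming a local copy of the template — could look like this:

```python
# Hypothetical offline check: render the NodePool template with one override
# and confirm the output is valid YAML with the expected Karpenter fields.
# Assumes jinja2 and PyYAML are installed and the template is saved locally.
import jinja2
import yaml

with open('aws-gpu-nodepool.yaml.j2') as f:
    template = jinja2.Template(f.read())

rendered = template.render(gpu_instance_families=['g6'])
nodepool = yaml.safe_load(rendered)

assert nodepool['kind'] == 'NodePool'
assert nodepool['metadata']['name'] == 'gpu'  # falls back to default('gpu')
taint = nodepool['spec']['template']['spec']['taints'][0]
assert taint == {'key': 'nvidia.com/gpu', 'effect': 'NoSchedule'}
print(rendered)
```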
@@ -13,13 +13,24 @@ spec: | |
| labels: | ||
| name: {{ Name }} | ||
| spec: | ||
| {%- if NvidiaGpuRequest and Cloud == 'aws' %} | ||
| nodeSelector: | ||
Reviewer comment on the nodeSelector block:
Zach recently changed how the nodeSelectors are applied. Take a look at #6291, and in particular, the (new) GetNodeSelectors() methods in the providers (e.g. within providers/aws/elastic_kubernetes_service.py). That said, your nodeSelector here refers back to your aws-gpu-nodepool.yaml.j2 file, so I think the new GetNodeSelectors() method probably doesn't apply - keeping this in the yaml.j2 file seems like the right approach here. No action required (though take a quick look at #6291).

Author reply:
Thanks for the pointer, I checked #6291.

Collaborator:
Hmm, I disagree here. A few reasons: … So all of this should instead just go into the EksKarpenterCluster class & its GetNodeSelectors + ModifyPodSpecPlacementYaml code.
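For illustration, here is a very rough sketch of the kind of provider-side hook that comment is describing. This is hypothetical code: the actual GetNodeSelectors()/ModifyPodSpecPlacementYaml signatures live in PerfKitBenchmarker (#6291) and may differ, and `BaseEksCluster` is a placeholder name.

```python
# Hypothetical sketch only; the real EksKarpenterCluster and the
# GetNodeSelectors() hook introduced in #6291 may look different.
class EksKarpenterCluster(BaseEksCluster):  # BaseEksCluster is a placeholder name.

  def GetNodeSelectors(self) -> dict[str, str]:
    """Returns nodeSelector labels to apply to benchmark pods."""
    if virtual_machine.GPU_COUNT.value:
      # Pin GPU pods to the Karpenter NodePool created for this benchmark.
      return {'karpenter.sh/nodepool': 'gpu'}
    return {}
```

If the selector (and the matching toleration) lived in the provider like this, the pod yaml.j2 would not need the `Cloud == 'aws'` conditionals at all.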
The template continues:

        karpenter.sh/nodepool: {{ GpuNodepoolName | default('gpu') }}
    {%- endif %}
      containers:
      - name: {{ Name }}
        image: {{ Image }}
      {%- if Command %}
        command: {{ Command }}
      {%- endif %}
        resources:
          requests:
            cpu: {{ CpuRequest }}
            memory: {{ MemoryRequest }}
            ephemeral-storage: {{ EphemeralStorageRequest }}
          {%- if NvidiaGpuRequest %}
            nvidia.com/gpu: {{ NvidiaGpuRequest }}
          {%- endif %}
          limits:
            cpu: {{ CpuRequest }}
            memory: {{ MemoryRequest }}
Second hunk of the same template (pod tolerations):

@@ -53,3 +64,8 @@ spec:
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: {{ PodTimeout }}
      {%- if NvidiaGpuRequest and Cloud == 'aws' %}
Reviewer comment on the added toleration block:
Similarly, that same PR introduced the ModifyPodSpecPlacementYaml method, which seems like it could apply here. But again, this block refers back to your nodepool yaml.j2 file, so I think you should probably leave it as is. No action required.

Author reply:
Thanks for the pointer and the context.
The template continues:

      - key: {{ GpuTaintKey | default('nvidia.com/gpu') }}
        operator: Exists
        effect: NoSchedule
      {%- endif %}
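Note the implicit contract between the two templates: the pod's nodeSelector and toleration only take effect if they match the NodePool's name (karpenter.sh/nodepool: gpu) and taint (nvidia.com/gpu, NoSchedule) defined in aws-gpu-nodepool.yaml.j2. A tiny hypothetical check of that pairing, in plain Python and not part of the PR:

```python
# Hypothetical consistency check: the defaults passed to the pod template must
# line up with the NodePool template's defaults, or GPU pods will stay Pending.
nodepool_name = 'gpu'                  # metadata.name default in the NodePool file
nodepool_taint_key = 'nvidia.com/gpu'  # taints[0].key default in the NodePool file

pod_node_selector = {'karpenter.sh/nodepool': 'gpu'}  # GpuNodepoolName default
pod_toleration = {'key': 'nvidia.com/gpu', 'operator': 'Exists',
                  'effect': 'NoSchedule'}             # GpuTaintKey default

assert pod_node_selector['karpenter.sh/nodepool'] == nodepool_name
assert pod_toleration['key'] == nodepool_taint_key
```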
@@ -97,8 +97,7 @@ def Prepare(bm_spec: benchmark_spec.BenchmarkSpec): | |
| def _GetRolloutCreationTime(rollout_name: str) -> int: | ||
| """Returns the time when the rollout was created.""" | ||
| out, _, _ = container_service.RunRetryableKubectlCommand([ | ||
| 'rollout', | ||
| 'history', | ||
| 'get', | ||
| rollout_name, | ||
| '-o', | ||
| 'jsonpath={.metadata.creationTimestamp}', | ||
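Switching from `rollout history` to a plain `get` with `-o jsonpath={.metadata.creationTimestamp}` yields an RFC3339 string (e.g. 2024-05-01T12:00:00Z) rather than a revision table, so the helper presumably converts that to an integer timestamp. A hypothetical version of that conversion (the benchmark's actual parsing code is not shown in this diff):

```python
# Hypothetical sketch: convert kubectl's RFC3339 creationTimestamp to a unix
# epoch. The real _GetRolloutCreationTime() may parse it differently.
import calendar
import datetime

def _CreationTimestampToEpoch(ts: str) -> int:
  parsed = datetime.datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ')
  return calendar.timegm(parsed.timetuple())

assert _CreationTimestampToEpoch('2024-05-01T12:00:00Z') == 1714564800
```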
@@ -180,8 +179,32 @@ def ScaleUpPods( | |
| max_wait_time = _GetScaleTimeout() | ||
| resource_timeout = max_wait_time + 60 * 5 # 5 minutes after waiting to avoid | ||
| # pod delete events from polluting data collection. | ||
| yaml_docs = cluster.ConvertManifestToYamlDicts( | ||
| MANIFEST_TEMPLATE, | ||
|
|
||
| # Ensure a GPU NodePool exists for EKS Karpenter before applying the workload. | ||
| is_eks_karpenter_aws_gpu = ( | ||
| virtual_machine.GPU_COUNT.value | ||
| and FLAGS.cloud.lower() == 'aws' | ||
| and getattr(cluster, 'CLUSTER_TYPE', None) == 'Karpenter' | ||
| ) | ||
|
|
||
| if is_eks_karpenter_aws_gpu: | ||
| cluster.ApplyManifest( | ||
Reviewer comment on the ApplyManifest call:
NB: Because … But perhaps we should put this into the Prepare() function instead? (And if that works, maybe we should put the first invocation of ScaleUpPods there too... though if you want to do that, then a separate PR might be better.)
The hunk continues:

+        'container/kubernetes_scale/aws-gpu-nodepool.yaml.j2',
+        gpu_nodepool_name='gpu',
+        gpu_nodepool_label='gpu',
+        karpenter_ec2nodeclass_name='default',
+        gpu_instance_categories=['g'],
+        gpu_instance_families=['g6', 'g6e'],
+        gpu_capacity_types=['on-demand'],
+        gpu_arch=['amd64'],
+        gpu_os=['linux'],
+        gpu_taint_key='nvidia.com/gpu',
+        gpu_consolidate_after='1m',
+        gpu_consolidation_policy='WhenEmptyOrUnderutilized',
+        gpu_nodepool_cpu_limit=1000,
+    )
+
+  manifest_kwargs = dict(
       Name='kubernetes-scaleup',
       Replicas=num_new_pods,
       CpuRequest=CPUS_PER_POD.value,
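Picking up the reviewer's note above about Prepare(): a rough, hypothetical sketch of applying the NodePool there once, instead of on every ScaleUpPods() call. The real Prepare() in this benchmark does more than this, and `bm_spec.container_cluster` is assumed to be the cluster handle.

```python
# Hypothetical sketch only; names mirror the diff above, and the template's
# default() filters cover any kwargs omitted here.
def Prepare(bm_spec: benchmark_spec.BenchmarkSpec):
  cluster = bm_spec.container_cluster
  if (virtual_machine.GPU_COUNT.value
      and FLAGS.cloud.lower() == 'aws'
      and getattr(cluster, 'CLUSTER_TYPE', None) == 'Karpenter'):
    # Create the GPU NodePool once, up front, rather than per scale-up call.
    cluster.ApplyManifest(
        'container/kubernetes_scale/aws-gpu-nodepool.yaml.j2',
        gpu_nodepool_name='gpu',
        gpu_taint_key='nvidia.com/gpu',
    )
  # ... the rest of the existing Prepare() logic would follow here ...
```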
@@ -192,12 +215,28 @@ def ScaleUpPods( | |
| EphemeralStorageRequest='10Mi', | ||
| RolloutTimeout=max_wait_time, | ||
| PodTimeout=resource_timeout, | ||
| Cloud=FLAGS.cloud.lower(), | ||
| ) | ||
| cluster.ModifyPodSpecPlacementYaml( | ||
| yaml_docs, | ||
| 'kubernetes-scaleup', | ||
| cluster.default_nodepool.machine_type, | ||
|
|
||
| if is_eks_karpenter_aws_gpu: | ||
| manifest_kwargs.update({ | ||
| 'GpuNodepoolName': 'gpu', | ||
| 'GpuTaintKey': 'nvidia.com/gpu', | ||
| }) | ||
|
|
||
| yaml_docs = cluster.ConvertManifestToYamlDicts( | ||
| MANIFEST_TEMPLATE, | ||
| **manifest_kwargs, | ||
| ) | ||
|
|
||
| # Non-GPU path: keep existing placement behavior. | ||
| if not is_eks_karpenter_aws_gpu: | ||
| cluster.ModifyPodSpecPlacementYaml( | ||
| yaml_docs, | ||
| 'kubernetes-scaleup', | ||
| cluster.default_nodepool.machine_type, | ||
| ) | ||
|
|
||
| resource_names = cluster.ApplyYaml(yaml_docs) | ||
|
|
||
| assert resource_names | ||
|
|
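One more hedged idea that could complement the GPU path: after ApplyYaml, confirm that Karpenter actually provisioned nodes carrying the NodePool's pkb_nodepool=gpu label, using the same retryable kubectl helper the benchmark already imports. Illustrative only; this check is not part of the PR.

```python
# Hypothetical post-scale check: list nodes created by the GPU NodePool via the
# pkb_nodepool=gpu label from aws-gpu-nodepool.yaml.j2.
out, _, _ = container_service.RunRetryableKubectlCommand([
    'get', 'nodes',
    '-l', 'pkb_nodepool=gpu',
    '-o', 'jsonpath={.items[*].metadata.name}',
])
gpu_node_names = out.split()
assert gpu_node_names, 'Karpenter did not provision any GPU nodes.'
```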
Reviewer comment on aws-gpu-nodepool.yaml.j2:
Is this file a duplicate of https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/data/container/kubernetes_ai_inference/aws-gpu-nodepool.yaml.j2? It looks very similar.
Suggestions: …
But really I think I'm looking for #1 here.