@vofish commented Dec 15, 2025

Add support for running kubernetes_scale with 1 GPU on AKS Standard:

  • Update container/kubernetes_scale/kubernetes_scale.yaml.j2 to install the NVIDIA device plugin
  • Add cloud yaml_docs manifest

Command to run:

./pkb.py --cloud=Azure --benchmarks=kubernetes_scale \
--config_override='kubernetes_scale.container_cluster.vm_spec.Azure.zone="westus2"' \
--config_override='kubernetes_scale.container_cluster.type="Kubernetes"' \
--config_override=kubernetes_scale.container_cluster.max_vm_count=4 \
--config_override='kubernetes_scale.container_cluster.vm_spec.Azure.machine_type="Standard_NC8as_T4_v3"' \
--gpu_count=1 --gpu_type=t4 \
--kubernetes_scale_num_replicas=2 --kubernetes_scale_pod_cpus=4 --kubernetes_scale_pod_memory=4G \
--kubernetes_scale_report_individual_latencies=True --kubernetes_scale_report_latency_percentiles=False \
--metadata=cloud:Azure --timeout_minutes=236
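
For repeated runs, the four --config_override flags above could probably be kept in a benchmark config file instead. The fragment below is only a sketch of that idea: the file name aks_gpu_scale.yaml is hypothetical, the keys simply mirror the override paths in the command, and it has not been run as part of this PR.

```yaml
# aks_gpu_scale.yaml (hypothetical file name)
# Mirrors the --config_override paths from the command above.
kubernetes_scale:
  container_cluster:
    type: Kubernetes
    max_vm_count: 4
    vm_spec:
      Azure:
        machine_type: Standard_NC8as_T4_v3
        zone: westus2
```

Assuming the file is passed with --benchmark_config_file=aks_gpu_scale.yaml, the remaining flags (--gpu_count, --gpu_type, the --kubernetes_scale_* options, --metadata, --timeout_minutes) would stay on the command line unchanged.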

@rsgowman rsgowman left a comment

lgtm, though let's get Zach to review too.
