CreateDeploymentRequest

Request to deploy an AI model onto a set of GPUs.

Properties

| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| gpuCount | Long | Number of GPUs (1-8) | |
| inferenceEngineVersion | InferenceEngineVersion | | [optional] |
| name | String | Deployment name | |
| gpuType | String | GPU type family (e.g., gpua5000, gpu3080ti) | |
| replicas | Long | Number of replicas (>=1) | |
| inferenceEngineParameters | List<String> | Optional extra inference engine server CLI args | [optional] |
| model | ModelRef | | |
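
The numeric constraints documented above (gpuCount in 1-8, replicas >= 1) can be checked on the client before a request is sent. The following is a minimal sketch, not the generated client class: `DeploymentRequestSketch`, its nested `Request` record, and `validate` are hypothetical names introduced for illustration only. The `model` and `inferenceEngineVersion` fields are left out because their types (`ModelRef`, `InferenceEngineVersion`) are not described in this document. It assumes Java 16 or later for record support.

```java
import java.util.ArrayList;
import java.util.List;

public class DeploymentRequestSketch {

    // Hypothetical stand-in for the request body. It mirrors only the
    // scalar and list properties listed in the table above.
    record Request(Long gpuCount, String name, String gpuType, Long replicas,
                   List<String> inferenceEngineParameters) {}

    // Returns a list of constraint violations; an empty list means the
    // documented numeric constraints are satisfied.
    static List<String> validate(Request r) {
        List<String> errors = new ArrayList<>();
        if (r.gpuCount() == null || r.gpuCount() < 1 || r.gpuCount() > 8) {
            errors.add("gpuCount must be between 1 and 8");
        }
        if (r.replicas() == null || r.replicas() < 1) {
            errors.add("replicas must be at least 1");
        }
        return errors;
    }

    public static void main(String[] args) {
        // gpuType value taken from the examples in the property description.
        Request ok = new Request(2L, "demo", "gpua5000", 1L, List.of());
        System.out.println(validate(ok));

        // Both numeric fields are out of range here; the optional
        // parameter list is simply omitted (null).
        Request bad = new Request(9L, "demo", "gpua5000", 0L, null);
        System.out.println(validate(bad));
    }
}
```

Collecting all violations in a list, rather than throwing on the first one, lets a caller report every problem with a request at once. Whether the server enforces the same bounds is not stated here, so this check should be treated as a convenience rather than a guarantee.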