Commit 3a564e1

feat(helm): Add GPU support for MedCAT Service Helm Chart (#35)
- Updated README.md to include instructions for GPU-enabled deployment.
- Modified values.yaml to support GPU configuration options.
- Added runtime class name handling in the deployment templates for both MedCAT Service and MedCAT Trainer.
1 parent d3fa72c commit 3a564e1

File tree

5 files changed (+58 −1 lines)

deployment/kubernetes/charts/medcat-service-helm/README.md

Lines changed: 39 additions & 0 deletions

````diff
@@ -71,3 +71,42 @@ env:
   DEID_MODE: "true"
   DEID_REDACT: "true"
 ```
+
+## GPU Support
+
+To run MedCAT Service with GPU acceleration, use the GPU-enabled image and set the pod runtime class accordingly.
+
+Note: GPU support is currently only used for de-identification.
+
+Create a values file such as `values-gpu.yaml` with the following content:
+
+```yaml
+image:
+  repository: ghcr.io/cogstack/medcat-service-gpu
+
+runtimeClassName: nvidia
+
+resources:
+  limits:
+    nvidia.com/gpu: 1
+
+env:
+  APP_CUDA_DEVICE_COUNT: 1
+  APP_TORCH_THREADS: -1
+  DEID_MODE: true
+```
+
+> To use GPU acceleration, your Kubernetes cluster should be configured with the NVIDIA GPU Operator or the following components:
+> - [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin)
+> - [NVIDIA GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery)
+> - The [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/)
+
+### Test GPU support
+
+You can verify that the MedCAT Service pod has access to the GPU by executing `nvidia-smi` inside the pod:
+
+```sh
+kubectl exec -it <POD_NAME> -- nvidia-smi
+```
+
+You should see the NVIDIA GPU device listing if the GPU is properly accessible.
````
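The README section above can be exercised end to end. A minimal sketch, assuming a release named `medcat-service` and a local chart path (both hypothetical — substitute your own release name and chart location):

```shell
# Write the GPU values override from the README above to a local file.
cat > values-gpu.yaml <<'EOF'
image:
  repository: ghcr.io/cogstack/medcat-service-gpu
runtimeClassName: nvidia
resources:
  limits:
    nvidia.com/gpu: 1
env:
  APP_CUDA_DEVICE_COUNT: 1
  APP_TORCH_THREADS: -1
  DEID_MODE: true
EOF

# Then apply it on top of the chart defaults (requires a cluster;
# release name and chart path are hypothetical):
# helm upgrade --install medcat-service ./medcat-service-helm -f values-gpu.yaml
```

Because `-f` overlays the chart's `values.yaml`, everything not set in `values-gpu.yaml` (pull policy, replica count, and so on) keeps its default.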

deployment/kubernetes/charts/medcat-service-helm/templates/deployment.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -134,3 +134,6 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.runtimeClassName }}
+      runtimeClassName: {{ .Values.runtimeClassName | quote }}
+      {{- end }}
```
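The guard added above emits the field only when `.Values.runtimeClassName` is non-empty, so existing deployments (which get the empty-string default) render exactly as before. A minimal shell sketch of that behavior (the function is hypothetical, not part of the chart):

```shell
# Mimic the template guard: print the field only for a non-empty value,
# quoted the way Helm's `quote` function would quote it.
render_runtime_class() {
  if [ -n "$1" ]; then
    printf 'runtimeClassName: "%s"\n' "$1"
  fi
}

render_runtime_class ""        # default values.yaml: prints nothing
render_runtime_class "nvidia"  # GPU override: prints runtimeClassName: "nvidia"
```

Omitting the field entirely (rather than emitting an empty string) matters: `runtimeClassName: ""` would be rejected by the API server, while an absent field means the cluster's default runtime is used.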

deployment/kubernetes/charts/medcat-service-helm/values.yaml

Lines changed: 10 additions & 1 deletion

```diff
@@ -8,6 +8,7 @@ replicaCount: 1
 # This sets the container image; more information can be found here: https://kubernetes.io/docs/concepts/containers/images/
 image:
   repository: cogstacksystems/medcat-service
+  # repository: cogstacksystems/medcat-service-gpu
   # This sets the pull policy for images.
   # pullPolicy: IfNotPresent
   pullPolicy: Always
@@ -32,7 +33,7 @@ env:
   # DEID_REDACT: true
 
   # Set SERVER_GUNICORN_MAX_REQUESTS to a high number instead of the default 1000. Trust k8s instead to restart the pod when needed.
-  SERVER_GUNICORN_MAX_REQUESTS: 100000
+  SERVER_GUNICORN_MAX_REQUESTS: "100000"
 
   # Recommended env vars to set to try to limit to 1 CPU for scaling
   # OMP_NUM_THREADS: "1"
@@ -44,6 +45,10 @@ env:
   # PYTORCH_ENABLE_MPS_FALLBACK: "1"
   # SERVER_GUNICORN_EXTRA_ARGS: "--worker-connections 1 --backlog 1"
 
+  # Recommended env vars for GPU support
+  # APP_CUDA_DEVICE_COUNT: "1"
+  # APP_TORCH_THREADS: "-1"
+
   # Observability Env Vars
   APP_ENABLE_METRICS: true
   APP_ENABLE_TRACING: false
@@ -203,6 +208,10 @@ volumeMounts: []
 #   mountPath: "/etc/foo"
 #   readOnly: true
 
+# Runtime class name for the pod (e.g., "nvidia" for GPU workloads)
+# More information: https://kubernetes.io/docs/concepts/containers/runtime-class/
+runtimeClassName: ""
+
 nodeSelector: {}
 
 tolerations: []
```

deployment/kubernetes/charts/medcat-trainer-helm/templates/medcat-trainer-deployment.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -134,3 +134,6 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.runtimeClassName }}
+      runtimeClassName: {{ .Values.runtimeClassName | quote }}
+      {{- end }}
```

deployment/kubernetes/charts/medcat-trainer-helm/values.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -254,3 +254,6 @@ nodeSelector: {}
 tolerations: []
 
 affinity: {}
+
+# Runtime class name for the pod (e.g., "nvidia" for GPU workloads)
+runtimeClassName: ""
```
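Both charts default `runtimeClassName` to an empty string, and any non-empty value must name a RuntimeClass object that exists in the cluster. Since RuntimeClass names follow the usual lowercase DNS-1123 rules for Kubernetes object names, a quick (slightly permissive) pre-flight check can catch typos before a deploy. The helper below is hypothetical, not part of the charts:

```shell
# Hypothetical helper: RuntimeClass is a cluster-scoped object, so its name
# must be a lowercase DNS-1123 subdomain (alphanumerics, '-', '.').
# This regex is a quick approximation, not the full subdomain grammar.
valid_runtime_class() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

valid_runtime_class "nvidia" && echo "nvidia: ok"
valid_runtime_class "Nvidia_GPU" || echo "Nvidia_GPU: invalid"
```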
