5 files changed: +58 −1 lines changed

deployment/kubernetes/charts
  DEID_MODE: "true"
  DEID_REDACT: "true"
```

## GPU Support

To run MedCAT Service with GPU acceleration, use the GPU-enabled image and set the pod's runtime class accordingly.

> **Note:** GPU support is currently only used for de-identification.

Create a values file such as `values-gpu.yaml` with the following content:

```yaml
image:
  repository: ghcr.io/cogstack/medcat-service-gpu

runtimeClassName: nvidia

resources:
  limits:
    nvidia.com/gpu: 1

env:
  APP_CUDA_DEVICE_COUNT: "1"
  APP_TORCH_THREADS: "-1"
  DEID_MODE: "true"
```
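With the values file in place, you can install or upgrade the release with it. The release name `medcat-service` and the chart path below are assumptions, so adjust them to your setup:

```sh
# Install or upgrade the release using the GPU values file
# (release name and chart path are placeholders for your own)
helm upgrade --install medcat-service ./deployment/kubernetes/charts/medcat-service -f values-gpu.yaml
```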

> To use GPU acceleration, your Kubernetes cluster should be configured with the NVIDIA GPU Operator, or with the following components:
> - [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin)
> - [NVIDIA GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery)
> - the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/)

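Once those components are installed, you can confirm that the cluster actually advertises GPU resources before deploying. For example:

```sh
# Nodes with the device plugin installed expose an nvidia.com/gpu allocatable resource
kubectl describe nodes | grep -i "nvidia.com/gpu"
```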
### Test GPU support

You can verify that the MedCAT Service pod has access to the GPU by executing `nvidia-smi` inside the pod:

```sh
kubectl exec -it <POD_NAME> -- nvidia-smi
```

You should see the NVIDIA GPU device listing if the GPU is properly accessible.
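If you don't know the pod name, you can list the pods for the release first. The `app.kubernetes.io/name` label used below is the conventional Helm chart label and is an assumption about this chart:

```sh
# List pods belonging to the MedCAT Service release
kubectl get pods -l app.kubernetes.io/name=medcat-service
```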
---

```diff
@@ -134,3 +134,6 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.runtimeClassName }}
+      runtimeClassName: {{ .Values.runtimeClassName | quote }}
+      {{- end }}
```
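For reference, with `runtimeClassName: nvidia` in your values, the conditional above renders the pod spec roughly as follows (a sketch of the rendered output, not taken from the chart):

```yaml
spec:
  template:
    spec:
      # added only when .Values.runtimeClassName is non-empty
      runtimeClassName: "nvidia"
```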
---

```diff
@@ -8,6 +8,7 @@ replicaCount: 1
 # This sets the container image more information can be found here: https://kubernetes.io/docs/concepts/containers/images/
 image:
   repository: cogstacksystems/medcat-service
+  # repository: cogstacksystems/medcat-service-gpu
   # This sets the pull policy for images.
   # pullPolicy: IfNotPresent
   pullPolicy: Always
```
```diff
   # DEID_REDACT: true

   # Set SERVER_GUNICORN_MAX_REQUESTS to a high number instead of the default 1000. Trust k8s instead to restart pod when needed.
-  SERVER_GUNICORN_MAX_REQUESTS: 100000
+  SERVER_GUNICORN_MAX_REQUESTS: "100000"

   # Recommended env vars to set to try to limit to 1 CPU for scaling
   # OMP_NUM_THREADS: "1"
   # PYTORCH_ENABLE_MPS_FALLBACK: "1"
   # SERVER_GUNICORN_EXTRA_ARGS: "--worker-connections 1 --backlog 1"

+  # Recommended env vars for GPU support
+  # APP_CUDA_DEVICE_COUNT: "1"
+  # APP_TORCH_THREADS: "-1"
+
   # Observability Env Vars
   APP_ENABLE_METRICS: true
   APP_ENABLE_TRACING: false
```
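The quoting change above matters because Kubernetes requires every container `env` value to be a string: unquoted YAML scalars such as `100000` or `true` are parsed as numbers and booleans and rejected by the API server. A minimal sketch of that kind of check (a hypothetical helper, not part of the chart):

```python
def non_string_env_values(env: dict) -> list[str]:
    """Return the names of env entries whose values are not strings.

    Kubernetes requires container env var values to be strings, so
    unquoted YAML scalars (ints, bools) would fail validation.
    """
    return [name for name, value in env.items() if not isinstance(value, str)]


# Unquoted scalars arrive as int/bool and would be rejected:
bad = non_string_env_values({"SERVER_GUNICORN_MAX_REQUESTS": 100000, "DEID_MODE": True})
# Quoted scalars are plain strings and pass:
ok = non_string_env_values({"SERVER_GUNICORN_MAX_REQUESTS": "100000", "DEID_MODE": "true"})
print(bad)  # ['SERVER_GUNICORN_MAX_REQUESTS', 'DEID_MODE']
print(ok)   # []
```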
```diff
@@ -203,6 +208,10 @@ volumeMounts: []
 #   mountPath: "/etc/foo"
 #   readOnly: true

+# Runtime class name for the pod (e.g., "nvidia" for GPU workloads)
+# More information: https://kubernetes.io/docs/concepts/containers/runtime-class/
+runtimeClassName: ""
+
 nodeSelector: {}

 tolerations: []
```
---

```diff
@@ -134,3 +134,6 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
      {{- end }}
+      {{- if .Values.runtimeClassName }}
+      runtimeClassName: {{ .Values.runtimeClassName | quote }}
+      {{- end }}
```
---

```diff
@@ -254,3 +254,6 @@ nodeSelector: {}
 tolerations: []

 affinity: {}
+
+# Runtime class name for the pod (e.g., "nvidia" for GPU workloads)
+runtimeClassName: ""
```