Kubernetes manifests for deploying remote transcoding workers.
The GPU worker container is based on Rocky Linux 10 with:
- FFmpeg 7.1.2 from RPM Fusion (includes nvenc, vaapi, qsv encoders)
- intel-media-driver 25.2.6 (required for Intel Battlemage/Arc B580 support)
- Python 3.12
Image tags:
- `vlog-worker-gpu:rocky10` - Rocky Linux 10 based GPU worker (recommended)
- `vlog-worker-gpu:latest` - Latest stable release
Prerequisites:
- A running Kubernetes cluster (k3s, k8s, etc.)
- VLog Worker API running and accessible from the cluster
- Container registry for worker images
# 1. Build and push the worker image
docker build -f Dockerfile.worker -t your-registry/vlog-worker:latest .
docker push your-registry/vlog-worker:latest
# 2. Register a worker to get an API key (requires the admin secret; see Secrets Management below)
curl -X POST http://your-vlog-server:9002/api/worker/register \
  -H "Content-Type: application/json" \
  -H "X-Admin-Secret: $VLOG_WORKER_ADMIN_SECRET" \
  -d '{"worker_name": "k8s-worker"}'
# 3. Create the namespace
kubectl apply -f k8s/namespace.yaml
# 4. Update configmap.yaml with your Worker API URL
# Edit k8s/configmap.yaml and set VLOG_WORKER_API_URL
# 5. Create the secret with your API key
kubectl create secret generic vlog-worker-credentials \
--namespace vlog \
--from-literal=VLOG_WORKER_API_KEY=<your-api-key>
# 6. Deploy workers
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/worker-deployment.yaml
# 7. (Optional) Enable PodDisruptionBudget for high availability
kubectl apply -f k8s/worker-pdb.yaml
# 8. (Optional) Enable auto-scaling
kubectl apply -f k8s/worker-hpa.yaml

Manifest files:
- `namespace.yaml` - Creates the `vlog` namespace
- `configmap.yaml` - Worker configuration (API URL, intervals)
- `worker-deployment.yaml` - CPU-only worker deployment
- `worker-deployment-nvidia.yaml` - NVIDIA GPU worker deployment (NVENC)
- `worker-deployment-intel.yaml` - Intel Arc/QuickSync worker deployment (VAAPI)
- `worker-hpa.yaml` - Horizontal Pod Autoscaler for auto-scaling
- `worker-pdb.yaml` - PodDisruptionBudget for CPU workers (ensures minimum availability during disruptions)
- `worker-pdb-nvidia.yaml` - PodDisruptionBudget for NVIDIA GPU workers
- `worker-pdb-intel.yaml` - PodDisruptionBudget for Intel GPU workers
- `cleanup-cronjob.yaml` - CronJob for cleaning up stale transcoding jobs
- `networkpolicy.yaml` - NetworkPolicy restricting worker pod network access
PodDisruptionBudgets (PDBs) protect worker pods from being evicted simultaneously during voluntary disruptions. This ensures transcoding jobs aren't interrupted during:
- Node drains - `kubectl drain` for maintenance
- Cluster autoscaling - When downscaling removes nodes
- Node upgrades - Rolling updates of cluster nodes
- Other voluntary disruptions - Planned maintenance events
Each worker type has a PDB configured with `minAvailable: 1`, ensuring at least one pod remains running during disruptions.
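For reference, a minimal PDB of this shape is sketched below; the selector labels are an assumption and must match your worker deployment's pod labels (the monitoring commands later in this document suggest `app.kubernetes.io/component=worker`):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: vlog-worker
  namespace: vlog
spec:
  minAvailable: 1   # keep at least one worker running through voluntary disruptions
  selector:
    matchLabels:
      app.kubernetes.io/component: worker   # assumed; must match the deployment's pod labels
```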
# Apply PDB for CPU workers
kubectl apply -f k8s/worker-pdb.yaml
# Apply PDB for NVIDIA GPU workers (if using GPU workers)
kubectl apply -f k8s/worker-pdb-nvidia.yaml
# Apply PDB for Intel GPU workers (if using GPU workers)
kubectl apply -f k8s/worker-pdb-intel.yaml
# Verify PDBs are active
kubectl get poddisruptionbudget -n vlog

Notes:
- PDBs only protect against voluntary disruptions, not involuntary ones (node failures, OOM kills, etc.)
- Requires at least 2 replicas - With only 1 replica, the PDB cannot be satisfied during disruptions
- Adjust `minAvailable` - For production, consider `minAvailable: 2` or use `maxUnavailable: 1` instead
- GPU workers - PDBs prevent GPU resource contention during rolling updates
If you have critical transcoding requirements, consider:
# Option 1: Guarantee minimum capacity (50% of replicas)
spec:
  minAvailable: 50%
# Option 2: Limit maximum disruption (only 1 pod at a time)
spec:
  maxUnavailable: 1

The networkpolicy.yaml restricts network access for worker pods to limit the blast radius if a pod is compromised. Workers only need:
- Egress to Worker API (port 9002) - For job claiming, progress updates, file transfers
- Egress to DNS (port 53) - For hostname resolution
- Optionally, egress to Redis (port 6379) - For instant job dispatch
All ingress is denied since workers don't need incoming connections.
NetworkPolicy requires a CNI that supports it. Common options:
- Calico - Full NetworkPolicy support
- Cilium - Full support with enhanced features
- Weave Net - Full support
Note: Not every default CNI enforces NetworkPolicy (plain flannel, for example, does not). Verify your CNI supports it before relying on this policy.
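One way to verify enforcement is to apply a deny-all egress policy in a scratch namespace and confirm traffic is actually blocked; this is a rough sketch (the namespace and test image are arbitrary):

```bash
kubectl create namespace netpol-test
kubectl apply -n netpol-test -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
spec:
  podSelector: {}
  policyTypes: ["Egress"]
EOF
# With a conforming CNI this curl should time out; if it succeeds,
# your CNI is ignoring NetworkPolicy.
kubectl run -n netpol-test curl-test --image=curlimages/curl \
  --restart=Never --rm -it -- curl -s -m 5 https://example.com
kubectl delete namespace netpol-test
```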
Before applying the policy, you must configure the Worker API egress rule. Edit networkpolicy.yaml and uncomment one of the options:
Option A: External Worker API - If your Worker API runs outside the cluster:
- to:
    - ipBlock:
        cidr: 192.168.1.100/32   # Replace with your API server's IP
  ports:
    - protocol: TCP
      port: 9002

Option B: In-cluster Worker API - If the API is deployed as a Kubernetes service:
- to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: vlog
      podSelector:
        matchLabels:
          app.kubernetes.io/name: vlog
          app.kubernetes.io/component: worker-api
  ports:
    - protocol: TCP
      port: 9002

# Edit the policy to configure Worker API egress
vim k8s/networkpolicy.yaml
# Apply the network policy
kubectl apply -f k8s/networkpolicy.yaml
# Verify the policy is active
kubectl get networkpolicy -n vlog

If using Redis for instant job dispatch, uncomment one of the Redis egress options in the policy file and configure the appropriate CIDR or pod selector for your Redis deployment.
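For an external Redis, the egress rule mirrors Option A above; the CIDR here is a placeholder for your Redis host:

```yaml
- to:
    - ipBlock:
        cidr: 192.168.1.101/32   # placeholder; replace with your Redis host
  ports:
    - protocol: TCP
      port: 6379
```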
Important: Kubernetes secrets should never be committed to version control.
After registering a worker, create the secret directly:
# Register a worker to get an API key
# Note: VLOG_WORKER_ADMIN_SECRET must be set in your environment
vlog worker register --name "k8s-worker"
# Or via curl (include admin secret header):
curl -X POST http://your-vlog-server:9002/api/worker/register \
-H "Content-Type: application/json" \
-H "X-Admin-Secret: $VLOG_WORKER_ADMIN_SECRET" \
-d '{"worker_name": "k8s-worker"}'
# Create the secret (replace with your actual API key from registration response)
kubectl create secret generic vlog-worker-credentials \
--namespace vlog \
--from-literal=VLOG_WORKER_API_KEY="your-actual-api-key"To update an existing secret:
kubectl delete secret vlog-worker-credentials -n vlog
kubectl create secret generic vlog-worker-credentials \
--namespace vlog \
--from-literal=VLOG_WORKER_API_KEY="new-api-key"For production environments, consider using external secrets management:
- Sealed Secrets - Encrypt secrets that can be safely committed to Git
- External Secrets Operator - Sync secrets from AWS Secrets Manager, HashiCorp Vault, etc.
- HashiCorp Vault - Centralized secrets management with dynamic credentials
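As an illustration of the Sealed Secrets route (this assumes the Sealed Secrets controller is installed in the cluster and kubeseal is on your PATH):

```bash
# Render the secret locally, encrypt it, and commit only the sealed manifest
kubectl create secret generic vlog-worker-credentials \
  --namespace vlog \
  --from-literal=VLOG_WORKER_API_KEY="your-actual-api-key" \
  --dry-run=client -o yaml \
  | kubeseal -o yaml > k8s/vlog-worker-credentials-sealed.yaml

kubectl apply -f k8s/vlog-worker-credentials-sealed.yaml
```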
For faster transcoding, deploy GPU-enabled workers:
Prerequisites:
- NVIDIA GPU Operator installed
- Nodes with NVIDIA GPUs labeled `nvidia.com/gpu.present=true`
- NVIDIA RuntimeClass configured (check with `kubectl get runtimeclass`)
# Build GPU-enabled image
docker build -f Dockerfile.worker.gpu -t your-registry/vlog-worker-gpu:latest .
docker push your-registry/vlog-worker-gpu:latest
# For k3s: Import image directly to containerd
docker save your-registry/vlog-worker-gpu:latest | ssh user@node 'sudo k3s ctr images import -'
# Deploy NVIDIA GPU workers
kubectl apply -f k8s/worker-deployment-nvidia.yaml
# (Optional) Apply PodDisruptionBudget for NVIDIA workers
kubectl apply -f k8s/worker-pdb-nvidia.yaml

Important: The deployment uses `runtimeClassName: nvidia`, which is required for GPU access. If your cluster uses a different runtime class name, update the deployment accordingly.
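The relevant portion of worker-deployment-nvidia.yaml has roughly this shape (the container name and limits are assumptions):

```yaml
spec:
  template:
    spec:
      runtimeClassName: nvidia      # must match a RuntimeClass in your cluster
      containers:
        - name: worker              # assumed container name
          resources:
            limits:
              nvidia.com/gpu: 1     # one GPU per worker pod
```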
Supported encoders: h264_nvenc, hevc_nvenc, av1_nvenc (RTX 40 series only)
Note: Consumer NVIDIA GPUs have concurrent encode limits:
- RTX 4090/4080/4070: 5 sessions
- RTX 3090/3080/3070: 3 sessions
- Datacenter GPUs (A100, T4, etc.): Unlimited
Prerequisites:
- Intel GPU Device Plugin installed
- Nodes with Intel GPUs labeled `intel.feature.node.kubernetes.io/gpu=true`
- For Battlemage GPUs (Arc B580): Use the Rocky Linux 10 container image (requires intel-media-driver 25.x)
# Deploy Intel GPU workers
kubectl apply -f k8s/worker-deployment-intel.yaml
# (Optional) Apply PodDisruptionBudget for Intel workers
kubectl apply -f k8s/worker-pdb-intel.yaml

Supported encoders: h264_vaapi, hevc_vaapi, av1_vaapi
Intel Arc GPUs deliver excellent AV1 encoding quality. Driver support by generation:
- Battlemage (B580): Requires intel-media-driver 25.x (Rocky Linux 10 image)
- Alchemist (A770, A380): Works with intel-media-driver 23.x+
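In the Intel deployment, the Intel GPU Device Plugin exposes GPUs as an extended resource; the excerpt below assumes the common i915 resource name (nodes running the newer xe driver may expose `gpu.intel.com/xe` instead):

```yaml
resources:
  limits:
    gpu.intel.com/i915: 1   # assumed resource name from the Intel GPU Device Plugin
```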
Manual scaling:
kubectl scale deployment vlog-worker --namespace vlog --replicas=5

Auto-scaling requires metrics-server and is driven by worker-hpa.yaml.
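A minimal sketch of the shape that manifest likely takes (the replica bounds and CPU target are assumptions; check the manifest for the real values):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vlog-worker
  namespace: vlog
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vlog-worker
  minReplicas: 1                     # assumed bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80     # assumed CPU target
```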
kubectl apply -f k8s/worker-hpa.yaml

Workers include an HTTP health server for Kubernetes liveness and readiness probes.
| Endpoint | Purpose | Success Criteria |
|---|---|---|
| `GET /health` | Liveness probe | Process is running |
| `GET /ready` | Readiness probe | FFmpeg available AND API connected |
| `GET /` | Info endpoint | Returns service info and API URL |
The health server runs on port 8080 by default, configurable via `VLOG_WORKER_HEALTH_PORT`:
# In configmap.yaml
data:
VLOG_WORKER_HEALTH_PORT: "8080"

All worker deployment manifests include configured probes:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

The /ready endpoint verifies:
- FFmpeg Available: Checks `ffmpeg` is in PATH
- API Connected: Worker has connected to Worker API and sent a heartbeat
Response example:
{
  "status": "ready",
  "checks": {
    "ffmpeg": true,
    "api_connected": true
  }
}

If any check fails, returns HTTP 503 (Service Unavailable).
# View worker pods
kubectl get pods -n vlog
# View worker logs
kubectl logs -n vlog -l app.kubernetes.io/component=worker -f
# View worker status
curl http://your-vlog-server:9002/api/workers
# Check individual pod health
kubectl exec -n vlog <pod-name> -- curl -s localhost:8080/ready

Workers are CPU-intensive during transcoding. Adjust resources in worker-deployment.yaml (a sample block follows the list):
- Small videos (720p): 1 CPU, 2GB RAM
- HD videos (1080p): 2 CPU, 4GB RAM
- 4K videos (2160p): 4 CPU, 8GB RAM
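As an example, a 1080p-oriented worker would carry a standard Kubernetes resources block with the values above:

```yaml
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi
```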
GPU workers are less CPU-intensive. Adjust resources in worker-deployment-nvidia.yaml or worker-deployment-intel.yaml:
- All resolutions: 1-2 CPU, 4GB RAM, 1 GPU
- GPU encoding is 5-10x faster than CPU for equivalent quality
The work directory (`emptyDir`) should be sized to hold source + output (see the sketch after this list):
- Small: 10GB
- Large/4K: 50GB+
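To keep a runaway job from filling the node disk, the emptyDir can carry a size limit; a sketch (the volume name is an assumption):

```yaml
volumes:
  - name: work            # assumed volume name
    emptyDir:
      sizeLimit: 50Gi     # sized for large/4K jobs per the guidance above
```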
VLog provides a CronJob for automated PostgreSQL backups.
Create the backup credentials secret:
kubectl create secret generic postgres-backup-credentials \
--namespace vlog \
--from-literal=PGHOST=your-postgres-host \
--from-literal=PGPORT=5432 \
--from-literal=PGDATABASE=vlog \
--from-literal=PGUSER=vlog \
--from-literal=PGPASSWORD=your-password

kubectl apply -f k8s/backup-cronjob.yaml

| Setting | Default | Description |
|---|---|---|
| Schedule | `0 2 * * *` | Daily at 2:00 AM UTC |
| Retention | 7 days | Backups older than this are deleted |
| Format | Custom (`pg_dump -Fc`) | Compressed, supports selective restore |
| Storage | `/mnt/nas/vlog-storage/backups/` | NAS-mounted backup directory |
The CronJob automatically verifies each backup using `pg_restore --list`. Failed verifications are logged and the corrupt file is removed.
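You can run the same check by hand against any backup file (the filename matches the restore example later in this section):

```bash
pg_restore --list /mnt/nas/vlog-storage/backups/vlog-2025-12-27-020000.dump > /dev/null \
  && echo "backup OK"
```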
# Trigger a backup job immediately
kubectl create job --from=cronjob/postgres-backup manual-backup-$(date +%s) -n vlog
# Check backup job status
kubectl get jobs -n vlog -l component=backup
# View backup logs
kubectl logs -n vlog -l component=backup --tail=100

# List available backups
ls /mnt/nas/vlog-storage/backups/
# Restore a specific backup
pg_restore -U vlog -d vlog --clean /mnt/nas/vlog-storage/backups/vlog-2025-12-27-020000.dump

VLog Kubernetes deployments include several security features.
All worker containers run with restricted security contexts:
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
      - ALL

Pod-level security is enforced:
- `runAsNonRoot`: Containers cannot run as root
- `seccompProfile`: Uses RuntimeDefault seccomp profile
- `fsGroup`: Ensures correct file permissions
- Pinned image tags: Use specific versions, not `latest`
- Image pull policy: `IfNotPresent` prevents unexpected updates
- Multi-stage builds: Production images contain only runtime dependencies
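For example, pinning by digest removes any ambiguity about which image actually runs (the digest is a placeholder):

```yaml
image: your-registry/vlog-worker@sha256:<digest>   # placeholder digest
imagePullPolicy: IfNotPresent
```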
The networkpolicy.yaml restricts worker pod network access:
Allowed Egress:
- Worker API (port 9002)
- DNS (port 53)
- Redis (port 6379, if enabled)
Denied:
- All ingress
- All other egress
VLog's CI/CD pipeline includes security scanning:
- Trivy: Container image vulnerability scanning
- pip-audit: Python dependency vulnerability scanning
- Bandit: Python static security analysis
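To run comparable checks locally before pushing (standard invocations for each tool; the requirements path is an assumption about this repo's layout):

```bash
# Scan the built worker image for known CVEs
trivy image your-registry/vlog-worker:latest

# Audit Python dependencies and run static security analysis
pip-audit -r requirements.txt
bandit -r .
```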
Security best practices:

- Secrets Management:
  - Never commit secrets to git
  - Use kubectl to create secrets
  - Consider Sealed Secrets or External Secrets Operator for GitOps
- RBAC:
  - Workers use minimal RBAC permissions
  - No cluster-wide permissions required
- Resource Limits:
  - All pods have memory and CPU limits
  - Prevents resource exhaustion attacks
- Audit Logging:
  - Enable Kubernetes audit logging
  - VLog application audit logs complement cluster logs
Worker pods expose metrics for monitoring.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vlog-workers
  namespace: vlog
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: vlog-worker
  podMetricsEndpoints:
    - port: health
      path: /metrics
      interval: 30s

Or with static Prometheus configuration:
scrape_configs:
  - job_name: 'vlog-workers'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['vlog']
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: vlog-worker
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        regex: health
        action: keep

See MONITORING.md for available metrics and alerting rules.