refactor(core): add worker-pool for containerd event processing #2294
+177
−59
Purpose of PR?:
Refactor the containerd handler to process containerd events via a bounded worker-pool and introduce a small PID/namespace cache to improve event throughput and reduce latency under bursty container workloads, without changing existing behavior.
Fixes #2289
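To illustrate the PID/namespace cache idea mentioned above, here is a minimal sketch, not the code in this PR. It assumes the cache memoizes the result of reading the `/proc/<pid>/ns/{pid,mnt}` symlinks per PID. The names `nsInfo`, `nsCache`, `readNS`, and `parseNSLink` are hypothetical and chosen for illustration only.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"sync"
)

// nsInfo holds the namespace inode numbers of one process (hypothetical type).
type nsInfo struct {
	PidNS uint64
	MntNS uint64
}

// parseNSLink extracts the inode from a link target such as "pid:[4026531836]".
func parseNSLink(link string) (uint64, error) {
	start := strings.Index(link, "[")
	end := strings.LastIndex(link, "]")
	if start < 0 || end <= start+1 {
		return 0, fmt.Errorf("unexpected namespace link %q", link)
	}
	return strconv.ParseUint(link[start+1:end], 10, 64)
}

// readNS resolves <procRoot>/<pid>/ns/{pid,mnt} into inode numbers.
func readNS(procRoot string, pid int) (nsInfo, error) {
	base := filepath.Join(procRoot, strconv.Itoa(pid), "ns")
	var out [2]uint64
	for i, name := range []string{"pid", "mnt"} {
		link, err := os.Readlink(filepath.Join(base, name))
		if err != nil {
			return nsInfo{}, err
		}
		if out[i], err = parseNSLink(link); err != nil {
			return nsInfo{}, err
		}
	}
	return nsInfo{PidNS: out[0], MntNS: out[1]}, nil
}

// nsCache memoizes lookups per PID so bursts of events skip repeated /proc reads.
type nsCache struct {
	mu      sync.Mutex
	entries map[int]nsInfo
	lookup  func(pid int) (nsInfo, error)
}

func newNSCache(lookup func(pid int) (nsInfo, error)) *nsCache {
	return &nsCache{entries: make(map[int]nsInfo), lookup: lookup}
}

// Get returns the cached entry, or performs and stores one lookup on a miss.
func (c *nsCache) Get(pid int) (nsInfo, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if info, ok := c.entries[pid]; ok {
		return info, nil
	}
	info, err := c.lookup(pid)
	if err != nil {
		return nsInfo{}, err
	}
	c.entries[pid] = info
	return info, nil
}

// Forget drops a PID, e.g. on task exit, because the kernel may reuse PIDs.
func (c *nsCache) Forget(pid int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, pid)
}

func main() {
	cache := newNSCache(func(pid int) (nsInfo, error) { return readNS("/proc", pid) })
	info, err := cache.Get(os.Getpid())
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("pidns=%d mntns=%d\n", info.PidNS, info.MntNS)
}
```

Any such cache needs an eviction path: a stale entry for a reused PID would attribute events to the wrong namespaces, which is why the sketch exposes `Forget`.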
Does this PR introduce a breaking change?
No.
The external behavior of containerd handling, container maps, endpoints, and policy enforcement remains unchanged.
Only the internal execution model for containerd events is changed from synchronous to worker-pool based processing.
If the changes in this PR are manually verified, list down the scenarios covered:
Non-k8s mode with containerd:
- -k8s=false
- -logPath=stdout
- CRI_SOCKET=unix:///run/k3s/containerd/containerd.sock (via env)
- Started to monitor Containerd events (worker-pool mode)
Container lifecycle via containerd (ctr):
- sudo ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images pull docker.io/library/alpine:latest
- sudo ctr --address /run/k3s/containerd/containerd.sock -n k8s.io run docker.io/library/alpine:latest ct-test-1 sleep 3
- /tasks/start, /tasks/exit, /containers/delete.
- [containerd-worker] handling event: topic=/tasks/start
- [containerd-worker] handling event: topic=/tasks/exit
- [containerd-worker] handling event: topic=/containers/delete
- Detected a container (added/...)
- Detected a container (removed/...)
- containerd events: queued=<N> processed=<M> busy=<K>
PID / Namespace correctness:
- Pid, PidNS, MntNS are populated correctly via:
  - getPrimaryPidAndNSCached) when eventpid == 0
  - /proc/<pid>/ns/{pid,mnt} lookup when eventpid != 0
  - /proc).
Stability & error handling:
- recover around processContainerdJob.
- UpdateContainerdContainer(..., "destroy") is only invoked when the exiting PID matches the tracked container PID (same logic as before).
Additional information for reviewer? :
- /proc namespace lookups when events arrive in bursts.
- [containerd-worker] ... logs and periodic queue metrics were added to make it easier to profile and monitor behavior in real environments.
Checklist:
<type>(<scope>): <subject>