feat(service): add content-addressed ModelStore with GC#37
Conversation
📊 Code Coverage Report
📦 Per-package breakdown |
There was a problem hiding this comment.
Code Review
This pull request introduces a node-local, content-addressed ModelStore to deduplicate model artifact pulls and reduce disk footprint using hardlinks. It also integrates this store into the Worker pull and delete logic, ensuring that cached data is reused across volumes and garbage collected when no longer referenced. Feedback highlights a critical compilation error due to a naming conflict between the standard library errors package and github.com/pkg/errors, specifically regarding the use of errors.Is. Additionally, it is suggested to initialize the modctlBackend once during Worker creation rather than on every digest resolution for better efficiency.
| "syscall" | ||
|
|
||
| "github.com/containerd/containerd/pkg/kmutex" | ||
| "github.com/pkg/errors" |
There was a problem hiding this comment.
There is a naming conflict between the standard library errors package and github.com/pkg/errors. The code uses errors.Is (from stdlib) at line 243, but only github.com/pkg/errors is imported as errors. This will cause a compilation error as github.com/pkg/errors does not contain the Is function. You should import the standard library errors package and alias github.com/pkg/errors to something like pkgerrors to avoid this conflict.
|
|
||
| if pullErr != nil { | ||
| switch { | ||
| case errors.Is(pullErr, context.Canceled): |
| type Worker struct { | ||
| cfg *config.Config | ||
| newPuller func(ctx context.Context, pullCfg *config.PullConfig, hook *status.Hook, diskQuotaChecker *DiskQuotaChecker) Puller | ||
| sm *status.StatusManager | ||
| inflight singleflight.Group | ||
| contextMap *ContextMap | ||
| kmutex kmutex.KeyedLocker | ||
| cfg *config.Config | ||
| newPuller func(ctx context.Context, pullCfg *config.PullConfig, hook *status.Hook, diskQuotaChecker *DiskQuotaChecker) Puller | ||
| sm *status.StatusManager | ||
| store *ModelStore | ||
| // resolveDigest returns the manifest digest of reference. Exposed as a | ||
| // field so tests can stub it without going through the network. | ||
| resolveDigest func(ctx context.Context, reference string) (string, error) | ||
| inflight singleflight.Group | ||
| contextMap *ContextMap | ||
| kmutex kmutex.KeyedLocker | ||
| } | ||
|
|
||
| func NewWorker(cfg *config.Config, sm *status.StatusManager) (*Worker, error) { | ||
| return &Worker{ | ||
| w := &Worker{ | ||
| cfg: cfg, | ||
| newPuller: NewPuller, | ||
| sm: sm, | ||
| store: NewModelStore(cfg.Get().RootDir), | ||
| inflight: singleflight.Group{}, | ||
| contextMap: NewContextMap(), | ||
| kmutex: kmutex.New(), | ||
| }, nil | ||
| } | ||
| w.resolveDigest = w.defaultResolveDigest | ||
| return w, nil | ||
| } |
There was a problem hiding this comment.
The defaultResolveDigest method creates a new modctlBackend instance on every call. If this backend performs any significant initialization or resource allocation, it is more efficient to create it once during Worker initialization and reuse it. This also simplifies the defaultResolveDigest implementation.
type Worker struct {
cfg *config.Config
newPuller func(ctx context.Context, pullCfg *config.PullConfig, hook *status.Hook, diskQuotaChecker *DiskQuotaChecker) Puller
sm *status.StatusManager
store *ModelStore
backend modctlBackend.Backend
// resolveDigest returns the manifest digest of reference. Exposed as a
// field so tests can stub it without going through the network.
resolveDigest func(ctx context.Context, reference string) (string, error)
inflight singleflight.Group
contextMap *ContextMap
kmutex kmutex.KeyedLocker
}
func NewWorker(cfg *config.Config, sm *status.StatusManager) (*Worker, error) {
b, err := modctlBackend.New("")
if err != nil {
return nil, errors.Wrap(err, "create modctl backend")
}
w := &Worker{
cfg: cfg,
newPuller: NewPuller,
sm: sm,
store: NewModelStore(cfg.Get().RootDir),
backend: b,
inflight: singleflight.Group{},
contextMap: NewContextMap(),
kmutex: kmutex.New(),
}
w.resolveDigest = w.defaultResolveDigest
return w, nil
}aeedfe7 to
9b90531
Compare
Signed-off-by: chlins <chlins.zhang@gmail.com>
Fixes: #35
This pull request introduces a node-local, content-addressed shared cache for model artifacts, called
ModelStore, and significantly refactors how volume directories are released and garbage collected. It also adds comprehensive tests for the new caching and GC behavior, ensuring robust integration with the worker logic. The changes improve deduplication of model pulls, minimize disk usage, and ensure that cached artifacts are garbage collected only when no longer referenced by any volume.Key changes:
Model Caching and Deduplication
ModelStoreinpkg/service/model_store.go, a shared, content-addressed cache for model artifacts that deduplicates concurrent pulls and exposes payloads to volumes via hardlinks. It uses per-digest locking and filesystem reference counting for concurrency and GC.Volume Release and Garbage Collection
pkg/service/node_dynamic.go,pkg/service/node_static_inline.go) to use a newReleaseVolumeTreemethod, which not only removes the volume directory but also triggers garbage collection of unreferenced cache entries. [1] [2]pkg/service/pullmodel_test.goto verify that cached model artifacts are shared across volumes, garbage collected when no longer referenced, and robust against various error conditions.Testing and Mocking Improvements
Minor Cleanups
osimport frompkg/service/node_static_inline.gofor clarity.These changes collectively provide a robust, efficient, and well-tested mechanism for model artifact caching, deduplication, and cleanup in the CSI driver.