Filter .cog/ from Docker build context, centralize build state#3000
Filter .cog/ from Docker build context, centralize build state#3000michaeldwan wants to merge 10 commits intomainfrom
Conversation
Switch from LocalDirs to LocalMounts with fsutil filtering so .cog/ is never sent to the Docker daemon as build context. This prevents weight blobs, mount dirs, and stale tmp dirs from inflating context size. - Add ExcludePatterns and BuildCacheDir to ImageBuildOptions - Introduce cog_build named build context for staging artifacts - Generated Dockerfiles use COPY --from=cog_build instead of project-relative paths into .cog/tmp/ - Replace timestamp-based .cog/tmp/build<ts>/ with stable .cog/build/ so COPY paths are deterministic across builds (fixes layer cache busting) - Move bundled schema and weights manifest into .cog/build/ - Remove checkCompatibleDockerIgnore (no longer needed -- .cog is excluded at the session level, users can freely ignore it) - Write Dockerfile to BuildCacheDir when provided, eliminating a separate temp dir
Replace scattered filepath.Join + global.CogBuildArtifactsFolder patterns with a structured dotcog.Dir that manages path resolution, directory creation, advisory locking, and cleanup for the .cog/ project directory. - Add pkg/dotcog/ with Open, OpenTemp, Path, FilePath, Lock, Close - Wire dotcog.Dir through Source, Build, weights import, and debug - Add Source.Close() and defer src.Close() in all CLI commands - Lock API uses idiomatic Lock(ctx) (release, err) + defer release() - Remove pkg/lockfile/, pkg/dockerignore/, dockercontext/build_tempdir - Remove global.CogBuildArtifactsFolder (replaced by dotcog.Name) - Replace hardcoded .cog/ path constants with buildPaths struct - Add .cog to cog init .dockerignore template - Simplify example .dockerignore files (exclude .cog/ entirely)
- Close() uses sync.Once for idempotent double-call safety - OpenTemp registers temp dir removal via onClose instead of a temp flag - Add TempPath() for scratch dirs (build/) that clean up on Close - Lock() returns explicit error when !locked && ctx.Err() == nil - Remove generator.Cleanup() -- dotcog owns .cog/build/ lifecycle - Remove Cleanup() from Generator interface - Unexport OnClose (no external callers), remove unused Remove() - Fix stale package doc (WithLock -> Lock) - Remove accidentally committed plans/ directory
- TempPath wipes existing contents before creating the directory, so stale artifacts from crashed builds don't leak into new ones - Remove surgical os.Remove calls for individual build artifacts (TempPath's wipe handles it) - Remove stale Cleanup() reference in field comment - Remove editorial comment on Generator interface - Fix struct literal alignment (gofmt)
- Delete pkg/dockercontext/ (only export was StandardBuildDirectory=".") - Inline "." in baseimage.go - Delete generator_factory.go and its test - Inline NewStandardGenerator at 3 call sites (build.go, debug.go, base.go)
.cog/ is excluded at the fsutil session level via ExcludePatterns, so .dockerignore entries are redundant for cog build.
…arate weights Separate-weights builds no longer backup/write/restore .dockerignore on disk. Weight file exclusions for the runner build are passed as fsutil ExcludePatterns instead, matching how .cog/ is already excluded. - Remove backupDockerignore, writeDockerignore, restoreDockerignore - Remove makeDockerignoreForWeightsImage (weights Dockerfile uses explicit COPY paths, .dockerignore was irrelevant) - Remove DockerignoreHeader constant and makeDockerignoreForWeights - GenerateModelBaseWithSeparateWeights returns []string exclude patterns instead of a dockerignore string - buildRunnerImage takes excludePatterns parameter
.cog/tmp/ no longer exists; wheels are staged in .cog/build/ and referenced via the cog_build named build context.
- Merge buildWeightsImage + buildRunnerImage into buildContextImage (identical except for excludePatterns parameter) - Use dotcog.Name constant instead of hardcoded ".cog" in bundleDockerfile and test files - Document BuildCacheDir / BuildContexts["cog_build"] relationship - Clarify Source.DotCog nil comment
|
LGTM |
When cog predict/train/serve volume-mount the project dir at /src, the host's .cog/ shadows the image's .cog/ layer. Coglet looks for .cog/openapi_schema.json on disk, so it needs to exist on the host filesystem -- not just in the image. Write schema and weights manifest to both .cog/build/ (for the cog_build named context used by COPY --from=cog_build) and .cog/ root (for volume-mount visibility).
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR centralizes .cog/ directory management and filters it from Docker build context at the BuildKit transport layer, eliminating the racy .dockerignore mutation dance.
Issues found (ranked by severity):
Medium: The new pkg/dotcog package lacks unit tests. This package handles file system operations, advisory locking with flock, and cleanup registration - all functionality that should be tested, especially the Lock() method's error handling and the onClose cleanup ordering.
Low: The defaultExcludePatterns in pkg/image/build.go:46 uses a bare .cog pattern which could theoretically match other directories like .cog-backup. Consider using .cog/ or an explicit pattern to avoid any accidental matches.
var defaultExcludePatterns = []string{dotcog.Name + "/"}
Note: The removal of the bad_dockerignore integration test is correct - since .cog/ is now filtered at the transport layer, the user-facing error condition no longer exists. However, consider documenting this behavior change for users who previously relied on manually excluding .cog/ in their .dockerignore.
Overall this is a solid refactor that solves real operational issues. The BuildKit-level filtering is much cleaner than the previous .dockerignore rewrite approach.
| @@ -0,0 +1,200 @@ | |||
| // Package dotcog manages the .cog/ project directory. | |||
There was a problem hiding this comment.
This new package lacks unit tests. Please add tests for:
Open()andOpenTemp()directory creationPath()andTempPath()subdirectory handlingLock()advisory locking behavior (including error paths)onClosecleanup registration and LIFO orderingClose()nil-safety and idempotency
The Lock() method in particular handles several edge cases (context cancellation, TryLockContext errors) that should be validated.
| // defaultExcludePatterns filters .cog/ out of the project context mount | ||
| // so weight blobs, mount dirs, and build caches are never sent to the | ||
| // Docker daemon. | ||
| var defaultExcludePatterns = []string{dotcog.Name} |
There was a problem hiding this comment.
Consider using a trailing slash to make the exclusion pattern more explicit:
| var defaultExcludePatterns = []string{dotcog.Name} | |
| var defaultExcludePatterns = []string{dotcog.Name + "/"} |
This prevents any theoretical collision with directories like .cog-backup while still matching .cog/ and its contents.
|
Review posted to PR #3000. Summary of findings:
Overall solid refactor that eliminates the racy |
We want
.cog/to be the single local state directory for a Cog project -- build staging, weight caches, mount scratch space, advisory locks. But until now, everything under the project root got sent to the Docker daemon as build context, so putting anything substantial in.cog/would bloat builds or require users to manually.dockerignoreit. And the build system was already fighting this by mutating.dockerignoreon disk mid-build to exclude weight directories -- a racy pattern that left backup files on crashes.This PR makes
.cog/invisible to Docker builds at the BuildKit transport layer (fsutil.NewFilterFSwithExcludePatterns), so it never reaches the daemon regardless of what the user's.dockerignoresays. That means we can now put weight caches, build artifacts, and whatever else in.cog/without worrying about context size or user-facing configuration.The other piece is
pkg/dotcog, which owns the.cog/directory lifecycle -- path accessors, a project-wide advisory lock (flock), and cleanup registration. Build staging artifacts move from.cog/tmp/build<timestamp>/to.cog/build/and are referenced via acog_buildnamed build context (COPY --from=cog_build), so the generated Dockerfile no longer needs relative paths back into the project tree.What got removed along the way:
.dockerignorebackup/restore/rewrite dance and its pre-flightcheckCompatibleDockerIgnorecheckpkg/dockercontext/,pkg/dockerignore/, theNewGeneratorfactory wrapperlockfile.WithLock(replaced by the project-widedotcog.Dir.Lock)bad_dockerignoreintegration test (the error condition no longer exists)The Dockerfile mount to BuildKit is also now scoped to just the Dockerfile file itself, preventing build artifacts from leaking through that mount.