Skip to content

Filter .cog/ from Docker build context, centralize build state#3000

Open
michaeldwan wants to merge 10 commits intomainfrom
md/dot-cog-build-cache
Open

Filter .cog/ from Docker build context, centralize build state#3000
michaeldwan wants to merge 10 commits intomainfrom
md/dot-cog-build-cache

Conversation

@michaeldwan
Copy link
Copy Markdown
Member

We want .cog/ to be the single local state directory for a Cog project -- build staging, weight caches, mount scratch space, advisory locks. But until now, everything under the project root got sent to the Docker daemon as build context, so putting anything substantial in .cog/ would bloat builds or require users to manually .dockerignore it. And the build system was already fighting this by mutating .dockerignore on disk mid-build to exclude weight directories -- a racy pattern that left backup files on crashes.

This PR makes .cog/ invisible to Docker builds at the BuildKit transport layer (fsutil.NewFilterFS with ExcludePatterns), so it never reaches the daemon regardless of what the user's .dockerignore says. That means we can now put weight caches, build artifacts, and whatever else in .cog/ without worrying about context size or user-facing configuration.

The other piece is pkg/dotcog, which owns the .cog/ directory lifecycle -- path accessors, a project-wide advisory lock (flock), and cleanup registration. Build staging artifacts move from .cog/tmp/build<timestamp>/ to .cog/build/ and are referenced via a cog_build named build context (COPY --from=cog_build), so the generated Dockerfile no longer needs relative paths back into the project tree.

What got removed along the way:

  • The .dockerignore backup/restore/rewrite dance and its pre-flight checkCompatibleDockerIgnore check
  • pkg/dockercontext/, pkg/dockerignore/, the NewGenerator factory wrapper
  • lockfile.WithLock (replaced by the project-wide dotcog.Dir.Lock)
  • The bad_dockerignore integration test (the error condition no longer exists)

The Dockerfile mount to BuildKit is also now scoped to just the Dockerfile file itself, preventing build artifacts from leaking through that mount.

Switch from LocalDirs to LocalMounts with fsutil filtering so .cog/ is
never sent to the Docker daemon as build context. This prevents weight
blobs, mount dirs, and stale tmp dirs from inflating context size.

- Add ExcludePatterns and BuildCacheDir to ImageBuildOptions
- Introduce cog_build named build context for staging artifacts
- Generated Dockerfiles use COPY --from=cog_build instead of
  project-relative paths into .cog/tmp/
- Replace timestamp-based .cog/tmp/build<ts>/ with stable .cog/build/
  so COPY paths are deterministic across builds (fixes layer cache busting)
- Move bundled schema and weights manifest into .cog/build/
- Remove checkCompatibleDockerIgnore (no longer needed -- .cog is
  excluded at the session level, users can freely ignore it)
- Write Dockerfile to BuildCacheDir when provided, eliminating a
  separate temp dir
Replace scattered filepath.Join + global.CogBuildArtifactsFolder patterns
with a structured dotcog.Dir that manages path resolution, directory
creation, advisory locking, and cleanup for the .cog/ project directory.

- Add pkg/dotcog/ with Open, OpenTemp, Path, FilePath, Lock, Close
- Wire dotcog.Dir through Source, Build, weights import, and debug
- Add Source.Close() and defer src.Close() in all CLI commands
- Lock API uses idiomatic Lock(ctx) (release, err) + defer release()
- Remove pkg/lockfile/, pkg/dockerignore/, dockercontext/build_tempdir
- Remove global.CogBuildArtifactsFolder (replaced by dotcog.Name)
- Replace hardcoded .cog/ path constants with buildPaths struct
- Add .cog to cog init .dockerignore template
- Simplify example .dockerignore files (exclude .cog/ entirely)
- Close() uses sync.Once for idempotent double-call safety
- OpenTemp registers temp dir removal via onClose instead of a temp flag
- Add TempPath() for scratch dirs (build/) that clean up on Close
- Lock() returns explicit error when !locked && ctx.Err() == nil
- Remove generator.Cleanup() -- dotcog owns .cog/build/ lifecycle
- Remove Cleanup() from Generator interface
- Unexport OnClose (no external callers), remove unused Remove()
- Fix stale package doc (WithLock -> Lock)
- Remove accidentally committed plans/ directory
- TempPath wipes existing contents before creating the directory,
  so stale artifacts from crashed builds don't leak into new ones
- Remove surgical os.Remove calls for individual build artifacts
  (TempPath's wipe handles it)
- Remove stale Cleanup() reference in field comment
- Remove editorial comment on Generator interface
- Fix struct literal alignment (gofmt)
- Delete pkg/dockercontext/ (only export was StandardBuildDirectory=".")
- Inline "." in baseimage.go
- Delete generator_factory.go and its test
- Inline NewStandardGenerator at 3 call sites (build.go, debug.go, base.go)
.cog/ is excluded at the fsutil session level via ExcludePatterns,
so .dockerignore entries are redundant for cog build.
…arate weights

Separate-weights builds no longer backup/write/restore .dockerignore on
disk. Weight file exclusions for the runner build are passed as fsutil
ExcludePatterns instead, matching how .cog/ is already excluded.

- Remove backupDockerignore, writeDockerignore, restoreDockerignore
- Remove makeDockerignoreForWeightsImage (weights Dockerfile uses
  explicit COPY paths, .dockerignore was irrelevant)
- Remove DockerignoreHeader constant and makeDockerignoreForWeights
- GenerateModelBaseWithSeparateWeights returns []string exclude patterns
  instead of a dockerignore string
- buildRunnerImage takes excludePatterns parameter
.cog/tmp/ no longer exists; wheels are staged in .cog/build/ and
referenced via the cog_build named build context.
- Merge buildWeightsImage + buildRunnerImage into buildContextImage
  (identical except for excludePatterns parameter)
- Use dotcog.Name constant instead of hardcoded ".cog" in
  bundleDockerfile and test files
- Document BuildCacheDir / BuildContexts["cog_build"] relationship
- Clarify Source.DotCog nil comment
@michaeldwan michaeldwan marked this pull request as ready for review May 5, 2026 23:27
@michaeldwan michaeldwan requested a review from a team as a code owner May 5, 2026 23:27
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented May 5, 2026

LGTM

github run

When cog predict/train/serve volume-mount the project dir at /src,
the host's .cog/ shadows the image's .cog/ layer. Coglet looks for
.cog/openapi_schema.json on disk, so it needs to exist on the host
filesystem -- not just in the image.

Write schema and weights manifest to both .cog/build/ (for the
cog_build named context used by COPY --from=cog_build) and .cog/
root (for volume-mount visibility).
Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR centralizes .cog/ directory management and filters it from Docker build context at the BuildKit transport layer, eliminating the racy .dockerignore mutation dance.

Issues found (ranked by severity):

Medium: The new pkg/dotcog package lacks unit tests. This package handles file system operations, advisory locking with flock, and cleanup registration - all functionality that should be tested, especially the Lock() method's error handling and the onClose cleanup ordering.

Low: The defaultExcludePatterns in pkg/image/build.go:46 uses a bare .cog pattern which could theoretically match other directories like .cog-backup. Consider using .cog/ or an explicit pattern to avoid any accidental matches.

var defaultExcludePatterns = []string{dotcog.Name + "/"}

Note: The removal of the bad_dockerignore integration test is correct - since .cog/ is now filtered at the transport layer, the user-facing error condition no longer exists. However, consider documenting this behavior change for users who previously relied on manually excluding .cog/ in their .dockerignore.

Overall this is a solid refactor that solves real operational issues. The BuildKit-level filtering is much cleaner than the previous .dockerignore rewrite approach.

Comment thread pkg/dotcog/dotcog.go
@@ -0,0 +1,200 @@
// Package dotcog manages the .cog/ project directory.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new package lacks unit tests. Please add tests for:

  • Open() and OpenTemp() directory creation
  • Path() and TempPath() subdirectory handling
  • Lock() advisory locking behavior (including error paths)
  • onClose cleanup registration and LIFO ordering
  • Close() nil-safety and idempotency

The Lock() method in particular handles several edge cases (context cancellation, TryLockContext errors) that should be validated.

Comment thread pkg/image/build.go
// defaultExcludePatterns filters .cog/ out of the project context mount
// so weight blobs, mount dirs, and build caches are never sent to the
// Docker daemon.
var defaultExcludePatterns = []string{dotcog.Name}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a trailing slash to make the exclusion pattern more explicit:

Suggested change
var defaultExcludePatterns = []string{dotcog.Name}
var defaultExcludePatterns = []string{dotcog.Name + "/"}

This prevents any theoretical collision with directories like .cog-backup while still matching .cog/ and its contents.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented May 6, 2026

Review posted to PR #3000.

Summary of findings:

  1. Medium: New pkg/dotcog package needs unit tests (file operations, locking, cleanup)

  2. Low: defaultExcludePatterns should use .cog/ instead of .cog to avoid matching directories like .cog-backup

  3. Note: The bad_dockerignore test removal is correct - the error condition no longer exists since .cog/ is filtered at the transport layer

Overall solid refactor that eliminates the racy .dockerignore mutation pattern in favor of BuildKit-level filtering.

github run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant