Skip to content

Write ContextCache atomically with owner-only file and directory mode#69070

Open
co-cy wants to merge 1 commit intosaltstack:3006.xfrom
co-cy:security-context-cache-chmod
Open

Write ContextCache atomically with owner-only file and directory mode#69070
co-cy wants to merge 1 commit intosaltstack:3006.xfrom
co-cy:security-context-cache-chmod

Conversation

@co-cy
Copy link
Copy Markdown

@co-cy co-cy commented May 6, 2026

What does this PR do?

salt.utils.cache.ContextCache.cache_context opens the cache file through salt.utils.files.fopen(self.cache_path, "w+b") and lets the process umask pick the file mode; the parent directory is created via plain os.mkdir() with no mode argument. On default Linux installs the file ends up 0o644 (world-readable) inside a 0o755 directory.

The cache holds pillar context — pillar values plus surrounding state, including credentials sourced from external pillar backends (vault, git_pillar, consul). With the default modes any local user can:

  • ls /var/cache/salt/master/context/ → see filenames like salt.loaded.ext.pillar.vault.p, salt.loaded.ext.pillar.git_pillar.p — leaks which external-pillar backends and modules are in use;
  • cat <file> → read the credentials directly.

A separate but related rough edge: the same code path rewrites the cache in place via "w+b" mode, which truncates the file before writing back. A concurrent reader can observe a half-written file and fail to deserialize.

This PR makes two changes:

  1. The file write goes through tempfile.mkstemp + os.replace. mkstemp creates the temporary file with mode 0o600 by default (per its documented contract: "the file is readable and writable only by the creating user ID"); os.replace atomically renames it into place. The final cache file therefore has the correct mode from the first byte, and concurrent readers never see a partial write.
  2. The parent context/ directory is created with mode=stat.S_IRWXU (0o700).

What issues does this PR fix or reference?

Fixes #69069

Previous Behavior

def cache_context(self, context):
    if not os.path.isdir(os.path.dirname(self.cache_path)):
        os.mkdir(os.path.dirname(self.cache_path))     # 0o755 on default umask
    with salt.utils.files.fopen(self.cache_path, "w+b") as cache:
        salt.payload.dump(context, cache)              # 0o644 on default umask
$ ls -ld /var/cache/salt/master/context/
drwxr-xr-x 2 salt salt  context/
$ ls -l /var/cache/salt/master/context/*.p
-rw-r--r-- 1 salt salt  jinja2.p

New Behavior

def cache_context(self, context):
    parent = os.path.dirname(self.cache_path)
    if not os.path.isdir(parent):
        os.mkdir(parent, mode=stat.S_IRWXU)            # 0o700
    fd, tmp_path = tempfile.mkstemp(dir=parent, prefix=".tmp_")
    try:
        with os.fdopen(fd, "wb") as cache:
            salt.payload.dump(context, cache)
        os.replace(tmp_path, self.cache_path)          # atomic, inherits mkstemp's 0o600
    except BaseException:
        with contextlib.suppress(OSError):
            os.unlink(tmp_path)
        raise
$ ls -ld /var/cache/salt/master/context/
drwx------ 2 salt salt  context/
$ ls -l /var/cache/salt/master/context/*.p
-rw------- 1 salt salt  jinja2.p

Migration note

Existing context/ directories from earlier installs are left untouched — auto-changing their mode would be surprising (some operators may have deliberately set them differently for monitoring tools). On upgrade, operators can run chmod 0700 on their cache context directory once. New installs and freshly-created directories get the right mode automatically.

Crash-safety note

If the process is killed (SIGKILL / OOM / power loss) between mkstemp and os.replace, the .tmp_* file remains on disk. It is mode 0o600 so this is operational hygiene rather than a security issue; operators can clean stale temporaries with find <cachedir>/context/ -name '.tmp_*' -mtime +1 -delete.

Merge requirements satisfied?

  • Docs (no user-facing config or API change; not required)
  • Changelog — changelog/69069.fixed.md
  • Tests written/updated — see tests/pytests/unit/utils/test_cache.py:
    • test_cache_context_writes_with_owner_only_mode — file is 0o600.
    • test_cache_context_creates_parent_directory_with_owner_only_mode — parent is 0o700.
    • test_cache_context_overwrites_atomically_via_replace — successive writes use distinct inodes (proves atomic rename rather than in-place truncate).

Commits signed with GPG?

No.

`salt.utils.cache.ContextCache.cache_context` opened the cache file
through `salt.utils.files.fopen(self.cache_path, "w+b")` and let the
process umask pick the file mode; the parent directory was likewise
created via plain `os.mkdir()` with no mode argument. On default
Linux installs the file ended up `0o644` (world-readable) inside a
`0o755` directory.

Pillar context can carry credentials (passwords, vault tokens, API
keys sourced from external pillar backends), so any local user could
read them out of the file. Even after the file mode is tightened the
parent directory remained world-traversable, so a local user could
still `ls` the cache and learn which modules and external-pillar
backends were in use (filenames look like
``salt.loaded.ext.<module>.<name>.p``).

Two changes:

1. Switch the file write to `tempfile.mkstemp` + `os.replace`. mkstemp
   creates the temporary file with mode `0o600` by default (per its
   documented contract: "the file is readable and writable only by
   the creating user ID"); `os.replace` atomically renames it into
   place, so the final cache file inherits the restricted mode from
   the first byte and never sits on disk world-readable. The atomic
   rename also closes a separate but related rough edge: the
   previous `fopen(..., "w+b")` truncated the file before writing it
   back, so a concurrent reader could observe a half-written or
   empty file. After this change readers see either the old content
   or the new one -- never a partial write.

2. Pass `mode=stat.S_IRWXU` (`0o700`) to `os.mkdir` for the parent
   `context/` directory. On default umask this would have been
   `0o755`. Existing directories from earlier installs are left
   untouched -- operators upgrading may want to `chmod 0700` their
   cache context directory once.

Note on hard crashes (`SIGKILL` / OOM / power loss) between
`mkstemp` and `os.replace`: the `.tmp_*` file remains on disk. It
is mode `0o600` so the leak is operational hygiene, not security;
operators can clean stale temporaries with e.g.
`find <cachedir>/context/ -name '.tmp_*' -mtime +1 -delete`.

Add three behavioural tests:

- file mode is `0o600` after write,
- parent directory is created with mode `0o700`,
- successive writes use distinct inodes (proving atomic rename
  rather than in-place truncate).

Refs: saltstack#69069
@co-cy co-cy requested a review from a team as a code owner May 6, 2026 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant