James/security enhancements CEL policy update #83

Open
JPurcell-Braintrust wants to merge 3 commits into main from james/security-enhancements

@JPurcell-Braintrust

This re-applies the earlier security enhancements that fix CEL-based policy violations, previously based on 1.1.32, on top of the latest Helm updates. The security enhancements are:

  • add securityContext and podSecurityContext to all 3 pod types
  • readOnlyRootFilesystem
  • emptyDir size limits for Brainstore volumes

An example, example/google-autopilot-cel/, has been added for the known customers who currently need these CEL enhancements in their production environments.

@soldatchenko
Contributor

Overall this looks good to me. My only asks are around runtime validation rather than the Helm config itself:

  • confirm the read-only root filesystem with representative API/scorer + Brainstore trace paths
  • make sure the Brainstore emptyDir.sizeLimit leaves enough headroom above the configured cache size in the example

- ALL
volume:
size: "1000Gi"
sizeLimit: "900Gi"
Contributor

Question on sizing

objectStoreCacheFileSize is also 900Gi above, so this leaves no headroom inside the emptyDir limit for filesystem overhead, temp files, partial writes, or Brainstore metadata. Should sizeLimit match the requested volume.size (1000Gi), or should objectStoreCacheFileSize be set lower than sizeLimit?

The same applies to fastreader/writer below.
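The sizing question above can be checked mechanically. The following is a minimal sketch, not part of the chart: `parse_quantity` and `check_sizing` are hypothetical helper names, only binary suffixes (Ki, Mi, Gi, Ti) are handled, and the 10% minimum headroom is an assumed threshold rather than a documented Brainstore requirement.

```python
import re

# Binary suffixes only; other Kubernetes quantity forms are out of scope here.
_UNITS = {"": 1, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as "900Gi" into a number of bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or ""]


def check_sizing(volume_size: str, size_limit: str, cache_file_size: str,
                 min_headroom: float = 0.10) -> list[str]:
    """Return a list of sizing problems; an empty list means the values fit.

    min_headroom is an assumed safety margin (fraction of sizeLimit that
    should stay free above the cache size), not a value from the chart.
    """
    volume = parse_quantity(volume_size)
    limit = parse_quantity(size_limit)
    cache = parse_quantity(cache_file_size)

    problems = []
    if limit > volume:
        problems.append("sizeLimit exceeds volume.size")
    headroom = (limit - cache) / limit
    if headroom < min_headroom:
        problems.append(
            f"only {headroom:.0%} headroom between objectStoreCacheFileSize and sizeLimit"
        )
    return problems


# Values from the example under review: the cache size equals the emptyDir limit.
print(check_sizing("1000Gi", "900Gi", "900Gi"))
```

With the values in the example, the cache size equals the limit, so the check reports zero headroom. Raising sizeLimit to match volume.size or lowering objectStoreCacheFileSize would both clear it under the assumed threshold.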

objectStoreCacheFileSize: "900Gi"
verbose: true
securityContext:
readOnlyRootFilesystem: true
Contributor

The cache path is still writable via /mnt/tmp/brainstore, so this looks structurally good. Can we confirm with a runtime smoke test that Brainstore does not write temp/cache files outside cacheDir when readOnlyRootFilesystem is enabled?

For example:

kubectl -n braintrust exec deploy/brainstore-reader -- sh -c 'touch /mnt/tmp/brainstore/smoke && rm /mnt/tmp/brainstore/smoke'
kubectl -n braintrust exec deploy/braintrust-api -- sh -c 'touch /tmp/smoke && rm /tmp/smoke'

Then run one product-level flow through the API:

  1. create/write one trace
  2. read/query it back
  3. run one eval or scorer/code-function path if code execution is enabled
  4. check API logs/events for EROFS, read-only file system, permission denied, or No space left on device

That should catch whether Brainstore writes anywhere outside cacheDir at runtime.
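The log check in the last step could be scripted roughly as follows. This is a sketch under assumptions: `find_fs_errors` is a hypothetical helper, and it only looks for the four strings named in this comment, matched case-insensitively. It does not know anything about Braintrust's actual log format.

```python
import re

# The failure strings listed in the review comment.
FS_ERROR_PATTERNS = (
    "EROFS",
    "read-only file system",
    "permission denied",
    "No space left on device",
)

# One case-insensitive alternation over the escaped literal strings.
_FS_ERROR_RE = re.compile(
    "|".join(re.escape(pattern) for pattern in FS_ERROR_PATTERNS),
    re.IGNORECASE,
)


def find_fs_errors(log_text: str) -> list[str]:
    """Return every log line that mentions one of the filesystem errors."""
    return [line for line in log_text.splitlines() if _FS_ERROR_RE.search(line)]
```

The input would be log text captured after the trace and scorer steps, for instance the saved output of `kubectl -n braintrust logs deploy/braintrust-api`. An empty result is the expected outcome when nothing writes outside the writable mounts.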

cpu: "4"
memory: "8Gi"
securityContext:
readOnlyRootFilesystem: true
Contributor

Can we smoke test this with product-created custom code scorers rather than only pod-level checks? Maybe:

  1. a trivial Python/TypeScript scorer that returns 1.0
  2. a trace-level scorer that calls trace.get_spans() / trace.getSpans()

This should exercise the actual scorer sandbox startup path and the trace/object-fetch path, which would surface runtime filesystem assumptions under readOnlyRootFilesystem.
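The two scorers suggested above might look roughly like this in Python. This is only a sketch: the handler signatures, the synchronous `get_spans()` call, and its list-like return value are assumptions, not the confirmed Braintrust scorer API. `FakeTrace` is a hypothetical stand-in so the functions can be run locally.

```python
from typing import Any


def always_one(input: Any = None, output: Any = None,
               expected: Any = None, **kwargs: Any) -> float:
    """Trivial scorer: ignores its arguments and returns 1.0."""
    return 1.0


def has_spans(trace: Any) -> float:
    """Trace-level scorer: 1.0 if the trace has at least one span, else 0.0.

    Assumes trace.get_spans() is synchronous and returns a sized collection.
    """
    spans = trace.get_spans()
    return 1.0 if len(spans) > 0 else 0.0


class FakeTrace:
    """Hypothetical local stand-in for the trace object passed to a scorer."""

    def __init__(self, spans: list) -> None:
        self._spans = list(spans)

    def get_spans(self) -> list:
        return self._spans


print(always_one(), has_spans(FakeTrace([{"name": "root"}])))
```

Neither scorer needs to compute anything useful. The point of the smoke test is that creating and running them in the product forces the sandbox startup and span-fetch paths to execute with the read-only root filesystem in place.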
