James/security enhancements CEL policy update#83
JPurcell-Braintrust wants to merge 3 commits into main
Overall this looks good to me. My only asks are around runtime validation rather than the Helm config itself
| - ALL | ||
| volume: | ||
| size: "1000Gi" | ||
| sizeLimit: "900Gi" |
Question on sizing
objectStoreCacheFileSize is also 900Gi above, so this leaves no headroom inside the emptyDir limit for filesystem overhead, temp files, partial writes, or Brainstore metadata. Should sizeLimit match the requested volume.size (1000Gi), or should objectStoreCacheFileSize be set lower than sizeLimit?
The same applies to fastreader/writer below.
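To make the sizing question concrete, here is a minimal sketch of a headroom check. The 5% minimum is a hypothetical threshold chosen for illustration, not a Brainstore requirement, and the parser only handles the binary suffixes used in this diff.

```python
# Binary suffixes used by Kubernetes resource quantities (subset only).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    """Convert a quantity such as "900Gi" to bytes (binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # no suffix: treat as a plain byte count


def headroom_ok(cache_file_size: str, size_limit: str,
                min_fraction: float = 0.05) -> bool:
    """True if the cache leaves at least min_fraction of sizeLimit free.

    min_fraction is an assumed safety margin, not a documented value.
    """
    limit = parse_quantity(size_limit)
    cache = parse_quantity(cache_file_size)
    return limit - cache >= min_fraction * limit


# Values from this diff: cache equals sizeLimit, so no headroom remains.
print(headroom_ok("900Gi", "900Gi"))
# Alternative from the question: sizeLimit raised to volume.size.
print(headroom_ok("900Gi", "1000Gi"))
```

With the values in this PR the first check fails, because the cache can grow to the full emptyDir limit; either option raised in the question makes it pass.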
| objectStoreCacheFileSize: "900Gi" | ||
| verbose: true | ||
| securityContext: | ||
| readOnlyRootFilesystem: true |
The cache path is still writable via /mnt/tmp/brainstore, so this looks structurally good. Can we confirm with a runtime smoke test that Brainstore does not write temp/cache files outside cacheDir when readOnlyRootFilesystem is enabled?
For example:
kubectl -n braintrust exec deploy/brainstore-reader -- sh -c 'touch /mnt/tmp/brainstore/smoke && rm /mnt/tmp/brainstore/smoke'
kubectl -n braintrust exec deploy/braintrust-api -- sh -c 'touch /tmp/smoke && rm /tmp/smoke'
Then run one product-level flow through the API:
- create/write one trace
- read/query it back
- run one eval or scorer/code-function path if code execution is enabled
- check API logs/events for EROFS, read-only file system, permission denied, or No space left on device

That should catch whether Brainstore writes anywhere outside cacheDir at runtime.
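The log check in the last step could be scripted along these lines. This is a rough sketch: the pattern list comes from the strings named above, while the sample log lines are made up for illustration and are not real Brainstore or API output.

```python
import re

# Error strings named in the review comment; matched case-insensitively.
PATTERNS = [
    "EROFS",
    "read-only file system",
    "permission denied",
    "No space left on device",
]
_RX = re.compile("|".join(re.escape(p) for p in PATTERNS), re.IGNORECASE)


def find_fs_errors(log_text: str) -> list[str]:
    """Return the log lines that mention any of the filesystem errors."""
    return [line for line in log_text.splitlines() if _RX.search(line)]


# Made-up sample lines, only to show the matching behaviour.
sample = "\n".join([
    "INFO request completed",
    "ERROR open /var/cache/x: Read-only file system",
    "WARN write failed: no space left on device",
])
for hit in find_fs_errors(sample):
    print(hit)
```

In practice the text would come from something like `kubectl -n braintrust logs deploy/braintrust-api` captured after the product-level steps; an empty result is the pass condition.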
| cpu: "4" | ||
| memory: "8Gi" | ||
| securityContext: | ||
| readOnlyRootFilesystem: true |
Can we smoke-test this with product-created custom code scorers rather than only pod-level checks? Maybe:
- a trivial Python/TypeScript scorer that returns 1.0
- a trace-level scorer that calls trace.get_spans() / trace.getSpans()
This should exercise the actual scorer sandbox startup path and the trace/object-fetch path that would surface runtime filesystem assumptions under readOnlyRootFilesystem
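A sketch of what the two scorers might look like in Python. The handler signatures and the `FakeTrace` stand-in are assumptions for illustration only; the thread does not show the signature the product expects, and only the `get_spans()` method name is taken from the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTrace:
    """Hypothetical stand-in for the trace object; only get_spans() is modeled."""
    spans: list = field(default_factory=list)

    def get_spans(self) -> list:
        return self.spans


def trivial_scorer(input=None, output=None, expected=None) -> float:
    """Always returns 1.0; exercises sandbox startup and nothing else."""
    return 1.0


def trace_scorer(trace) -> float:
    """Returns 1.0 if the trace has at least one span, else 0.0.

    Calling get_spans() is what forces the trace/object-fetch path.
    """
    spans = trace.get_spans()
    return 1.0 if spans else 0.0


print(trivial_scorer(input="q", output="a"))
print(trace_scorer(FakeTrace(spans=[{"name": "root"}])))
```

The scores themselves are unimportant here; the point is that both scorers complete without a filesystem error when readOnlyRootFilesystem is enabled.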
This adds the previous security enhancements for fixing CEL-based policy breaches, previously based on 1.1.32, to the latest Helm updates. The security enhancements were:
An example/google-autopilot-cel/ directory has been created to help the known customers who currently need these CEL enhancements in their production environments.