feat(code_executors): Add GkeCodeExecutor for sandboxed code execution on GKE #1629
|
@hangfei could you assign a reviewer for my PR |
|
@syangx39 what about blocking network access from the sandbox? Seems like we'd want to add that in. Or are you expecting developers to create a network policy on their own and apply it to the jobs? |
syangx39
left a comment
Two action items remain:
- Follow-up PR on input/output file support with _create_pvc().
- Regarding this comment: "@syangx39 what about blocking network access from the sandbox? Seems like we'd want to add that in. Or are you expecting developers to create a network policy on their own and apply it to the jobs?"
gVisor's default network sandboxing in GKE specifically blocks access to sensitive host-level endpoints like the GKE metadata server. A pod running in the gVisor sandbox can still make network calls to public services on the internet.
If gke_code_executor blocks network access, the sandboxed code will not be able to `pip install` packages from the internet. So my thinking is:
The best way to handle dependencies is ahead of time. A developer would build an image with the required packages already installed via a requirements.txt file. They would then push this image to Artifact Registry and configure the executor to use their custom image instead of the default `python:3.11`.
In any case, I will add a block_network_access flag for flexibility and default it to `True`. For readability, I will do that in a follow-up PR.
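For readers following the network-access discussion, here is a minimal sketch of what a deny-all-egress policy for the executor's pods could look like. It is not code from this PR: the function name, the `app` label key, and the policy name are hypothetical, and the policy only takes effect on clusters that enforce NetworkPolicy. The manifest is built as a plain dictionary so it can be inspected without a cluster.

```python
def build_deny_egress_policy(namespace: str, pod_label_value: str) -> dict:
    """Build a NetworkPolicy manifest that blocks all egress from matching pods.

    Listing "Egress" in policyTypes while providing no egress rules means
    that no outbound traffic is allowed for the selected pods.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            # Hypothetical naming scheme, for illustration only.
            "name": f"deny-egress-{pod_label_value}",
            "namespace": namespace,
        },
        "spec": {
            # Assumption: executor pods carry an "app" label with this value.
            "podSelector": {"matchLabels": {"app": pod_label_value}},
            "policyTypes": ["Egress"],
            "egress": [],  # no rules listed, so all egress is denied
        },
    }


policy = build_deny_egress_policy("default", "adk-code-executor")
```

With a policy like this applied, `pip install` inside the sandbox would fail, which is why the comment above recommends baking dependencies into a custom image.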
|
What's the startup time for GkeCodeExecutor? Should the user start it first, or can it be triggered on demand? |
hangfei
left a comment
Please create an issue and a doc to update the docs in the adk-docs repo.
| each execution request. The user's code is mounted via a ConfigMap, and the | ||
| Pod is hardened with a strict security context and resource limits. | ||
| Key Features: |
no need to add this to source code.
I think this 'Key Features' section serves as a high-level summary, so a future developer can immediately grasp the component's security posture and design without having to read the whole implementation.
Do you think we could keep it for that reason? I feel it adds a lot of value for future maintainability.
fair point.
| file at: contributing/samples/gke_agent_sandbox/deployment_rbac.yaml | ||
| """ | ||
| namespace: str = "default" | ||
| image: str = "python:3.11-slim" |
does it only work with 3.11?
Not necessarily 3.11, but it has to be Python: https://github.com/syangx39/adk-python/blob/5533fbf31085a98184b91bda48b36a515524ea84/src/google/adk/code_executors/gke_code_executor.py#L110
I can patch a follow-up PR to make the command configurable, turning it into a more generic script executor.
please. thanks.
| _batch_v1: client.BatchV1Api | ||
| _core_v1: client.CoreV1Api | ||
|
|
||
| def __init__(self, **data): |
Does this work with an AI Studio API key or EasyGCP?
If not, is it possible to throw an exception at init time?
The AI Studio API key is irrelevant for this component. The GkeCodeExecutor's only job is to communicate with the Kubernetes API, either via a ServiceAccount token or a kubeconfig file.
If load_kube_config() fails, it raises a kubernetes.config.ConfigException. This exception will halt the initialization and cause the GkeCodeExecutor constructor to fail, which I think is the correct behavior when no valid Kubernetes credentials can be found.
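The fail-fast behavior described above can be sketched as a small fallback helper. This is an illustrative sketch, not the PR's constructor: `load_first_available` and its parameters are hypothetical, and the in-cluster-first ordering is an assumption. The loaders are injected so the snippet runs without the `kubernetes` package installed.

```python
from typing import Callable, Sequence, Tuple, Type


def load_first_available(
    loaders: Sequence[Tuple[str, Callable[[], None]]],
    expected_error: Type[BaseException] = Exception,
) -> str:
    """Try each credential loader in order and return the name of the first that works.

    If every loader raises `expected_error`, the last error is re-raised so
    that object construction halts instead of continuing without credentials.
    """
    if not loaders:
        raise ValueError("at least one loader is required")
    last_error = None
    for name, loader in loaders:
        try:
            loader()
            return name
        except expected_error as exc:
            last_error = exc  # remember the failure and try the next loader
    raise last_error


# With the real client, usage might look like this (assumed ordering):
#   from kubernetes import config
#   load_first_available(
#       [("in-cluster", config.load_incluster_config),
#        ("kubeconfig", config.load_kube_config)],
#       expected_error=config.ConfigException,
#   )
```

Re-raising the final exception keeps the behavior the comment argues for: with no valid Kubernetes credentials, initialization fails rather than producing a half-configured executor.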
|
How long does it take to spin up the code executor and finish it? |
|
Please add unit tests. Please add a sample under contributing/samples |
|
Please make sure you add corresponding docs in adk-docs before submission. Thanks. |
|
|
Overall LGTM. Thanks! Let's do the following before merge:
|
|
|
@hangfei |
|
thanks.
|
GkeCodeExecutor Validation Plan
Step 1: Set Up the GKE Autopilot Cluster
Step 2: Deploy the Agent and Send Predictions
|
|
LGTM |
…eaking

Merge #1629

close #2170

### Summary

This PR introduces `GkeCodeExecutor`, a new code executor that provides a secure and scalable method for running LLM-generated code. It serves as a robust alternative to local or standard containerized executors by leveraging the **GKE Sandbox** environment, which uses gVisor for workload isolation.

For each code execution request, it dynamically creates an ephemeral Kubernetes Job with a hardened Pod configuration, offering significant security benefits and ensuring that each code execution runs in a clean, isolated environment.

### Key Features of GkeCodeExecutor

* **Dynamic Job Creation**: Uses the Kubernetes `batch/v1` API to create a new Job for each code snippet.
* **Secure Code Mounting**: Injects code into the Pod via a temporary `ConfigMap`, which is mounted to a read-only file.
* **gVisor Sandboxing**: Enforces execution within a `gvisor` runtime for kernel-level isolation.
* **Hardened Security Context**: Pods run as non-root with all Linux capabilities dropped and a read-only root filesystem.
* **Resource Management**: Applies configurable CPU and memory limits to prevent abuse.
* **Automatic Cleanup**: Uses the `ttl_seconds_after_finished` feature on Jobs for robust, automatic garbage collection of completed Pods and Jobs.
* **Node Scheduling**: The executor uses Kubernetes `tolerations` in its Pod specification. This allows the k8s scheduler to place the execution Pod onto a **_pre-configured_** gVisor-enabled node.
* **Module Integration**: The `GkeCodeExecutor` is registered in `code_executors/__init__.py`, making it available for use by agents. The `ImportError` handling is configured to check for the required `kubernetes` SDK.

### Execution Flow:

1. The agent invokes `GkeCodeExecutor` with the LLM-generated code.
2. The `GkeCodeExecutor` runs `execute_code`, which creates a temporary `ConfigMap` and then creates a k8s `Job` to run it.
3. This Job runs a standard `python:3.11-slim` container. The image is pulled once to the node and cached. The Job mounts the ConfigMap as `/app/code.py`.
4. The `GkeCodeExecutor` monitors the Job to completion, fetches `stdout/stderr` logs from the container, returns a `CodeExecutionResult` to the LlmAgent, and ensures all temporary resources are deleted.
5. The calling agent formats the result and provides a final response to the user. If the result contains an error, it will retry up to `error_retry_attempts` times.

PiperOrigin-RevId: 804511467
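To make the feature list and execution flow above concrete, here is a rough sketch of the kind of Job manifest they describe, built as a plain dictionary. It is not the manifest from this PR: the function name, container name, user ID, default limits, and TTL are hypothetical, and the `gvisor` runtime class plus the `sandbox.gke.io/runtime` toleration are assumptions based on how GKE Sandbox node pools are usually targeted.

```python
def build_sandboxed_job(
    job_name: str,
    configmap_name: str,
    image: str = "python:3.11-slim",
    cpu_limit: str = "500m",       # hypothetical default
    memory_limit: str = "512Mi",   # hypothetical default
    ttl_seconds: int = 600,        # hypothetical default
) -> dict:
    """Build a batch/v1 Job manifest that runs /app/code.py in a hardened Pod."""
    container = {
        "name": "code-runner",  # hypothetical container name
        "image": image,
        "command": ["python3", "/app/code.py"],
        # The ConfigMap volume is mounted read-only, exposing code.py under /app.
        "volumeMounts": [{"name": "code", "mountPath": "/app", "readOnly": True}],
        "securityContext": {
            "runAsNonRoot": True,
            "runAsUser": 1001,  # any non-zero UID; the value is an assumption
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
        "resources": {"limits": {"cpu": cpu_limit, "memory": memory_limit}},
    }
    pod_spec = {
        "restartPolicy": "Never",
        "runtimeClassName": "gvisor",  # assumed runtime class for GKE Sandbox
        "tolerations": [{
            "key": "sandbox.gke.io/runtime",  # assumed taint on sandbox nodes
            "operator": "Equal",
            "value": "gvisor",
            "effect": "NoSchedule",
        }],
        "containers": [container],
        "volumes": [{
            "name": "code",
            "configMap": {
                "name": configmap_name,
                "items": [{"key": "code.py", "path": "code.py"}],
            },
        }],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "backoffLimit": 0,  # do not retry failed code at the Job level
            "ttlSecondsAfterFinished": ttl_seconds,  # automatic cleanup
            "template": {"spec": pod_spec},
        },
    }


job = build_sandboxed_job("adk-exec-demo", "adk-exec-demo-code")
```

A manifest in this shape could be submitted with the Kubernetes client's `BatchV1Api.create_namespaced_job`, after first creating the ConfigMap that holds `code.py`.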
|
Merged! |