-
Notifications
You must be signed in to change notification settings - Fork 23
feat(sandbox): add redis ttl, reuse logic, and fix mounting issues #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @xuehuitian45, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the lifecycle management of sandboxes by introducing robust mechanisms for Redis-backed sandbox instances, including expiration and reference counting to ensure efficient resource utilization and prevent accidental deletions. It also improves the reliability of sandbox reuse through integrated health checks and addresses a critical issue with file mounting by automatically creating missing host paths. The changes further streamline the API by passing full Sandbox objects to core service methods, allowing for more context-aware operations and better resource management. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to sandbox lifecycle management, including TTL-based expiration with Redis, reference counting for sandbox reuse, and health checks before reuse. The refactoring to pass Sandbox objects instead of just IDs is a good step towards better type safety and code clarity. However, I've identified a few issues that need attention. There is a critical bug in SandboxService.establishConnection that could lead to using stale data after a sandbox is recreated. Additionally, there's a potential resource leak due to improper shutdown of the cleanup task's executor. I've also provided suggestions to improve maintainability by addressing magic numbers and simplifying complex logic.
| ContainerModel containerInfo = getInfo(sandbox); | ||
| if(!checkSandboxStatus(sandbox.getSandboxId())){ | ||
| createContainer(sandbox); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a critical issue in this block. getInfo(sandbox) is called before the sandbox health is checked and potentially recreated. If checkSandboxStatus returns false, createContainer(sandbox) is called, which creates a new container. However, the containerInfo variable will still hold the information of the old, unhealthy container, leading to subsequent operations using stale data.
To fix this, you should check the status and recreate the container before getting the container info. The createContainer call can also throw a checked JsonProcessingException which is not handled here and will be wrapped in a RuntimeException, which is fine, but the stale data issue is critical.
| ContainerModel containerInfo = getInfo(sandbox); | |
| if(!checkSandboxStatus(sandbox.getSandboxId())){ | |
| createContainer(sandbox); | |
| } | |
| if(!checkSandboxStatus(sandbox.getSandboxId())){ | |
| createContainer(sandbox); | |
| } | |
| ContainerModel containerInfo = getInfo(sandbox); |
| if(sandboxService != null){ | ||
| sandboxService.stop(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling sandboxService.stop() will not shut down the scheduled executor service for cleaning up expired sandboxes, which can lead to a resource leak. The close() method in SandboxService is designed for a complete shutdown, including the executor. You should call sandboxService.close() to ensure all resources are properly released.
Additionally, for coding style consistency, please add a space after if.
| if(sandboxService != null){ | |
| sandboxService.stop(); | |
| } | |
| if (sandboxService != null) { | |
| sandboxService.close(); | |
| } |
| public void stop(){ | ||
| cleanupAllSandboxes(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new stop() method is redundant and can cause a resource leak. It only cleans up sandboxes but doesn't shut down the cleanupExecutor thread. The existing close() method already handles both cleanup and executor shutdown. It's recommended to remove this stop() method and have callers (like AgentScopeAgentHandler) use close() for a complete and safe shutdown of the service.
| for (Map.Entry<String, ContainerModel> entry : allSandboxes.entrySet()) { | ||
| String containerId = entry.getKey(); | ||
| long ttl = sandboxMap.getTTL(containerId); | ||
| if (ttl > 0 && ttl < 10) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (hostPath.endsWith(File.separator) || hostPath.endsWith("/") || | ||
| containerPath.endsWith("/") || containerPath.endsWith(File.separator)) { | ||
| if (hostFile.mkdirs()) { | ||
| logger.info("Successfully created directory: {}", hostPath); | ||
| } else { | ||
| logger.warn("Failed to create directory: {}", hostPath); | ||
| continue; | ||
| } | ||
| } else { | ||
| File parentDir = hostFile.getParentFile(); | ||
| if (parentDir != null && !parentDir.exists()) { | ||
| parentDir.mkdirs(); | ||
| } | ||
| if (hostFile.createNewFile()) { | ||
| logger.info("Successfully created file: {}", hostPath); | ||
| } else { | ||
| logger.warn("Failed to create file: {}", hostPath); | ||
| continue; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to determine whether to create a file or a directory by checking the suffix of both hostPath and containerPath is quite complex and could be fragile. This could be simplified by, for example, relying only on the containerPath's suffix or by adding an explicit flag to the mount configuration. Using the java.nio.file.Files API could also make the file and directory creation logic more concise and readable.
| public Set<String> scan(String pattern) { | ||
| return Set.copyOf(syncCommands.keys(pattern)); | ||
| Set<String> keys = new HashSet<>(); | ||
| ScanArgs scanArgs = ScanArgs.Builder.limit(100).match(pattern); | ||
| KeyScanCursor<String> cursor = syncCommands.scan(scanArgs); | ||
|
|
||
| while (cursor != null) { | ||
| keys.addAll(cursor.getKeys()); | ||
| if (cursor.isFinished()) { | ||
| break; | ||
| } | ||
| cursor = syncCommands.scan(cursor, scanArgs); | ||
| } | ||
| return keys; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of the scan method is correct, but the while (cursor != null) loop is a bit unconventional since syncCommands.scan() doesn't return null. This could be confusing. A more idiomatic and readable approach would be to use a do-while loop or check !cursor.isFinished() in the loop condition.
public Set<String> scan(String pattern) {
Set<String> keys = new HashSet<>();
ScanArgs scanArgs = ScanArgs.Builder.limit(100).match(pattern);
KeyScanCursor<String> cursor = syncCommands.scan(scanArgs);
keys.addAll(cursor.getKeys());
while (!cursor.isFinished()) {
cursor = syncCommands.scan(cursor, scanArgs);
keys.addAll(cursor.getKeys());
}
return keys;
}|
|
||
| private void refreshExpiration(String... keys) { | ||
| if (expirationSeconds > 0) { | ||
| long ttl = (long) expirationSeconds + 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
This PR improves the sandbox lifecycle management: