Add: memfd-based SO loading for all runtimes#513
Add: memfd-based SO loading for all runtimes#513puddingfjz wants to merge 7 commits intohw-native-sys:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a memory file descriptor (memfd) based mechanism for loading orchestration shared objects, aimed at bypassing dynamic linker issues in AICPU environments. The changes include the addition of a memfd_loader.h utility and updates to aicpu_executor.cpp across several runtime paths to implement a 'memfd-first' loading strategy with a file-based fallback. Review feedback highlights opportunities to improve debuggability by adding logging for failure cases in the new loader and ensuring logging consistency across the various versions of the executor modified in this PR.
| if (handle == nullptr) { | ||
| close(fd); | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
To improve debuggability, it would be beneficial to add logging for failure cases within this function. When memfd_create, write, symlink, or dlopen fail, logging the error (e.g., from errno or dlerror()) would provide valuable insight before falling back to the file-based method. This would also make the behavior consistent with the fallback path, which does have logging for its failures.
| if (handle == nullptr) { | |
| close(fd); | |
| return -1; | |
| } | |
| if (handle == nullptr) { | |
| const char *dl_err = dlerror(); | |
| DEV_INFO("Thread %d: dlopen from memfd symlink failed: %s, falling back to file-based loading", | |
| orch_thread_num, dl_err ? dl_err : "unknown error"); | |
| close(fd); | |
| return -1; | |
| } |
References
- Always check the return values of resource allocation functions (like pthread_key_create) and handle potential failures gracefully, for example, by rolling back partial allocations.
| snprintf(so_path, sizeof(so_path), "%s/libdevice_orch_%d.so", candidate_dirs[i], getpid()); | ||
| int32_t fd = open(so_path, O_WRONLY | O_CREAT | O_TRUNC, 0755); | ||
| if (fd < 0) { | ||
| continue; | ||
| } | ||
| ssize_t written = write(fd, so_data, so_size); | ||
| close(fd); | ||
|
|
||
| if (written != static_cast<ssize_t>(so_size)) { | ||
| unlink(so_path); | ||
| continue; | ||
| } | ||
| file_created = true; |
There was a problem hiding this comment.
The logging for file creation and writing has been removed from this loop, which is inconsistent with the other aicpu_executor.cpp files in this PR. Please restore the DEV_INFO logs to help with debugging the file-based fallback path.
snprintf(so_path, sizeof(so_path), "%s/libdevice_orch_%d.so", candidate_dirs[i], getpid());
int32_t fd = open(so_path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
if (fd < 0) {
DEV_INFO(
"Thread %d: Cannot create SO at %s (errno=%d), trying next path", thread_idx, so_path, errno
);
continue;
}
ssize_t written = write(fd, so_data, so_size);
close(fd);
if (written != static_cast<ssize_t>(so_size)) {
DEV_INFO(
"Thread %d: Cannot write SO to %s (errno=%d), trying next path", thread_idx, so_path, errno
);
unlink(so_path);
continue;
}
file_created = true;
DEV_INFO("Thread %d: Created SO file at %s (%zu bytes)", thread_idx, so_path, so_size);References
- For files shared across different runtimes, maintain identical code for consistency, even if a more robust implementation is possible.
| if (handle == nullptr) { | ||
| DEV_ERROR("Thread %d: dlopen failed: %s", thread_idx, dlopen_err ? dlopen_err : "unknown"); | ||
| unlink(so_path); | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
A log message for a successful dlopen in the fallback path is missing here, while it's present in the other aicpu_executor.cpp files. For consistency and better debugging, please add it back.
if (handle == nullptr) {
DEV_ERROR("Thread %d: dlopen failed: %s", thread_idx, dlopen_err ? dlopen_err : "unknown");
unlink(so_path);
return -1;
}
DEV_INFO("Thread %d: dlopen succeeded, handle=%p", thread_idx, handle);References
- For files shared across different runtimes, maintain identical code for consistency, even if a more robust implementation is possible.
65060b3 to
7586395
Compare
- Add memfd_loader.h for in-memory SO loading using memfd_create - Integrate memfd loading into AICPU executors across all runtimes - Try memfd first, fall back to file-based loading if memfd fails - Eliminates temporary file pollution in /tmp directory - Provides consistent loading performance without filesystem overhead
7586395 to
62b2fb4
Compare
- Move header guard before _GNU_SOURCE define for cpplint compliance - Wrap long lines (> 80 chars) across multiple lines - Use sizeof() instead of hardcoded 256 in snprintf calls - Update #endif comments to match header guard names
- Fix "WARRANTIES OR ANY KIND" -> "WARRANTIES OF ANY KIND" in memfd_loader.h - Apply clang-format to aicpu_executor.cpp files
- memfd_create and MFD_CLOEXEC are Linux-specific APIs - Add #if defined(__linux__) guards around: - memfd_loader.h includes - memfd loading attempts - memfd cleanup calls - Non-Linux platforms fall back to file-based loading
out_so_path is a char* parameter, not an array. sizeof(out_so_path) returns pointer size (8 bytes) not buffer size (256 bytes). Use constant 256 for buffer size.
On non-Linux platforms, memfd variable was declared but unused, causing -Wunused-variable error with -Werror.
- Put function parameters on single line (per clang-format) - Put snprintf arguments on single line (per clang-format)
Summary
Add memfd-based SO loading mechanism for AICPU orchestration SO files across all runtimes.
Changes
memfd_loader.h: In-memory SO loading usingmemfd_create()/tmpdirectoryMFD_CLOEXECFiles Modified
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/memfd_loader.h(new)src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cppsrc/a5/runtime/tensormap_and_ringbuffer/aicpu/memfd_loader.h(new)src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cppsrc/a2a3/runtime/aicpu_build_graph/aicpu/memfd_loader.h(new)src/a2a3/runtime/aicpu_build_graph/aicpu/aicpu_executor.cppTest plan