Skip to content

Add: memfd-based SO loading for all runtimes#513

Open
puddingfjz wants to merge 7 commits intohw-native-sys:mainfrom
puddingfjz:memfd-so-loading
Open

Add: memfd-based SO loading for all runtimes#513
puddingfjz wants to merge 7 commits intohw-native-sys:mainfrom
puddingfjz:memfd-so-loading

Conversation

@puddingfjz
Copy link
Copy Markdown

Summary

Add memfd-based SO loading mechanism for AICPU orchestration SO files across all runtimes.

Changes

  • Add memfd_loader.h: In-memory SO loading using memfd_create()
  • Update AICPU executors to use memfd loading instead of file-based loading
  • Eliminates temporary file pollution in /tmp directory
  • Kernel-managed cleanup via MFD_CLOEXEC

Files Modified

  • src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/memfd_loader.h (new)
  • src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/aicpu/memfd_loader.h (new)
  • src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a2a3/runtime/aicpu_build_graph/aicpu/memfd_loader.h (new)
  • src/a2a3/runtime/aicpu_build_graph/aicpu/aicpu_executor.cpp

Test plan

  • Tested on a2a3 hardware with tensormap_and_ringbuffer and aicpu_build_graph runtime
  • Performance profiled: ~5% faster core loading for tensormap_and_ringbuffer and 7.55% faster for aicpu_build_graph.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a memory file descriptor (memfd) based mechanism for loading orchestration shared objects, aimed at bypassing dynamic linker issues in AICPU environments. The changes include the addition of a memfd_loader.h utility and updates to aicpu_executor.cpp across several runtime paths to implement a 'memfd-first' loading strategy with a file-based fallback. Review feedback highlights opportunities to improve debuggability by adding logging for failure cases in the new loader and ensuring logging consistency across the various versions of the executor modified in this PR.

Comment on lines +103 to +106
if (handle == nullptr) {
close(fd);
return -1;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve debuggability, it would be beneficial to add logging for failure cases within this function. When memfd_create, write, symlink, or dlopen fail, logging the error (e.g., from errno or dlerror()) would provide valuable insight before falling back to the file-based method. This would also make the behavior consistent with the fallback path, which does have logging for its failures.

Suggested change
if (handle == nullptr) {
close(fd);
return -1;
}
if (handle == nullptr) {
const char *dl_err = dlerror();
DEV_INFO("Thread %d: dlopen from memfd symlink failed: %s, falling back to file-based loading",
orch_thread_num, dl_err ? dl_err : "unknown error");
close(fd);
return -1;
}
References
  1. Always check the return values of resource allocation functions (like pthread_key_create) and handle potential failures gracefully, for example, by rolling back partial allocations.

Comment on lines +1979 to +1991
snprintf(so_path, sizeof(so_path), "%s/libdevice_orch_%d.so", candidate_dirs[i], getpid());
int32_t fd = open(so_path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
if (fd < 0) {
continue;
}
ssize_t written = write(fd, so_data, so_size);
close(fd);

if (written != static_cast<ssize_t>(so_size)) {
unlink(so_path);
continue;
}
file_created = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logging for file creation and writing has been removed from this loop, which is inconsistent with the other aicpu_executor.cpp files in this PR. Please restore the DEV_INFO logs to help with debugging the file-based fallback path.

                        snprintf(so_path, sizeof(so_path), "%s/libdevice_orch_%d.so", candidate_dirs[i], getpid());
                        int32_t fd = open(so_path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
                        if (fd < 0) {
                            DEV_INFO(
                                "Thread %d: Cannot create SO at %s (errno=%d), trying next path", thread_idx, so_path, errno
                            );
                            continue;
                        }
                        ssize_t written = write(fd, so_data, so_size);
                        close(fd);

                        if (written != static_cast<ssize_t>(so_size)) {
                            DEV_INFO(
                                "Thread %d: Cannot write SO to %s (errno=%d), trying next path", thread_idx, so_path, errno
                            );
                            unlink(so_path);
                            continue;
                        }
                        file_created = true;
                        DEV_INFO("Thread %d: Created SO file at %s (%zu bytes)", thread_idx, so_path, so_size);
References
  1. For files shared across different runtimes, maintain identical code for consistency, even if a more robust implementation is possible.

Comment on lines +2003 to +2007
if (handle == nullptr) {
DEV_ERROR("Thread %d: dlopen failed: %s", thread_idx, dlopen_err ? dlopen_err : "unknown");
unlink(so_path);
return -1;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

A log message for a successful dlopen in the fallback path is missing here, while it's present in the other aicpu_executor.cpp files. For consistency and better debugging, please add it back.

                    if (handle == nullptr) {
                        DEV_ERROR("Thread %d: dlopen failed: %s", thread_idx, dlopen_err ? dlopen_err : "unknown");
                        unlink(so_path);
                        return -1;
                    }
                    DEV_INFO("Thread %d: dlopen succeeded, handle=%p", thread_idx, handle);
References
  1. For files shared across different runtimes, maintain identical code for consistency, even if a more robust implementation is possible.

- Add memfd_loader.h for in-memory SO loading using memfd_create
- Integrate memfd loading into AICPU executors across all runtimes
- Try memfd first, fall back to file-based loading if memfd fails
- Eliminates temporary file pollution in /tmp directory
- Provides consistent loading performance without filesystem overhead
- Move header guard before _GNU_SOURCE define for cpplint compliance
- Wrap long lines (> 80 chars) across multiple lines
- Use sizeof() instead of hardcoded 256 in snprintf calls
- Update #endif comments to match header guard names
- Fix "WARRANTIES OR ANY KIND" -> "WARRANTIES OF ANY KIND" in memfd_loader.h
- Apply clang-format to aicpu_executor.cpp files
- memfd_create and MFD_CLOEXEC are Linux-specific APIs
- Add #if defined(__linux__) guards around:
  - memfd_loader.h includes
  - memfd loading attempts
  - memfd cleanup calls
- Non-Linux platforms fall back to file-based loading
out_so_path is a char* parameter, not an array. sizeof(out_so_path)
returns pointer size (8 bytes) not buffer size (256 bytes).
Use constant 256 for buffer size.
On non-Linux platforms, memfd variable was declared but unused,
causing -Wunused-variable error with -Werror.
- Put function parameters on single line (per clang-format)
- Put snprintf arguments on single line (per clang-format)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant