Skip to content

Forkrun testing async io 7#73

Open
jkool702 wants to merge 54 commits intoforkrun_testing_async-io_6from
forkrun_testing_async-io_7
Open

Forkrun testing async io 7#73
jkool702 wants to merge 54 commits intoforkrun_testing_async-io_6from
forkrun_testing_async-io_7

Conversation

@jkool702
Copy link
Owner

@jkool702 jkool702 commented Mar 5, 2026

@sourcery-ai summary

jkool702 and others added 30 commits February 28, 2026 13:53
Refactor read and write operations to handle errors more robustly and improve the handling of data availability in the buffer.
Added step to upload generated frun.bash as an artifact.
Added NUMA node discovery and topology mapping functionality.
Refactor NUMA node handling and improve error reporting.
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @jkool702, your pull request is larger than the review limit of 150000 diff characters

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the frun utility by introducing advanced NUMA awareness, improving the robustness of asynchronous I/O operations, and streamlining the management of loadable components. The changes aim to boost performance, stability, and maintainability, particularly in complex multi-processor environments, by optimizing resource utilization and error handling.

Highlights

  • NUMA Node Discovery and Worker Management: Introduced a new function _forkrun_build_numa_map to automatically detect and map NUMA nodes, allowing for more intelligent worker distribution and resource allocation across physical processors. The worker spawning logic was significantly reworked to support directed spawns to specific NUMA nodes and track workers per node.
  • Robust Asynchronous I/O Operations: Enhanced the reliability of I/O operations by implementing robust retry mechanisms with usleep for EINTR and EAGAIN errors in robust_sendfile, ring_copy_chunk, and splice operations. This prevents premature termination and ensures data integrity during asynchronous transfers.
  • Refactored C Loadable Code: The core C loadable code was refactored, including renaming forkrun_ring.alt.c to ring_loadables/old/forkrun_ring.OLD2.c, simplifying internal comments, and improving the pread loop in ring_indexer_numa_main for better error handling. Conditional use of ring_fallow_phys was added for multi-NUMA node configurations.
  • Dynamic Benchmark Script: The run_benchmark.bash script was updated to use a configurable line count for test files, include an optional f4 file in benchmarks, and dynamically clean up test files, making the benchmarking process more flexible and maintainable.
  • Externalized Base64 Data: Removed embedded base64 data from frun.nob64.bash and introduced a new script update_frun_base64.bash to manage the base64 encoding of loadable binaries, streamlining the build and update process for these components.
Changelog
  • ring_loadables/benchmark.out
    • Removed old benchmark results.
  • ring_loadables/frun.nob64.bash
    • Declared new local variables numa_map_str and parsed_numa_nodes_arg.
    • Updated --nodes argument parsing to store the value in parsed_numa_nodes_arg.
    • Added --return-bytes to ring_init_opts for stdin and byte mode.
    • Implemented _forkrun_build_numa_map function for NUMA topology detection and mapping.
    • Integrated NUMA map into ring_init_opts using --numa-map.
    • Enhanced pipe handling in background processes (ring_numa_ingest, ring_indexer_numa, ring_numa_scanner) by explicitly closing unused file descriptors.
    • Introduced conditional use of ring_fallow_phys for multi-NUMA node configurations.
    • Refined worker loop condition for more robust termination.
    • Reworked the worker spawning logic to support NUMA-aware worker distribution and directed spawns.
    • Removed embedded base64 data.
  • ring_loadables/old/forkrun_ring.OLD2.c
    • Renamed from forkrun_ring.alt.c.
    • Updated usage strings for several ring_ commands.
    • Removed various explanatory comments related to CPU_SETSIZE, allocated_num_nodes, SharedState fields, RING ARRAYS, and BITMASK CONFIGURATION.
    • Improved pread loop in ring_indexer_numa_main for better error handling and partial reads.
    • Removed comment about zero-length tasks in ring_indexer_numa_main.
    • Simplified ADAPTIVE_FLOW_CONTROL macro by removing detailed comments.
    • Removed comment about physically pinning scanner processes.
    • Renamed pipe_stalled to experienced_stall and updated its usage in ring_numa_scanner_main.
    • Removed comments related to starve_meter and early flush conditions in ring_scanner_main.
    • Enhanced ring_scanner_main's data refill and EOF detection logic, incorporating experienced_stall.
    • Added retry logic with usleep for EINTR/EAGAIN errors in robust_sendfile and ring_copy_chunk.
    • Implemented robust inner drain loops for splice operations in ring_copy_main.
    • Removed comment explaining OOM back-pressure helper.
  • ring_loadables/run_benchmark.bash
    • Added fLines variable for configurable test file sizes.
    • Updated test file generation to use fLines.
    • Added conditional inclusion of f4 for benchmarking.
    • Modified CPU utilization output format.
    • Updated file cleanup to use array expansion for flexibility.
  • ring_loadables/update_frun_base64.bash
    • Added new script to update frun.bash with base64 encoded loadable binaries.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/forkrun_release.yml
Activity
  • The pull request was authored by jkool702.
  • The description includes an automated summary request, indicating the use of AI tools for documentation.
  • The changes involve significant refactoring and new feature implementation, suggesting active development on the frun utility.
  • The benchmark.out file was cleared, implying that new performance benchmarks are intended to be run and recorded.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@jkool702
Copy link
Owner Author

jkool702 commented Mar 5, 2026

/gemini summary

@jkool702
Copy link
Owner Author

jkool702 commented Mar 5, 2026

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant enhancements to forkrun's asynchronous I/O and NUMA handling capabilities. The changes include a sophisticated NUMA topology mapping mechanism in the main bash script, more robust I/O handling in the C loadables with better error checking and retry logic for transient conditions, and improved process management for worker spawning.

My review identified a potential issue in the new worker spawn loop where the global maximum worker limit (nWorkersMax) might not be respected, which could lead to resource over-consumption. I've also suggested a simplification for a developer utility script to improve its readability and maintainability. Overall, these are solid improvements that make the tool more robust and powerful.

Note: Security Review did not run due to the size of the PR.

Comment on lines +497 to 500
for (( ; node_workers[node_idx] < target; node_workers[node_idx]++ )); do
spawn_worker "$nWorkers" "$node_idx"
node_idx=$(( (node_idx + 1) % FORKRUN_NUM_NODES ))
((nWorkers++))
done

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The worker spawn loop does not respect the global nWorkersMax limit. The per-node limit node_worker_max is calculated using integer division, which can lead to the total number of spawned workers exceeding nWorkersMax in some cases. For example, if nWorkersMax is 27 and FORKRUN_NUM_NODES is 28, node_worker_max becomes 1, allowing up to 28 workers to be spawned.

To fix this, you should add a check against nWorkersMax in the inner spawn loop to ensure the total number of workers does not exceed the configured maximum.

Suggested change
for (( ; node_workers[node_idx] < target; node_workers[node_idx]++ )); do
spawn_worker "$nWorkers" "$node_idx"
node_idx=$(( (node_idx + 1) % FORKRUN_NUM_NODES ))
((nWorkers++))
done
for (( ; node_workers[node_idx] < target && nWorkers < nWorkersMax; node_workers[node_idx]++ )); do
spawn_worker "$nWorkers" "$node_idx"
((nWorkers++))
done

@@ -0,0 +1,22 @@
. <( { echoFlag=false; while true; do IFS= read -r line; [[ "$line" == *'_forkrun_file_to_base64() {'* ]] && echoFlag=true; $echoFlag && echo "$line"; $echoFlag && [[ "$line" == '}'* ]] && break; done; } <frun.bash )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This line to source the _forkrun_file_to_base64 function is very complex and hard to read, which could make it difficult to maintain. A simpler and more readable approach would be to use sed to extract the function definition.

. <(sed -n '/_forkrun_file_to_base64() {/,/}/p' frun.bash)

@gemini-code-assist
Copy link

Summary of Changes

This pull request significantly advances the frun utility by integrating sophisticated NUMA topology awareness and refining its asynchronous I/O mechanisms. These changes aim to optimize resource utilization, enhance the stability of data transfers, and improve overall performance, particularly in complex server architectures. The refactoring also simplifies the codebase and makes benchmarking more adaptable.

Highlights

  • Enhanced NUMA Awareness and Worker Management: Introduced a new function _forkrun_build_numa_map to automatically detect and map NUMA nodes, allowing for more intelligent worker distribution and resource allocation. The worker spawning logic was significantly reworked to support directed spawns to specific NUMA nodes and track workers per node, improving efficiency in multi-processor environments.
  • Improved Asynchronous I/O Robustness: Enhanced the reliability of I/O operations by implementing robust retry mechanisms with usleep for EINTR and EAGAIN errors in robust_sendfile, ring_copy_chunk, and splice operations, preventing premature termination and ensuring data integrity during asynchronous transfers.
  • Refactored C Loadable Code and Adaptive Flow Control: The core C loadable code was refactored, including renaming forkrun_ring.alt.c to ring_loadables/old/forkrun_ring.OLD2.c, simplifying internal comments, and improving the pread loop in ring_indexer_numa_main for better error handling. The ADAPTIVE_FLOW_CONTROL macro was streamlined, and conditional use of ring_fallow_phys was added for multi-NUMA node configurations.
  • Dynamic Benchmarking and Externalized Base64 Data: The run_benchmark.bash script was updated for more flexible test file generation and dynamic cleanup. Additionally, embedded base64 data was removed from frun.nob64.bash and a new script update_frun_base64.bash was introduced to manage the base64 encoding of loadable binaries, streamlining the build process.
Changelog
  • ring_loadables/benchmark.out
    • Removed old benchmark results.
  • ring_loadables/frun.nob64.bash
    • Declared new local variables numa_map_str and parsed_numa_nodes_arg.
    • Updated --nodes argument parsing to store the value in parsed_numa_nodes_arg.
    • Added --return-bytes to ring_init_opts for stdin and byte mode.
    • Implemented _forkrun_build_numa_map function for NUMA topology detection and mapping.
    • Integrated NUMA map into ring_init_opts using --numa-map.
    • Enhanced pipe handling in background processes (ring_numa_ingest, ring_indexer_numa, ring_numa_scanner) by explicitly closing unused file descriptors.
    • Introduced conditional use of ring_fallow_phys for multi-NUMA node configurations.
    • Refined worker loop condition for more robust termination.
    • Reworked the worker spawning logic to support NUMA-aware worker distribution and directed spawns.
    • Removed embedded base64 data.
  • ring_loadables/old/forkrun_ring.OLD2.c
    • Renamed from forkrun_ring.alt.c.
    • Updated usage strings for several ring_ commands.
    • Removed various explanatory comments related to CPU_SETSIZE, allocated_num_nodes, SharedState fields, RING ARRAYS, and BITMASK CONFIGURATION.
    • Improved pread loop in ring_indexer_numa_main for better error handling and partial reads.
    • Removed comment about zero-length tasks in ring_indexer_numa_main.
    • Simplified ADAPTIVE_FLOW_CONTROL macro by removing detailed comments.
    • Removed comment about physically pinning scanner processes.
    • Renamed pipe_stalled to experienced_stall and updated its usage in ring_numa_scanner_main.
    • Removed comments related to starve_meter and early flush conditions in ring_scanner_main.
    • Enhanced ring_scanner_main's data refill and EOF detection logic, incorporating experienced_stall.
    • Added retry logic with usleep for EINTR/EAGAIN errors in robust_sendfile and ring_copy_chunk.
    • Implemented robust inner drain loops for splice operations in ring_copy_main.
    • Removed comment explaining OOM back-pressure helper.
  • ring_loadables/run_benchmark.bash
    • Added fLines variable for configurable test file sizes.
    • Updated test file generation to use fLines.
    • Added conditional inclusion of f4 for benchmarking.
    • Modified CPU utilization output format.
    • Updated file cleanup to use array expansion for flexibility.
  • ring_loadables/update_frun_base64.bash
    • Added new script to update frun.bash with base64 encoded loadable binaries.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/forkrun_release.yml
Activity
  • A summary of changes was automatically generated by gemini-code-assist[bot] upon PR creation.
  • SonarQube Cloud reported a Quality Gate failure due to 65 Security Hotspots and 71.7% duplication on new code.
  • The author, jkool702, explicitly requested a summary and a review from gemini-code-assist[bot].
  • A high-priority review comment from gemini-code-assist[bot] identified an issue in the worker spawn loop where nWorkersMax might be exceeded due to integer division, providing a code suggestion.
  • A medium-priority review comment from gemini-code-assist[bot] highlighted the complexity of sourcing the _forkrun_file_to_base64 function and offered a simpler sed-based alternative.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant enhancements for NUMA-aware processing in forkrun, including a sophisticated NUMA topology mapping in the bash script and corresponding logic in the C loadables. It also improves the robustness of I/O operations by handling partial reads/writes and potential race conditions more gracefully. The benchmark scripts have also been updated for better flexibility and accuracy. My review includes a couple of suggestions to improve maintainability in the C code and a utility script.

Note: Security Review did not run due to the size of the PR.

Comment on lines 2243 to +2260
static ssize_t robust_sendfile(int out_fd, int in_fd, off_t *offset, size_t count) {
size_t total = 0;
int retries = 0;
while (total < count) {
ssize_t s = sendfile(out_fd, in_fd, offset, count - total);
if (s < 0) {
if (errno == EINTR || errno == EAGAIN) continue;
if (errno == EINTR || errno == EAGAIN) { usleep(10); continue; }
return total > 0 ? (ssize_t)total : -1;
}
if (s == 0) break; // genuine EOF: file shorter than ack claimed
if (s == 0) {
if (retries++ < 1000) { usleep(100); continue; }
break;
}
retries = 0;
total += s;
}
return (ssize_t)total;
};
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function, along with ring_copy_chunk and ring_copy_main, uses magic numbers for retry counts and sleep durations (e.g., 10, 1000, 100). This harms readability and makes tuning these values difficult. It would be better to define these as named constants at the top of the file.

For example:

#define ROBUST_IO_MAX_RETRIES 1000
#define ROBUST_IO_RETRY_SLEEP_US 100
#define ROBUST_IO_EINTR_SLEEP_US 10

Then, these constants can be used throughout the file, improving maintainability.

@@ -0,0 +1,22 @@
. <( { echoFlag=false; while true; do IFS= read -r line; [[ "$line" == *'_forkrun_file_to_base64() {'* ]] && echoFlag=true; $echoFlag && echo "$line"; $echoFlag && [[ "$line" == '}'* ]] && break; done; } <frun.bash )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method of extracting the _forkrun_file_to_base64 function is fragile. It depends on the exact formatting of the function definition in frun.bash, including spacing. Any minor formatting change to that function in the source file could break this script. A more robust approach would be to use a tool like sed or awk for extraction, or to duplicate the function in this utility script if it's not expected to change often.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 7, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
76 Security Hotspots
79.9% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant