-
Notifications
You must be signed in to change notification settings - Fork 11
feat: atomic accessors for write flag and cnt0 counter #69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: framework-dev
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -96,7 +96,7 @@ int main() | |
|
|
||
| printf("ImCreate_test_gpuipc is waiting update\n"); | ||
| ImageStreamIO_semwait(&imarray[0], 0); | ||
| while(imarray[0].md[0].write ); | ||
| while(SHMIM_WRITE_LOAD(&imarray[0].md[0])); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The syntax |
||
|
|
||
| printf("ImCreate_test_gpuipc reads in SHM\n"); | ||
| cudaMemcpy(h_ptr, d_ptr, imsize[0]*imsize[1]*sizeof(float), | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1592,7 +1592,7 @@ __attribute__((cold)) errno_t ImageStreamIO_createIm_gpu( | |||||||||||||
| clock_gettime(CLOCK_ISIO, &image->md->lastaccesstime); | ||||||||||||||
| clock_gettime(CLOCK_ISIO, &image->md->creationtime); | ||||||||||||||
|
|
||||||||||||||
| image->md->write = 0; | ||||||||||||||
| SHMIM_WRITE_RELEASE(image->md); | ||||||||||||||
| image->md->cnt0 = 0; | ||||||||||||||
| image->md->cnt1 = 0; | ||||||||||||||
|
Comment on lines
+1595
to
1597
|
||||||||||||||
| SHMIM_WRITE_RELEASE(image->md); | |
| image->md->cnt0 = 0; | |
| image->md->cnt1 = 0; | |
| __atomic_store_n(&image->md->cnt0, 0, __ATOMIC_RELAXED); | |
| __atomic_store_n(&image->md->cnt1, 0, __ATOMIC_RELAXED); | |
| SHMIM_WRITE_RELEASE(image->md); |
Copilot
AI
Apr 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This busy-wait loop performs an acquire load on every iteration with an empty body. On weak-memory architectures this can be significantly more expensive than necessary. Consider spinning with a relaxed load (optionally with a CPU pause/yield/backoff), then performing a single acquire fence/load when the flag transitions to 0.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,11 @@ | ||
| #undef PROJECT_NAME | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Those should NOT be undefined, they're critical to how IIRC pkginfo works, and at least to how version information is distributed within the ISIO package. These throw warning during compilation because we're using it wrong, since we use this file to broadcast IMAGESTRUCT_VERSION, which is a mistake. IMAGESTRUCT_VERSION should be part of the public API of the project. |
||
| #define PROJECT_NAME "@PROJECT_NAME@" | ||
| #undef VERSION_MAJOR | ||
| #define VERSION_MAJOR @VERSION_MAJOR@ | ||
| #undef VERSION_MINOR | ||
| #define VERSION_MINOR @VERSION_MINOR@ | ||
| #undef VERSION_PATCH | ||
| #define VERSION_PATCH @VERSION_PATCH@ | ||
| #undef VERSION_OPTION | ||
| #define VERSION_OPTION "@VERSION_OPTION@" | ||
| #define IMAGESTRUCT_VERSION "@VERSION_MAJOR@.@VERSION_MINOR@" | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The entire addition is pretty verbose in a file that's pretty big. Can we start another header file with all the weird optimizations separated?
|
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -345,7 +345,18 @@ typedef struct | |||||
| uint64_t cnt1; /**< in 3D rolling buffer image, this is the last slice written */ | ||||||
| uint64_t cnt2; /**< in cnt2-based syncronization, proceed until cnt0=cnt2 */ | ||||||
|
|
||||||
| uint8_t write; /**< 1 if image is being written */ | ||||||
| /** | ||||||
| * Cross-process write mutex flag. | ||||||
| * | ||||||
| * Writers set to 1 before copying data, then to 0 | ||||||
| * after. Readers spin on it to avoid partial frames. | ||||||
| * | ||||||
| * volatile prevents the compiler from caching the | ||||||
| * value in a register (critical for spin loops). | ||||||
| * Use SHMIM_WRITE_* macros for proper CPU memory | ||||||
| * fences via atomic builtins. | ||||||
| */ | ||||||
| volatile uint8_t write; | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. inline comment is enough info? |
||||||
|
|
||||||
|
|
||||||
| uint16_t NBkw; /**< number of keywords (max: 65536) */ | ||||||
|
|
@@ -563,6 +574,83 @@ typedef struct /**< structure used to store data arrays */ | |||||
| } IMAGE; | ||||||
|
|
||||||
|
|
||||||
| /* | ||||||
| * ========================================================= | ||||||
| * Atomic accessors for IMAGE_METADATA fields | ||||||
| * ========================================================= | ||||||
| * | ||||||
| * These macros provide proper CPU memory fences for | ||||||
| * cross-process synchronization of shared memory fields. | ||||||
| * | ||||||
| * On x86-64 (TSO), release/acquire compile to plain | ||||||
| * mov instructions plus a compiler barrier — zero | ||||||
| * runtime overhead. On ARM/other weak-memory | ||||||
| * architectures, appropriate dmb fences are emitted. | ||||||
| * | ||||||
| * Usage: | ||||||
| * SHMIM_WRITE_ACQUIRE(md) — set write=1 | ||||||
| * SHMIM_WRITE_RELEASE(md) — set write=0 | ||||||
| * SHMIM_WRITE_LOAD(md) — read write flag | ||||||
| * SHMIM_CNT0_INCREMENT(md) — atomically cnt0++ | ||||||
| * SHMIM_CNT0_LOAD(md) — read cnt0 | ||||||
| */ | ||||||
|
|
||||||
| #if defined(__STDC_VERSION__) \ | ||||||
| && __STDC_VERSION__ >= 201112L \ | ||||||
| && !defined(__STDC_NO_ATOMICS__) | ||||||
| #include <stdatomic.h> | ||||||
|
|
||||||
| #define SHMIM_WRITE_ACQUIRE(md) \ | ||||||
| atomic_store_explicit( \ | ||||||
| (_Atomic uint8_t *)&(md)->write, \ | ||||||
| 1, memory_order_release) | ||||||
|
Comment on lines
+598
to
+606
|
||||||
|
|
||||||
| #define SHMIM_WRITE_RELEASE(md) \ | ||||||
| atomic_store_explicit( \ | ||||||
| (_Atomic uint8_t *)&(md)->write, \ | ||||||
| 0, memory_order_release) | ||||||
|
|
||||||
| #define SHMIM_WRITE_LOAD(md) \ | ||||||
| atomic_load_explicit( \ | ||||||
| (_Atomic uint8_t *)&(md)->write, \ | ||||||
| memory_order_acquire) | ||||||
|
|
||||||
| #define SHMIM_CNT0_INCREMENT(md) \ | ||||||
| atomic_fetch_add_explicit( \ | ||||||
| (_Atomic uint64_t *)&(md)->cnt0, \ | ||||||
| 1, memory_order_release) | ||||||
|
|
||||||
| #define SHMIM_CNT0_LOAD(md) \ | ||||||
| atomic_load_explicit( \ | ||||||
| (_Atomic uint64_t *)&(md)->cnt0, \ | ||||||
| memory_order_acquire) | ||||||
|
|
||||||
| #else | ||||||
| /* GCC/Clang __atomic builtins fallback */ | ||||||
|
|
||||||
| #define SHMIM_WRITE_ACQUIRE(md) \ | ||||||
| __atomic_store_n( \ | ||||||
| &(md)->write, 1, __ATOMIC_RELEASE) | ||||||
|
|
||||||
| #define SHMIM_WRITE_RELEASE(md) \ | ||||||
| __atomic_store_n( \ | ||||||
| &(md)->write, 0, __ATOMIC_RELEASE) | ||||||
|
|
||||||
| #define SHMIM_WRITE_LOAD(md) \ | ||||||
| __atomic_load_n( \ | ||||||
| &(md)->write, __ATOMIC_ACQUIRE) | ||||||
|
|
||||||
| #define SHMIM_CNT0_INCREMENT(md) \ | ||||||
| __atomic_add_fetch( \ | ||||||
|
||||||
| __atomic_add_fetch( \ | |
| __atomic_fetch_add( \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify this API and find which model is right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here
ImageStreamIO_sempost()andSHMIM_WRITE_RELEASE()happen beforeSHMIM_CNT0_INCREMENT(). If any reader usescnt0to confirm/sequence updates after being woken, it can observe a new frame with the previouscnt0value. Consider updatingcnt0(and any other metadata) before posting semaphores, and clearingwriteas the last step to publish a fully consistent update.