Fix/pinned portable flags by felipeblazing · Pull Request #115 · NVIDIA/cuCascade

felipeblazing · 2026-04-28T20:09:23Z

No description provided.

…pool peer access

Squash of 5 fixes onto cucascade 73d00c4 for Sirius multi-GPU v1.4.
Original commits: 1fff85d 3743621 2dcab24 ff14ff4 e23f3a2

- Pin host memory with cudaHostAllocPortable for multi-GPU DMA accessibility
- Make cudaMallocHost sites Portable-aware
- Add cudaHostAllocMapped to all pinned allocation sites
- Per-instance ptds_allocation_tracker (thread_local, not process-global)
- Drop pool priming (prevents multi-space GPU memory exhaustion)
- Add cross-device pool peer access at construction time

Co-Authored-By: Felipe Aramburu <faramburu@nvidia.com>

Squash of 1 fix onto cucascade 73d00c4 for Sirius multi-GPU v1.4.
Original commit: eda349a

std::thread _thread must be declared AFTER _mutex and _cv. C++ destroys members in reverse-declaration order; placing _thread last ensures the join() inside ~io_worker happens while _mutex and _cv are still alive. Avoids EINVAL on mutex destruction during io_worker teardown under parallel test runs.

Co-Authored-By: Felipe Aramburu <faramburu@nvidia.com>
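The member-ordering rule in this commit can be illustrated with a minimal, standard-library-only sketch. It is not the actual io_worker from cuCascade: the class name io_worker_sketch, the task queue, the _stop flag, and the helper run_tasks_and_count are hypothetical and exist only to make the example self-contained. Only the declaration order of _mutex, _cv, and _thread mirrors the commit message. Because members are also initialized in declaration order, declaring _thread last means the worker thread starts only after the mutex and condition variable it uses have been constructed.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for io_worker; only the member ordering follows the commit.
class io_worker_sketch {
 public:
  // _thread is the last member, so every other member is already constructed
  // when the worker thread starts running.
  io_worker_sketch() : _thread([this] { run(); }) {}

  ~io_worker_sketch()
  {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _stop = true;
    }
    _cv.notify_one();
    _thread.join();  // the worker drains the queue and exits before any member is destroyed
  }

  io_worker_sketch(const io_worker_sketch&)            = delete;
  io_worker_sketch& operator=(const io_worker_sketch&) = delete;

  void submit(std::function<void()> task)
  {
    assert(task && "task must be callable");
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _tasks.push(std::move(task));
    }
    _cv.notify_one();
  }

 private:
  void run()
  {
    std::unique_lock<std::mutex> lock(_mutex);
    for (;;) {
      _cv.wait(lock, [this] { return _stop || !_tasks.empty(); });
      if (_tasks.empty()) { return; }  // only reachable once _stop has been set
      auto task = std::move(_tasks.front());
      _tasks.pop();
      lock.unlock();  // run the task without holding the lock
      task();
      lock.lock();
    }
  }

  std::mutex _mutex;
  std::condition_variable _cv;
  std::queue<std::function<void()>> _tasks;
  bool _stop = false;
  std::thread _thread;  // declared last: constructed last, destroyed first
};

// Submits n counting tasks, lets the destructor join the worker, and returns
// how many tasks actually ran.
int run_tasks_and_count(int n)
{
  std::atomic<int> done{0};
  {
    io_worker_sketch worker;
    for (int i = 0; i < n; ++i) {
      worker.submit([&done] { ++done; });
    }
  }  // ~io_worker_sketch joins here
  return done.load();
}
```

In this sketch the destructor body joins explicitly, and the declaration order additionally guarantees that _thread is the first member torn down afterwards, so no code path can touch a destroyed _mutex or _cv.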

…A probe at init

Squash of 3 fixes onto cucascade 73d00c4 for Sirius multi-GPU v1.4.
Original commits: 7ed84f2 cc2a53d e4db3d8

- Use target-bound stream in host->gpu and gpu->gpu converters (v1.1 P2P fix)
- Pass source mr to cudf::pack + default-pool peer access (WIP carry)
- Empirical P2P peer DMA probe at init; route convert_gpu_to_gpu to real peer DMA on server hardware and host-staging on consumer chipsets

Co-Authored-By: Felipe Aramburu <faramburu@nvidia.com>

…tation + cudaStreamWaitEvent

Squash of 2 fixes onto cucascade 73d00c4 for Sirius multi-GPU v1.4.
Original commits: 7409c60 62e0517

Phase 13 fix: closes SF100 Q11 2-GPU illegal-address race.

- Add record_writer_event/get_writer_event accessors on gpu_table_representation
- Require writer_stream as a REQUIRED ctor argument (compile-time enforced)
- convert_gpu_to_gpu: cudaStreamWaitEvent(target_stream, src.get_writer_event()) before peer copy; fallback to cudaDeviceSynchronize for un-migrated callers
- Replaces cudf::pack path with column-tree walk (avoids stream-ordered race in compute_splits scratch allocations)

Co-Authored-By: Felipe Aramburu <faramburu@nvidia.com>

Pure formatting and typo fixes from `pre-commit run --all-files`:

- 15 source/header files reformatted by clang-format (line-wrap, brace alignment); no semantic changes
- docs/ARCHITECTURE.md: 'sytem' → 'system' (codespell)

No semantic changes; runtime behavior identical to v1.4 ship state (cucascade pin 1c1e648).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

felipeblazing force-pushed the fix/pinned-portable-flags branch from b753c0f to 62e0517 on May 1, 2026 14:56

felipeblazing added the "breaking" and "feature request" labels on May 4, 2026

felipeblazing and others added 5 commits May 4, 2026 18:18

felipeblazing force-pushed the fix/pinned-portable-flags branch from 62e0517 to 42a01c4 on May 6, 2026 20:50

felipeblazing wants to merge 5 commits into NVIDIA:main from felipeblazing:fix/pinned-portable-flags
