UCC/CORE: Added local rank from topo if not provided by user#1245
UCC/CORE: Added local rank from topo if not provided by user#1245MaayanGadishNvidia wants to merge 1 commit intoopenucx:masterfrom
Conversation
|
| Filename | Overview |
|---|---|
| src/components/topo/ucc_topo.h | Adds ucc_topo_node_local_rank() helper; returns UCC_RANK_INVALID when node sbgp is not enabled, which propagates unchecked to UCX in single-rank-per-node multi-rank OOB scenarios |
| src/core/ucc_context.c | Adds ucc_core_ctx_id_exchange() for a prefix-only allgather (ctx_id + host_info); early exchange cleanup, double topo-init guard, and error-path topo/storage cleanup are all handled correctly |
| src/core/ucc_context.h | Declares ucc_core_ctx_id_exchange; updates NODE_LOCAL_ID doc string to document the new three-allgather contract required of OOB backends |
Greploops — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.
Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.
Reviews (54): Last reviewed commit: "UCC/CORE: Added automation to loading lo..." | Re-trigger Greptile
211c631 to
51f149d
Compare
| ucp_params.estimated_num_eps = params->estimated_num_eps; | ||
| } | ||
|
|
||
| #ifdef HAVE_UCX_NODE_LOCAL_ID |
There was a problem hiding this comment.
Trailing whitespace added.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
a3f3680 to
36a073b
Compare
Additional Comments (1)
|
44108af to
a32583c
Compare
a32583c to
c4f6048
Compare
c4f6048 to
a6456d8
Compare
|
/build |
ee02a90 to
692e49e
Compare
0cf8d6e to
e46bf4a
Compare
|
/build |
5b38353 to
bec53ca
Compare
|
/build |
| ucc_sbgp_t *sbgp; | ||
|
|
||
| if (!topo) { | ||
| return 0; |
There was a problem hiding this comment.
i think it makes sense to return rank_invalid in this case
| sbgp = ucc_topo_get_sbgp(topo, UCC_SBGP_NODE); | ||
|
|
||
| if (sbgp->status != UCC_SBGP_ENABLED) { | ||
| return 0; |
| cfg_mod = std::to_string(bootstrap->get_ppn()); | ||
| UCCCHECK_GOTO(ucc_context_config_modify(ctx_config, NULL, | ||
| "ESTIMATED_NUM_PPN", cfg_mod.c_str()), free_ctx_config, st); | ||
| cfg_mod = std::to_string(bootstrap->get_local_rank()); |
There was a problem hiding this comment.
I removed it for validation of my function. Ill return it
| return status; | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
rem extra blank line
| do { | ||
| /* UCC context create is blocking fn, so we can wait here for the | ||
| completion of addr exchange */ | ||
| status = ucc_core_ctx_id_exchange(ctx, &ctx->params.oob, |
There was a problem hiding this comment.
second oob allgather does host info exchange as well, which is not always necessary since we already have topo info
There was a problem hiding this comment.
I currently clean topo after the local id clac. Do you think I should create another addr exchange function which complete the rest of the original addr exchange, in case that the ucc_core_ctx_id_exchange was used?
4a52d46 to
e05830c
Compare
9400adb to
9d12a66
Compare
|
/build |
|
@MaayanGadishNvidia let's rework so that init tim doesn't increase, please run osu_mpi_init tests to make sure we don't impact init |
What
Adding automation for calculate local rank from topo if not provided by user
Why ?
Continue of #1189