Conversation

@zasdfgbnm
Collaborator

No description provided.

@zasdfgbnm
Collaborator Author

!test

@github-actions

github-actions bot commented Dec 18, 2025

Description

  • Simplified ParallelType::Stream handling in the unshardedSizes function

  • Removed the conditional logic that checked whether a stream-parallelized dimension is in the logical domain

  • Eliminated the error check requiring the stream extent to be constant

  • Unconditionally returns 1 for all stream-parallelized dimensions (a toy before/after sketch follows this list)
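
For reviewers who want the before/after side by side, here is a minimal self-contained toy model of the change. All names here are hypothetical stand-ins, not actual nvFuser types; the real code lives in csrc/multidevice/execution_utils.cpp and is summarized in the walkthrough below.

    // Toy model of the Stream branch of unshardedSizes, before and after
    // this PR. Types and names are illustrative only.
    #include <cassert>
    #include <cstdint>
    #include <optional>

    enum class ParallelType { Serial, Stream };

    struct ToyIterDomain {
      ParallelType parallel_type;
      bool in_logical_domain;
      std::optional<int64_t> constant_extent;  // nullopt = non-constant extent
    };

    // Old behavior (per the walkthrough): a stream-parallelized logical
    // dimension maps to 1; any other stream dimension must have a constant
    // extent, which is returned.
    int64_t streamSizeBefore(const ToyIterDomain& id) {
      if (id.in_logical_domain) {
        return 1;
      }
      assert(id.constant_extent.has_value() &&
             "DIDs/Stream extent must be constant");
      return *id.constant_extent;
    }

    // New behavior: every stream-parallelized dimension is treated as size 1.
    int64_t streamSizeAfter(const ToyIterDomain& /*id*/) {
      return 1;
    }

    int main() {
      // A stream dimension outside the logical domain with constant extent 4:
      // the two versions disagree, which is the "Behavioral Change Risk"
      // flagged in the reviewer guide below.
      ToyIterDomain loop_id{
          ParallelType::Stream, /*in_logical_domain=*/false, 4};
      assert(streamSizeBefore(loop_id) == 4);
      assert(streamSizeAfter(loop_id) == 1);
      return 0;
    }

Note that under the old logic the assertion would also fire for a non-constant extent; the new code never inspects the extent at all.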

Changes walkthrough

Relevant files
Bug fix: execution_utils.cpp: Simplify stream parallelization handling in unshardedSizes
csrc/multidevice/execution_utils.cpp (+1/-23)

  • Removed the TODO comment about the MultiDeviceExecutor hack for
    stream parallelization
  • Eliminated the conditional logic checking whether sharded_id is in
    the logical domain
  • Removed the error check for non-constant stream extent evaluation
  • Simplified to unconditionally return 1 for ParallelType::Stream

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review
    Removed Error Checking

    The PR removes error checking that validated that the DID/Stream extent is constant. That check ensured non-constant extents were caught early rather than surfacing as runtime issues; removing it could lead to silent failures or incorrect behavior when a non-constant extent is encountered.

    return 1;
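
    For context, the removed validation plausibly looked something like the sketch below. This is reconstructed from the walkthrough, not the actual diff: expr_eval, sharded_id, and the message wording are assumptions, though ExpressionEvaluator, IterDomain::extent(), and NVF_ERROR are real nvFuser facilities.

    // Hypothetical reconstruction of the removed check (not the actual diff):
    auto extent = expr_eval.evaluate(sharded_id->extent());
    NVF_ERROR(
        extent.hasValue(),
        "Expected DIDs/Stream extent to be constant, got: ",
        sharded_id->extent()->toString());
    return extent.as<int64_t>();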
    Behavioral Change Risk

    The change removes the conditional logic that returned 1 only for logical domains while evaluating the extent for other domains. The new unconditional return of 1 for all ParallelType::Stream cases could be incorrect for non-logical domains and may break existing functionality that depends on the extent evaluation.

    if (parallel_type == ParallelType::Stream) {
      return 1;
    }
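
    Likewise, the removed conditional plausibly distinguished logical from non-logical domains along these lines (the same caveats apply: logical and sharded_id are assumed names, and std::find from <algorithm> is used for the membership test):

    // Hypothetical reconstruction of the removed branch (not the actual diff):
    if (parallel_type == ParallelType::Stream) {
      if (std::find(logical.begin(), logical.end(), sharded_id) !=
          logical.end()) {
        return 1;  // stream-parallelized logical dimension
      }
      // Otherwise fall through to evaluating the extent (see the sketch above).
    }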
    Missing Documentation

    The PR removes a detailed TODO comment that explained the reasoning behind the original implementation and what future work was planned. Removing this documentation makes it harder for future maintainers to understand the design rationale.

    if (parallel_type == ParallelType::Stream) {
      return 1;
    }

    Test failures

    • (High, 95) CUDA driver/runtime mismatch causing init-time failures in nvFuser matmul & top-k test suites on dlcluster_h100

      Test Name H100 Source
      Ampere/MmaTest.SingleTile/Ampere_16_8_16__bfloat Link
      ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/2048_1_1_0 Link
      BlockSizeAndItemsPerThread/ArgSortComprehensiveTest.ComprehensiveValidation/BlockSize32_ItemsPerThread4 Link
      ClusterReductionTest.SimpleFusionNotAllReduce/cluster_15_dtype_double Link
      ClusterReductionTest.SimpleFusionNotAllReduce/cluster_4_dtype_double Link
      CutlassExecutorTest.Nvfp4Matmul_BiasEpilogue Link
      General/HopperPlusMatmulSchedulerTest.FusedMultiplySum/KK_512_256_128_MmaMacro_m64_n128_k16_splitk_2 Link
      General/HopperPlusMatmulSchedulerTest.FusedMultiplySum/MK_512_256_128_MmaMacro_m128_n128_k16_tma_store Link
      General/HopperPlusMatmulSchedulerTest.FusedMultiplySumBiasNeg/MN_512_256_128_MmaMacro_m64_n128_k16_tma_store_splitk_2 Link
      GreedySchedulerTest.ScanNonLocalOutput Link
      ... with 85 more test failures omitted. Check internal logs.
    • (High, 1) Outdated NVIDIA driver on dlcluster_h100 causing CUDA initialization failure in RNG tests

      Test Name H100 Source
      RNGTest.BroadcastingRNG Link
    • (Medium, 12) nvFuser internal assert on non-divisible split (tensor_metadata.cpp) in test_stream.test_two_matmuls_inlinable and multidevice.test_overlap suites

      Test Name A100 A100 (dist.) GB200 GB200 (dist.) H100 H100 (dist.) Source
      tests.python.direct.test_stream.test_two_matmuls_inlinable[nvfuser_direct_test=eager]
      tests.python.direct.test_stream.test_two_matmuls_inlinable[nvfuser_direct_test=lru_cache]
      tests.python.multidevice.test_overlap.test_row_parallel_linear_forward
    • (Low, 1) Small numerical mismatch in nvFuser reduction tutorial test (tests.python.direct.test_tutorial)

      Test Name GB200 Source
      tests.python.direct.test_tutorial.test_tutorial_reduction
