Skip to content

Lee1258561/sync shutdown executor#50

Closed
lee1258561 wants to merge 4 commits intopinterest/main-2.52.1from
lee1258561/sync_shutdown_executor
Closed

Lee1258561/sync shutdown executor#50
lee1258561 wants to merge 4 commits intopinterest/main-2.52.1from
lee1258561/sync_shutdown_executor

Conversation

@lee1258561
Copy link
Copy Markdown

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

lee1258561 and others added 4 commits March 17, 2026 18:31
Adds optional synchronized shutdown mode for streaming_split where all N
shards must call shutdown_executor before the actual shutdown occurs.
This enables proper coordination in distributed training scenarios.

- Add sync_shutdown and split_idx parameters to shutdown_executor
- Only shard 0 performs actual shutdown; others wait
- Fail if same shard calls shutdown twice before sync completes
- Ignore shutdown calls if called before any epoch has started
- Add unit tests for the new functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The shutdown state variables (_shutdown_received_from, _shutdown_complete,
_shutdown_force_requested) were not reset when a new epoch started, causing
duplicate shutdown errors when reusing streaming split iterators across
multiple epochs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When iterators go out of scope or are deleted, cleanup is automatically
triggered on the coordinator via end_epoch(), eliminating the need for
explicit shutdown_executor calls between epochs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace end_epoch() with shutdown_executor(sync_shutdown=True) for
epoch cleanup. This removes duplicate logic since _barrier already
resets shutdown state when starting a new epoch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 2, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale label Apr 2, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been automatically closed because there has been no more activity in the 14 days
since being marked stale.

Please feel free to reopen or open a new pull request if you'd still like this to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for your contribution!

@github-actions github-actions Bot closed this Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant