[FEA] Multi-node Out of Core Streaming KMeans API #2066

Draft
tarang-jain wants to merge 106 commits into rapidsai:main from tarang-jain:mnmg-streaming
@tarang-jain
Contributor

Merge after #2015 and #2017

Allows each worker to accept a stream of input matrices, which are then further batched using the streaming_batch_size parameter. Reasoning: we should be able to supply Dask partitions (on host) directly, without first concatenating them into one consolidated matrix.
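
To illustrate the batching idea, here is a minimal pure-Python sketch, not the code in this PR. It assumes each partition is a host-side sequence of rows and that batches never span two partitions. The helper name `iter_batches` is hypothetical; only `streaming_batch_size` comes from the description above.

```python
from typing import Iterable, Iterator, List, Sequence

Row = Sequence[float]


def iter_batches(
    partitions: Iterable[Sequence[Row]],
    streaming_batch_size: int,
) -> Iterator[List[Row]]:
    """Yield batches of at most `streaming_batch_size` rows per partition.

    Hypothetical helper: partitions are consumed one at a time and are
    never concatenated into a single consolidated matrix.
    """
    if streaming_batch_size <= 0:
        raise ValueError("streaming_batch_size must be positive")
    for part in partitions:
        # Slice the current partition only; batches do not cross partitions.
        for start in range(0, len(part), streaming_batch_size):
            yield list(part[start:start + streaming_batch_size])


# Two host partitions of unequal size, standing in for Dask partitions.
partitions = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[3.0, 3.0], [4.0, 4.0]],
]

batches = list(iter_batches(partitions, streaming_batch_size=2))
batch_sizes = [len(b) for b in batches]
```

Under these assumptions, peak memory per step is bounded by the batch size rather than by the total rows held by the worker. Whether the actual API lets a batch span partition boundaries is not stated in the description.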

@tarang-jain tarang-jain requested review from a team as code owners May 7, 2026 22:04
@tarang-jain tarang-jain marked this pull request as draft May 7, 2026 22:04
@coderabbitai

coderabbitai Bot commented May 7, 2026

Caution

Review failed

The head commit changed during the review from 6c08a7b to acbcd5a.

@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing May 10, 2026

Labels

feature request: New feature or request
non-breaking: Introduces a non-breaking change

Projects

Status: In Progress

3 participants