
Conversation

@stefpi commented Jan 8, 2026

related issue: #2930

proposed changes:

  • add the distributed layers into python/nn/layers for auto summarization
  • add a tensor parallelism inference example to the docs
    • describe the purpose and usage of the ShardedToAllLinear and AllToShardedLinear layers
    • show a simple example of combining the layers and the benefit of doing so (see the sketch after this list)
    • add a simple Llama model inference script with TP applied (#1403 on mlx-examples)
  • extract the existing data parallel example and move it to a dedicated doc
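
To give a flavor of what the TP example covers, here is a minimal sketch of combining the two layers for tensor-parallel inference. It is illustrative only: the ShardedMLP module name is made up, and the constructor signatures are assumed to mirror nn.Linear plus an optional group argument.

```python
# Illustrative sketch, not the exact code in the docs. Assumes
# AllToShardedLinear / ShardedToAllLinear are exported from mlx.nn and take
# (input_dims, output_dims, ..., group=...) like nn.Linear.
import mlx.core as mx
import mlx.nn as nn


class ShardedMLP(nn.Module):  # hypothetical module name, for illustration
    def __init__(self, dims: int, hidden_dims: int):
        super().__init__()
        group = mx.distributed.init()
        # Column-parallel: each rank holds only its slice of the hidden units,
        # so the up-projection produces a sharded activation.
        self.up_proj = nn.AllToShardedLinear(dims, hidden_dims, group=group)
        # Row-parallel: each rank computes a partial result from its shard and
        # the partials are summed across ranks so every rank gets the full output.
        self.down_proj = nn.ShardedToAllLinear(hidden_dims, dims, group=group)

    def __call__(self, x):
        return self.down_proj(nn.relu(self.up_proj(x)))
```

The benefit the example highlights is that the weights and matmuls over the hidden dimension are split across ranks, while only a single communication step (the reduction inside ShardedToAllLinear) is needed per block.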

@stefpi (Author) commented Jan 11, 2026

@awni where would be the ideal place to index the layers.distributed.shard_linear and layers.distributed.shard_inplace functions in the docs? I placed the distributed layers under nn/layers, but I can’t find a natural place for the shard functions.

@awni (Member) commented Jan 11, 2026

I think we should make a new subsection under "Neural Networks", call it, say, "Distributed", and add all the helper routines and maybe also the distributed layers there.

Comment on lines +42 to +54
## Doc Development Setup

To enable live refresh of docs while writing:

Install sphinx-autobuild:
```
pip install sphinx-autobuild
```

Run sphinx-autobuild on the docs/src folder:
```
sphinx-autobuild ./src ./build/html
```
@awni (Member) replied:

That's pretty cool.

@stefpi (Author) commented Jan 12, 2026

For a data parallel training example, should I pull the existing example from the usage/distributed doc into an examples/data parallelism doc and expand on it, or should I leave it and make something new?

@awni (Member) commented Jan 13, 2026

We might not need an additional example given we already have something in the usage section. Wdyt?

@stefpi (Author) commented Jan 13, 2026

I think the existing example is good. Maybe we could move it to a dedicated doc under the examples section and then reference both the DP and TP examples in the Distributed Communication doc? That way the DP and TP examples sit next to each other, and it is easier to expand on them in the future without cluttering the usage doc.

@stefpi (Author) commented Jan 15, 2026

I have extracted the data parallel training example into examples/data parallelism and referenced both the DP and TP examples in usage/distributed communication. Let me know if that's fine; I can revert it if not.

@stefpi stefpi marked this pull request as ready for review January 15, 2026 18:07
@stefpi stefpi changed the title from "[WIP][Docs] Simple example of using MLX distributed" to "[Docs] Simple example of using MLX distributed" Jan 15, 2026
@stefpi stefpi requested a review from awni January 16, 2026 15:39