This repository was archived by the owner on Jan 26, 2026. It is now read-only.

Commit a0e423d

committed

adding high-level overview of ddpt's machinery

1 parent dfa512c

File tree

1 file changed (+23, -0 lines)

README.md

Lines changed: 23 additions & 0 deletions
@@ -71,3 +71,26 @@ Please setup precommit hooks like this
pre-commit install -f -c ./.pre-commit-config.yaml
pre-commit autoupdate
```
## Overview
### Deferred Execution
Typically, ddptensor operations are not executed immediately. Instead, the function only returns a transparent object (a future). The actual computation is deferred by creating a promise/deferred object and queueing it for later execution. None of this is visible to users; they can use ddptensor like any other numpy-like library.
Computation happens only when actual data is needed, that is, when

- the values of tensor elements are cast to bool, int, or float
- the tensor is printed
In the background, a worker thread handles the deferred objects. Until computation is needed, it dequeues deferred objects from the FIFO queue and asks them to generate MLIR. An object can either generate MLIR or instead provide a run() function for immediate execution. In the latter case, the current MLIR function is executed before run() is called, to make sure potential dependences are met.
### Distribution
Tensors and the operations on them are transparently distributed across multiple processes. The respective functionality is handled partly by this library and partly by the IMEX dist dialect. IMEX relies on a runtime library for complex communication tasks and for inspecting the runtime configuration, such as the number of processes and the process id (MPI rank). ddptensor provides this functionality in a separate dynamic library, "idtr".
Right now, data is split along the first dimension only. Each process knows the partition it owns. As an optimization, partitions may actually overlap.
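As an illustration, a (possibly uneven) block split along the first dimension could be computed per rank as below. This is a hypothetical helper, not ddptensor's or idtr's actual API:

```python
def first_dim_partition(n_rows, n_procs, rank):
    """Return the (start, stop) slice of the first dimension owned by rank.

    Remainder rows go to the lowest ranks, one extra row each.
    """
    base, rem = divmod(n_rows, n_procs)
    start = rank * base + min(rank, rem)
    stop = start + base + (1 if rank < rem else 0)
    return start, stop

# 10 rows over 3 processes: each rank owns a contiguous, disjoint block.
parts = [first_dim_partition(10, 3, r) for r in range(3)]
```

Each process would apply this with its own MPI rank to find the block it owns; overlapping (halo) regions would extend these bounds.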
ddptensor supports two execution modes:

1. CSP/SPMD/explicitly-distributed execution: all processes execute the same program, so execution is replicated on every process. Data is typically not replicated but distributed among the processes.
2. Controller-Worker/implicitly-distributed execution: only a single process executes the program; it distributes data and work to the worker processes.
