
Commit f645626 (1 parent: 22e7bd9)

File tree: 13 files changed (+85, -32 lines)

latest/_sources/examples/amp.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ case, the losses) should preferably be scaled with a `GradScaler
 <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`_ to avoid gradient underflow. The
 following example shows the resulting code for a multi-task learning use-case.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 2, 17, 27, 34-35, 37-38
 
    import torch
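
Throughout this commit, Sphinx's plain ``code-block`` directive is replaced by the ``testcode`` directive from ``sphinx.ext.doctest``, so the documentation snippets are actually executed (e.g. by ``make doctest``) instead of only being rendered. A minimal sketch of how these directives pair up (illustrative snippet, not taken from the repository):

```rst
.. testcode::

   print(1 + 1)

.. testoutput::

   2
```

When the snippet prints nothing (as in most examples here), a bare ``testcode`` block with no matching ``testoutput`` is enough.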

latest/_sources/examples/basic_usage.rst.txt

Lines changed: 8 additions & 8 deletions

@@ -12,7 +12,7 @@ the parameters are updated using the resulting aggregation.
 
 Import several classes from ``torch`` and ``torchjd``:
 
-.. code-block:: python
+.. testcode::
 
    import torch
    from torch.nn import Linear, MSELoss, ReLU, Sequential
@@ -24,14 +24,14 @@ Import several classes from ``torch`` and ``torchjd``:
 
 Define the model and the optimizer, as usual:
 
-.. code-block:: python
+.. testcode::
 
    model = Sequential(Linear(10, 5), ReLU(), Linear(5, 2))
    optimizer = SGD(model.parameters(), lr=0.1)
 
 Define the aggregator that will be used to combine the Jacobian matrix:
 
-.. code-block:: python
+.. testcode::
 
    aggregator = UPGrad()
 
@@ -41,7 +41,7 @@ negatively affected by the update.
 
 Now that everything is defined, we can train the model. Define the input and the associated target:
 
-.. code-block:: python
+.. testcode::
 
    input = torch.randn(16, 10)  # Batch of 16 random input vectors of length 10
    target1 = torch.randn(16)  # First batch of 16 targets
@@ -51,7 +51,7 @@ Here, we generate fake inputs and labels for the sake of the example.
 
 We can now compute the losses associated to each element of the batch.
 
-.. code-block:: python
+.. testcode::
 
    loss_fn = MSELoss()
    output = model(input)
@@ -62,7 +62,7 @@ The last steps are similar to gradient descent-based optimization, but using the
 
 Perform the Jacobian descent backward pass:
 
-.. code-block:: python
+.. testcode::
 
    autojac.backward([loss1, loss2])
    jac_to_grad(model.parameters(), aggregator)
@@ -73,14 +73,14 @@ field of the parameters. It also deletes the ``.jac`` fields to save some memory.
 
 Update each parameter based on its ``.grad`` field, using the ``optimizer``:
 
-.. code-block:: python
+.. testcode::
 
    optimizer.step()
 
 The model's parameters have been updated!
 
 As usual, you should now reset the ``.grad`` field of each model parameter:
 
-.. code-block:: python
+.. testcode::
 
    optimizer.zero_grad()
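
The aggregation step above is what ``UPGrad`` performs on the Jacobian. As a rough, framework-free illustration of why conflicting gradients need care, here is a gradient-surgery-style projection (note: this is *not* UPGrad's actual algorithm, which solves a small quadratic program; the helper name is hypothetical):

```python
def dot(u, v):
    """Inner product of two vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def drop_conflict(g, h):
    """Return g with its component conflicting with h removed.

    If <g, h> >= 0 there is no conflict and g is returned unchanged;
    otherwise g is projected onto the hyperplane orthogonal to h.
    """
    d = dot(g, h)
    if d >= 0:
        return list(g)
    coef = d / dot(h, h)
    return [a - coef * b for a, b in zip(g, h)]

g1 = [1.0, 0.0]
g2 = [-1.0, 1.0]              # dot(g1, g2) == -1.0: the two gradients conflict
g1_fixed = drop_conflict(g1, g2)
# dot(g1_fixed, g2) == 0.0: following g1_fixed no longer hurts objective 2
```

The point of an aggregator like ``UPGrad`` is to produce one update direction from such mutually conflicting rows of the Jacobian without sacrificing any single objective.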

latest/_sources/examples/iwmtl.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ this Gramian to reweight the gradients and resolve conflict entirely.
 
 The following example shows how to do that.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 5-6, 18-20, 31-32, 34-35, 37-38, 40-41
 
    import torch

latest/_sources/examples/iwrm.rst.txt

Lines changed: 3 additions & 3 deletions

@@ -41,7 +41,7 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 .. tab-set::
    .. tab-item:: autograd (baseline)
 
-      .. code-block:: python
+      .. testcode::
 
          import torch
          from torch.nn import Linear, MSELoss, ReLU, Sequential
@@ -75,7 +75,7 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 
    .. tab-item:: autojac
 
-      .. code-block:: python
+      .. testcode::
          :emphasize-lines: 5-6, 12, 16, 21-23
 
         import torch
@@ -110,7 +110,7 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 
    .. tab-item:: autogram (recommended)
 
-      .. code-block:: python
+      .. testcode::
         :emphasize-lines: 5-6, 12, 16-17, 21-24
 
         import torch
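
IWRM treats each instance's loss as its own objective, whereas standard empirical risk minimization (ERM) averages them into a single scalar before differentiating. A toy, framework-free illustration of the distinction (the helper name is hypothetical):

```python
def per_instance_squared_errors(outputs, targets):
    """IWRM view: one squared-error loss per element of the batch."""
    return [(o - t) ** 2 for o, t in zip(outputs, targets)]

outputs = [0.5, 2.0]
targets = [1.0, 1.0]

losses = per_instance_squared_errors(outputs, targets)  # [0.25, 1.0]
erm_loss = sum(losses) / len(losses)                    # ERM collapses them to 0.625
```

In the tabs above, the autograd baseline backpropagates only the single mean loss, while the autojac and autogram variants differentiate each per-instance loss before aggregating.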

latest/_sources/examples/lightning_integration.rst.txt

Lines changed: 12 additions & 1 deletion

@@ -10,7 +10,18 @@ The following code example demonstrates a basic multi-task learning setup using
 :class:`~lightning.pytorch.core.LightningModule` that will call :doc:`mtl_backward
 <../docs/autojac/mtl_backward>` at each training iteration.
 
-.. code-block:: python
+.. testsetup::
+
+   import warnings
+   import logging
+   from lightning.fabric.utilities.warnings import PossibleUserWarning
+
+   logging.disable(logging.INFO)
+   warnings.filterwarnings("ignore", category=DeprecationWarning)
+   warnings.filterwarnings("ignore", category=FutureWarning)
+   warnings.filterwarnings("ignore", category=PossibleUserWarning)
+
+.. testcode::
    :emphasize-lines: 9-10, 18, 31-32
 
    import torch
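
The new ``testsetup`` block exists only to keep the doctest run deterministic: Lightning's logs and warnings would otherwise leak into the compared output. The same filtering pattern in plain Python (a generic sketch, independent of Lightning; the function names are hypothetical):

```python
import warnings

def run_quietly(fn):
    """Call fn with DeprecationWarning and FutureWarning suppressed,
    mirroring the filters installed by the testsetup block above."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        warnings.simplefilter("ignore", FutureWarning)
        return fn()

def noisy_step():
    warnings.warn("this API is deprecated", DeprecationWarning)
    return "ok"

result = run_quietly(noisy_step)  # no warning is emitted
```

Using ``catch_warnings`` keeps the suppression scoped; the ``testsetup`` version applies it for the whole doctest instead.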

latest/_sources/examples/monitoring.rst.txt

Lines changed: 25 additions & 1 deletion

@@ -14,7 +14,12 @@ Jacobian descent is doing something different than gradient descent. With
 :doc:`UPGrad <../docs/aggregation/upgrad>`, this happens when the original gradients conflict (i.e.
 they have a negative inner product).
 
-.. code-block:: python
+.. testsetup::
+
+   import torch
+   torch.manual_seed(0)
+
+.. testcode::
    :emphasize-lines: 9-11, 13-18, 33-34
 
    import torch
@@ -67,3 +72,22 @@ they have a negative inner product).
    jac_to_grad(shared_module.parameters(), aggregator)
    optimizer.step()
    optimizer.zero_grad()
+
+.. testoutput::
+
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.6618, 1.0554])
+   Cosine similarity: 0.9249
+   Weights: tensor([0.6569, 1.2146])
+   Cosine similarity: 0.8661
+   Weights: tensor([0.5004, 0.5060])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5746, 1.1607])
+   Cosine similarity: 0.9301
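
The ``Cosine similarity`` lines in the expected output compare the direction of the aggregated update with another direction: 1.0 means they agree, and lower values flag conflict-driven deviation. A dependency-free version of the metric itself (a hypothetical helper, not the code from the example):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (lists of floats)."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den

cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel: ~1.0, no conflict
cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite: ~-1.0, full conflict
```

Values strictly below 1 in the output above thus correspond to iterations where the aggregator had to deviate from a plain average.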

latest/_sources/examples/mtl.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ For the sake of the example, we generate a fake dataset consisting of 8 batches
 vectors of dimension 10, and their corresponding scalar labels for both tasks.
 
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 5-6, 19, 32-33
 
    import torch

latest/_sources/examples/partial_jd.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ perform the partial descent by considering only the parameters of the last two `
 doing this, we avoid computing the Jacobian and its Gramian with respect to the parameters of the
 first ``Linear`` layer, thereby reducing memory usage and computation time.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 16-18
 
    import torch

latest/_sources/examples/rnn.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@ When training recurrent neural networks for sequence modelling, we can easily ob
 element of the output sequences. If the gradients of these losses are likely to conflict, Jacobian
 descent can be leveraged to enhance optimization.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 5-6, 10, 17, 19-20
 
    import torch

latest/docs/autogram/engine/index.html

Lines changed: 11 additions & 11 deletions

@@ -295,7 +295,7 @@
 <h1>Engine<a class="headerlink" href="#engine" title="Link to this heading"></a></h1>
 <dl class="py class">
 <dt class="sig sig-object py" id="torchjd.autogram.Engine">
-<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">torchjd.autogram.</span></span><span class="sig-name descname"><span class="pre">Engine</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">modules</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_dim</span></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L48-L338"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine" title="Link to this definition"></a></dt>
+<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">torchjd.autogram.</span></span><span class="sig-name descname"><span class="pre">Engine</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">modules</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_dim</span></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L48-L344"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine" title="Link to this definition"></a></dt>
 <dd><p>Engine to compute the Gramian of the Jacobian of some tensor with respect to the direct
 parameters of all provided modules. It is based on Algorithm 3 of <a class="reference external" href="https://arxiv.org/pdf/2406.16232">Jacobian Descent For
 Multi-Objective Optimization</a> but goes even further:</p>
@@ -400,15 +400,15 @@ <h1>Engine<a class="headerlink" href="#engine" title="Link to this heading">¶</
 <p class="admonition-title">Warning</p>
 <p>Parent modules should call their child modules directly rather than using their child
 modules’ parameters themselves. For instance, the following model is not supported:</p>
-<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">class</span><span class="w"> </span><span class="nc">Model</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
-<span class="gp">&gt;&gt;&gt; </span>    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="bp">self</span><span class="o">.</span><span class="n">linear</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># Child module</span>
-<span class="gp">&gt;&gt;&gt;</span>
-<span class="gp">&gt;&gt;&gt; </span>    <span class="k">def</span><span class="w"> </span><span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="n">Tensor</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tensor</span><span class="p">:</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="c1"># Incorrect: Use the child module&#39;s parameters directly without calling it.</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="k">return</span> <span class="nb">input</span> <span class="o">@</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">weight</span><span class="o">.</span><span class="n">T</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">bias</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="c1"># Correct alternative: return self.linear(input)</span>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">Model</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
+    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
+        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">linear</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># Child module</span>
+
+    <span class="k">def</span><span class="w"> </span><span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="n">Tensor</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tensor</span><span class="p">:</span>
+        <span class="c1"># Incorrect: Use the child module&#39;s parameters directly without calling it.</span>
+        <span class="k">return</span> <span class="nb">input</span> <span class="o">@</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">weight</span><span class="o">.</span><span class="n">T</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">bias</span>
+        <span class="c1"># Correct alternative: return self.linear(input)</span>
 </pre></div>
 </div>
 </div>
@@ -421,7 +421,7 @@ <h1>Engine<a class="headerlink" href="#engine" title="Link to this heading">¶</
 </div>
 <dl class="py method">
 <dt class="sig sig-object py" id="torchjd.autogram.Engine.compute_gramian">
-<span class="sig-name descname"><span class="pre">compute_gramian</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output</span></span></em>, <em class="sig-param"><span class="positional-only-separator o"><abbr title="Positional-only parameter separator (PEP 570)"><span class="pre">/</span></abbr></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L238-L309"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine.compute_gramian" title="Link to this definition"></a></dt>
+<span class="sig-name descname"><span class="pre">compute_gramian</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output</span></span></em>, <em class="sig-param"><span class="positional-only-separator o"><abbr title="Positional-only parameter separator (PEP 570)"><span class="pre">/</span></abbr></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L244-L315"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine.compute_gramian" title="Link to this definition"></a></dt>
 <dd><p>Computes the Gramian of the Jacobian of <code class="docutils literal notranslate"><span class="pre">output</span></code> with respect to the direct parameters of
 all <code class="docutils literal notranslate"><span class="pre">modules</span></code>.</p>
 <dl class="field-list simple">
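
``compute_gramian`` returns the Gramian of the Jacobian, i.e. the matrix of pairwise inner products of the per-objective gradients; negative off-diagonal entries indicate conflicting objectives. A minimal pure-Python illustration of that object (this is only what the result represents, not the engine's implementation):

```python
def gramian(jacobian):
    """Gramian G = J @ J.T for a Jacobian given as a list of rows
    (one gradient per objective)."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in jacobian]
        for row_i in jacobian
    ]

J = [
    [1.0, 0.0, 2.0],   # gradient of objective 1
    [0.0, 1.0, -1.0],  # gradient of objective 2
]
G = gramian(J)
# The negative off-diagonal entry of G means the two gradients conflict.
```

The Gramian is all that conflict-aware aggregators need to compute their weights, which is why the engine exposes it directly.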
