
Commit f645626 (1 parent: 22e7bd9)

File tree: 13 files changed (+85, -32 lines)

latest/_sources/examples/amp.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ case, the losses) should preferably be scaled with a `GradScaler
 <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`_ to avoid gradient underflow. The
 following example shows the resulting code for a multi-task learning use-case.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 2, 17, 27, 34-35, 37-38
 
    import torch
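
Throughout this commit, Sphinx's plain ``code-block`` directive is replaced by the ``testcode`` directive from ``sphinx.ext.doctest``, so the documentation snippets are actually executed (e.g. by ``make doctest``) instead of only being rendered. A minimal sketch of how these directives pair up (illustrative snippet, not taken from the repository):

```rst
.. testcode::

   print(1 + 1)

.. testoutput::

   2
```

When the snippet prints nothing (as in most examples here), a bare ``testcode`` block with no matching ``testoutput`` is enough.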

latest/_sources/examples/basic_usage.rst.txt

Lines changed: 8 additions & 8 deletions

@@ -12,7 +12,7 @@ the parameters are updated using the resulting aggregation.
 
 Import several classes from ``torch`` and ``torchjd``:
 
-.. code-block:: python
+.. testcode::
 
    import torch
    from torch.nn import Linear, MSELoss, ReLU, Sequential
@@ -24,14 +24,14 @@ Import several classes from ``torch`` and ``torchjd``:
 
 Define the model and the optimizer, as usual:
 
-.. code-block:: python
+.. testcode::
 
    model = Sequential(Linear(10, 5), ReLU(), Linear(5, 2))
    optimizer = SGD(model.parameters(), lr=0.1)
 
 Define the aggregator that will be used to combine the Jacobian matrix:
 
-.. code-block:: python
+.. testcode::
 
    aggregator = UPGrad()
 
@@ -41,7 +41,7 @@ negatively affected by the update.
 
 Now that everything is defined, we can train the model. Define the input and the associated target:
 
-.. code-block:: python
+.. testcode::
 
    input = torch.randn(16, 10)  # Batch of 16 random input vectors of length 10
    target1 = torch.randn(16)  # First batch of 16 targets
@@ -51,7 +51,7 @@ Here, we generate fake inputs and labels for the sake of the example.
 
 We can now compute the losses associated to each element of the batch.
 
-.. code-block:: python
+.. testcode::
 
    loss_fn = MSELoss()
    output = model(input)
@@ -62,7 +62,7 @@ The last steps are similar to gradient descent-based optimization, but using the
 
 Perform the Jacobian descent backward pass:
 
-.. code-block:: python
+.. testcode::
 
    autojac.backward([loss1, loss2])
    jac_to_grad(model.parameters(), aggregator)
@@ -73,14 +73,14 @@ field of the parameters. It also deletes the ``.jac`` fields to save some memory.
 
 Update each parameter based on its ``.grad`` field, using the ``optimizer``:
 
-.. code-block:: python
+.. testcode::
 
    optimizer.step()
 
 The model's parameters have been updated!
 
 As usual, you should now reset the ``.grad`` field of each model parameter:
 
-.. code-block:: python
+.. testcode::
 
    optimizer.zero_grad()
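
The aggregation step above is what ``UPGrad`` performs on the Jacobian. As a rough, framework-free illustration of why conflicting gradients need care, here is a gradient-surgery-style projection (note: this is *not* UPGrad's actual algorithm, which solves a small quadratic program; the helper name is hypothetical):

```python
def dot(u, v):
    """Inner product of two vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def drop_conflict(g, h):
    """Return g with its component conflicting with h removed.

    If <g, h> >= 0 there is no conflict and g is returned unchanged;
    otherwise g is projected onto the hyperplane orthogonal to h.
    """
    d = dot(g, h)
    if d >= 0:
        return list(g)
    coef = d / dot(h, h)
    return [a - coef * b for a, b in zip(g, h)]

g1 = [1.0, 0.0]
g2 = [-1.0, 1.0]              # dot(g1, g2) == -1.0: the two gradients conflict
g1_fixed = drop_conflict(g1, g2)
# dot(g1_fixed, g2) == 0.0: following g1_fixed no longer hurts objective 2
```

The point of an aggregator like ``UPGrad`` is to produce one update direction from such mutually conflicting rows of the Jacobian without sacrificing any single objective.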

latest/_sources/examples/iwmtl.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ this Gramian to reweight the gradients and resolve conflict entirely.
 
 The following example shows how to do that.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 5-6, 18-20, 31-32, 34-35, 37-38, 40-41
 
    import torch

latest/_sources/examples/iwrm.rst.txt

Lines changed: 3 additions & 3 deletions

@@ -41,7 +41,7 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 .. tab-set::
    .. tab-item:: autograd (baseline)
 
-      .. code-block:: python
+      .. testcode::
 
          import torch
          from torch.nn import Linear, MSELoss, ReLU, Sequential
@@ -75,7 +75,7 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 
    .. tab-item:: autojac
 
-      .. code-block:: python
+      .. testcode::
          :emphasize-lines: 5-6, 12, 16, 21-23
 
         import torch
@@ -110,7 +110,7 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 
    .. tab-item:: autogram (recommended)
 
-      .. code-block:: python
+      .. testcode::
         :emphasize-lines: 5-6, 12, 16-17, 21-24
 
         import torch
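
IWRM treats each instance's loss as its own objective, whereas standard empirical risk minimization (ERM) averages them into a single scalar before differentiating. A toy, framework-free illustration of the distinction (the helper name is hypothetical):

```python
def per_instance_squared_errors(outputs, targets):
    """IWRM view: one squared-error loss per element of the batch."""
    return [(o - t) ** 2 for o, t in zip(outputs, targets)]

outputs = [0.5, 2.0]
targets = [1.0, 1.0]

losses = per_instance_squared_errors(outputs, targets)  # [0.25, 1.0]
erm_loss = sum(losses) / len(losses)                    # ERM collapses them to 0.625
```

In the tabs above, the autograd baseline backpropagates only the single mean loss, while the autojac and autogram variants differentiate each per-instance loss before aggregating.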

latest/_sources/examples/lightning_integration.rst.txt

Lines changed: 12 additions & 1 deletion

@@ -10,7 +10,18 @@ The following code example demonstrates a basic multi-task learning setup using
 :class:`~lightning.pytorch.core.LightningModule` that will call :doc:`mtl_backward
 <../docs/autojac/mtl_backward>` at each training iteration.
 
-.. code-block:: python
+.. testsetup::
+
+   import warnings
+   import logging
+   from lightning.fabric.utilities.warnings import PossibleUserWarning
+
+   logging.disable(logging.INFO)
+   warnings.filterwarnings("ignore", category=DeprecationWarning)
+   warnings.filterwarnings("ignore", category=FutureWarning)
+   warnings.filterwarnings("ignore", category=PossibleUserWarning)
+
+.. testcode::
    :emphasize-lines: 9-10, 18, 31-32
 
    import torch
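
The new ``testsetup`` block exists only to keep the doctest run deterministic: Lightning's logs and warnings would otherwise leak into the compared output. The same filtering pattern in plain Python (a generic sketch, independent of Lightning; the function names are hypothetical):

```python
import warnings

def run_quietly(fn):
    """Call fn with DeprecationWarning and FutureWarning suppressed,
    mirroring the filters installed by the testsetup block above."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        warnings.simplefilter("ignore", FutureWarning)
        return fn()

def noisy_step():
    warnings.warn("this API is deprecated", DeprecationWarning)
    return "ok"

result = run_quietly(noisy_step)  # no warning is emitted
```

Using ``catch_warnings`` keeps the suppression scoped; the ``testsetup`` version applies it for the whole doctest instead.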

latest/_sources/examples/monitoring.rst.txt

Lines changed: 25 additions & 1 deletion

@@ -14,7 +14,12 @@ Jacobian descent is doing something different than gradient descent. With
 :doc:`UPGrad <../docs/aggregation/upgrad>`, this happens when the original gradients conflict (i.e.
 they have a negative inner product).
 
-.. code-block:: python
+.. testsetup::
+
+   import torch
+   torch.manual_seed(0)
+
+.. testcode::
    :emphasize-lines: 9-11, 13-18, 33-34
 
    import torch
@@ -67,3 +72,22 @@ they have a negative inner product).
    jac_to_grad(shared_module.parameters(), aggregator)
    optimizer.step()
    optimizer.zero_grad()
+
+.. testoutput::
+
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.6618, 1.0554])
+   Cosine similarity: 0.9249
+   Weights: tensor([0.6569, 1.2146])
+   Cosine similarity: 0.8661
+   Weights: tensor([0.5004, 0.5060])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5000, 0.5000])
+   Cosine similarity: 1.0000
+   Weights: tensor([0.5746, 1.1607])
+   Cosine similarity: 0.9301
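
The ``Cosine similarity`` lines in the expected output compare the direction of the aggregated update with another direction: 1.0 means they agree, and lower values flag conflict-driven deviation. A dependency-free version of the metric itself (a hypothetical helper, not the code from the example):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (lists of floats)."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den

cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel: ~1.0, no conflict
cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite: ~-1.0, full conflict
```

Values strictly below 1 in the output above thus correspond to iterations where the aggregator had to deviate from a plain average.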

latest/_sources/examples/mtl.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ For the sake of the example, we generate a fake dataset consisting of 8 batches
 vectors of dimension 10, and their corresponding scalar labels for both tasks.
 
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 5-6, 19, 32-33
 
    import torch

latest/_sources/examples/partial_jd.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ perform the partial descent by considering only the parameters of the last two `
 doing this, we avoid computing the Jacobian and its Gramian with respect to the parameters of the
 first ``Linear`` layer, thereby reducing memory usage and computation time.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 16-18
 
    import torch

latest/_sources/examples/rnn.rst.txt

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@ When training recurrent neural networks for sequence modelling, we can easily ob
 element of the output sequences. If the gradients of these losses are likely to conflict, Jacobian
 descent can be leveraged to enhance optimization.
 
-.. code-block:: python
+.. testcode::
    :emphasize-lines: 5-6, 10, 17, 19-20
 
    import torch

latest/docs/autogram/engine/index.html

Lines changed: 11 additions & 11 deletions

@@ -295,7 +295,7 @@
 <h1>Engine<a class="headerlink" href="#engine" title="Link to this heading"></a></h1>
 <dl class="py class">
 <dt class="sig sig-object py" id="torchjd.autogram.Engine">
-<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">torchjd.autogram.</span></span><span class="sig-name descname"><span class="pre">Engine</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">modules</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_dim</span></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L48-L338"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine" title="Link to this definition"></a></dt>
+<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">torchjd.autogram.</span></span><span class="sig-name descname"><span class="pre">Engine</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">modules</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_dim</span></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L48-L344"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine" title="Link to this definition"></a></dt>
 <dd><p>Engine to compute the Gramian of the Jacobian of some tensor with respect to the direct
 parameters of all provided modules. It is based on Algorithm 3 of <a class="reference external" href="https://arxiv.org/pdf/2406.16232">Jacobian Descent For
 Multi-Objective Optimization</a> but goes even further:</p>
@@ -400,15 +400,15 @@ <h1>Engine<a class="headerlink" href="#engine" title="Link to this heading">¶</
 <p class="admonition-title">Warning</p>
 <p>Parent modules should call their child modules directly rather than using their child
 modules’ parameters themselves. For instance, the following model is not supported:</p>
-<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">class</span><span class="w"> </span><span class="nc">Model</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
-<span class="gp">&gt;&gt;&gt; </span>    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="bp">self</span><span class="o">.</span><span class="n">linear</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># Child module</span>
-<span class="gp">&gt;&gt;&gt;</span>
-<span class="gp">&gt;&gt;&gt; </span>    <span class="k">def</span><span class="w"> </span><span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="n">Tensor</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tensor</span><span class="p">:</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="c1"># Incorrect: Use the child module&#39;s parameters directly without calling it.</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="k">return</span> <span class="nb">input</span> <span class="o">@</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">weight</span><span class="o">.</span><span class="n">T</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">bias</span>
-<span class="gp">&gt;&gt;&gt; </span>        <span class="c1"># Correct alternative: return self.linear(input)</span>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">Model</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
+    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
+        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">linear</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># Child module</span>
+
+    <span class="k">def</span><span class="w"> </span><span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="n">Tensor</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tensor</span><span class="p">:</span>
+        <span class="c1"># Incorrect: Use the child module&#39;s parameters directly without calling it.</span>
+        <span class="k">return</span> <span class="nb">input</span> <span class="o">@</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">weight</span><span class="o">.</span><span class="n">T</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">linear</span><span class="o">.</span><span class="n">bias</span>
+        <span class="c1"># Correct alternative: return self.linear(input)</span>
 </pre></div>
 </div>
 </div>
@@ -421,7 +421,7 @@ <h1>Engine<a class="headerlink" href="#engine" title="Link to this heading">¶</
 </div>
 <dl class="py method">
 <dt class="sig sig-object py" id="torchjd.autogram.Engine.compute_gramian">
-<span class="sig-name descname"><span class="pre">compute_gramian</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output</span></span></em>, <em class="sig-param"><span class="positional-only-separator o"><abbr title="Positional-only parameter separator (PEP 570)"><span class="pre">/</span></abbr></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L238-L309"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine.compute_gramian" title="Link to this definition"></a></dt>
+<span class="sig-name descname"><span class="pre">compute_gramian</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output</span></span></em>, <em class="sig-param"><span class="positional-only-separator o"><abbr title="Positional-only parameter separator (PEP 570)"><span class="pre">/</span></abbr></span></em><span class="sig-paren">)</span><a class="reference external" href="https://github.com/SimplexLab/TorchJD/blob/main/src/torchjd/autogram/_engine.py#L244-L315"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#torchjd.autogram.Engine.compute_gramian" title="Link to this definition"></a></dt>
 <dd><p>Computes the Gramian of the Jacobian of <code class="docutils literal notranslate"><span class="pre">output</span></code> with respect to the direct parameters of
 all <code class="docutils literal notranslate"><span class="pre">modules</span></code>.</p>
 <dl class="field-list simple">
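
``compute_gramian`` returns the Gramian of the Jacobian, i.e. the matrix of pairwise inner products of the per-objective gradients; negative off-diagonal entries indicate conflicting objectives. A minimal pure-Python illustration of that object (this is only what the result represents, not the engine's implementation):

```python
def gramian(jacobian):
    """Gramian G = J @ J.T for a Jacobian given as a list of rows
    (one gradient per objective)."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in jacobian]
        for row_i in jacobian
    ]

J = [
    [1.0, 0.0, 2.0],   # gradient of objective 1
    [0.0, 1.0, -1.0],  # gradient of objective 2
]
G = gramian(J)
# The negative off-diagonal entry of G means the two gradients conflict.
```

The Gramian is all that conflict-aware aggregators need to compute their weights, which is why the engine exposes it directly.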
