Introduction
This issue opens a discussion of whether WebNN should be extensible to support incremental batch learning and/or fine-tuning.
A Motivating Use Case
A motivating use case for this proposal is adaptive instructional systems. Could end-users, one day, complete adaptive, personalized sequences of homework exercises using client-side user models that are stored locally and updated locally with their performance data?
Incremental Batch Learning
The motivating use case invites exploration into incremental batch learning (Clearwater, Cheng, Hirsh, and Buchanan, 1989). End-users' user models could be updated daily, between homework-exercise sessions, instead of upon the completion of each homework exercise. End-users' homework performance data could be enqueued for their client-side user models to be trained on at convenient times, e.g., while end-users slept.
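To make that enqueue-then-train-later flow concrete, here is a minimal straw-man sketch in TypeScript. Nothing in it exists in WebNN today: the `MLTrainableGraph` interface and its `trainBatch()` method are hypothetical names invented purely for illustration.

```typescript
// Straw-man sketch only: WebNN has no training API today. MLTrainableGraph and
// trainBatch() are hypothetical names used to illustrate how enqueued
// homework-performance data might be consumed at a later, convenient time.

interface TrainingExample {
  inputs: Record<string, Float32Array>;   // e.g., features of a homework exercise
  targets: Record<string, Float32Array>;  // e.g., observed performance
}

interface MLTrainableGraph {
  // Hypothetical: update weights from one batch of examples.
  trainBatch(batch: TrainingExample[]): Promise<void>;
}

// Examples accumulate during homework sessions...
const pendingExamples: TrainingExample[] = [];

function enqueueExample(example: TrainingExample): void {
  pendingExamples.push(example);
}

// ...and are consumed later, e.g., overnight, in fixed-size batches.
async function runIncrementalBatchUpdate(
  graph: MLTrainableGraph,
  batchSize = 32
): Promise<void> {
  while (pendingExamples.length > 0) {
    const batch = pendingExamples.splice(0, batchSize);
    await graph.trainBatch(batch);
  }
}
```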
If graphs could be saved and loaded (see: #807), saved graphs would need additional possible states. For example, a graph could be placed in a locked state during training; a saved graph that was locked could not be loaded while it was being trained.
Additionally, some client-side, locally-stored graphs could be copied (e.g., from disk storage into RAM) before training, thus ensuring that a graph remained available while a copy of it was trained.
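Sketching that graph lifecycle, again purely as a straw man: the `SavedGraphStore` class, its state values, and its methods below are hypothetical names, not proposed API surface. One path locks a stored graph while it trains in place; the other trains a copy so the stored graph stays loadable throughout.

```typescript
// Straw-man sketch only: neither saved-graph handles nor training exist in
// WebNN today. This illustrates one possible lifecycle for saved graphs.

type SavedGraphState = "available" | "locked-for-training";

interface SavedGraphRecord {
  id: string;
  state: SavedGraphState;
  weightsSnapshot: ArrayBuffer;  // serialized weights (hypothetical format)
}

class SavedGraphStore {
  private records = new Map<string, SavedGraphRecord>();

  // As above: a graph that is locked for training cannot be loaded.
  load(id: string): ArrayBuffer {
    const record = this.get(id);
    if (record.state === "locked-for-training") {
      throw new Error(`Graph ${id} is locked while it is being trained`);
    }
    return record.weightsSnapshot;
  }

  // Option 1: lock the stored graph and train it in place.
  async trainInPlace(
    id: string,
    train: (weights: ArrayBuffer) => Promise<ArrayBuffer>
  ): Promise<void> {
    const record = this.get(id);
    record.state = "locked-for-training";
    try {
      record.weightsSnapshot = await train(record.weightsSnapshot);
    } finally {
      record.state = "available";
    }
  }

  // Option 2: train a copy of the weights so the stored graph remains
  // loadable; swap the snapshot in only once training succeeds.
  async trainOnCopy(
    id: string,
    train: (weightsCopy: ArrayBuffer) => Promise<ArrayBuffer>
  ): Promise<void> {
    const record = this.get(id);
    record.weightsSnapshot = await train(record.weightsSnapshot.slice(0));
  }

  private get(id: string): SavedGraphRecord {
    const record = this.records.get(id);
    if (!record) throw new Error(`No saved graph with id ${id}`);
    return record;
  }
}
```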
Brainstorming further: because end-users might have multiple, even many, stored graphs with training data enqueued for incremental batch learning, the timing of graphs' training could call for more intricate automated scheduling algorithms. As envisioned, scheduling and training would be coordinated by a background process, service, or daemon.
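One possible shape of such a scheduler, as a hedged sketch: the `TrainingJob` record and the `pickNextJob()` priority rule are placeholders invented here. A real background service would presumably also weigh device idle time, battery, and storage pressure, and might be triggered by something like the Periodic Background Sync API where available.

```typescript
// Straw-man sketch only: a background service (outside any one page) that
// decides which of the user's stored graphs to train next. The priority rule
// is a placeholder policy, not a proposal.

interface TrainingJob {
  graphId: string;
  pendingExamples: number;
  lastTrainedAt: number;  // epoch milliseconds
}

function pickNextJob(jobs: TrainingJob[]): TrainingJob | null {
  const eligible = jobs.filter((job) => job.pendingExamples > 0);
  if (eligible.length === 0) return null;
  // Simple priority: most data waiting, then longest time since last training.
  eligible.sort(
    (a, b) =>
      b.pendingExamples - a.pendingExamples ||
      a.lastTrainedAt - b.lastTrainedAt
  );
  return eligible[0];
}
```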
Catastrophic Interference and Forgetting
Catastrophic interference, also known as catastrophic forgetting, is "the tendency of an artificial neural network to abruptly and drastically forget previously learned information upon learning new information".
Here are six approaches for addressing the challenge of catastrophic interference (van de Ven, Soures, and Kudithipudi, 2024) (see also: https://en.wikipedia.org/wiki/Catastrophic_interference#Proposed_solutions):
- Replay
Perhaps the most widely used approach for continual learning is replay. The idea behind replay, which is also referred to as rehearsal, is to approximate interleaved learning by complementing the training data of the current task or experience with data that are representative of previous ones. Replay has close links to neuroscience. In the brain, the re-occurrence of neuronal activity patterns that represent previous experiences is believed to be important for the stabilization and consolidation of new memories. (A minimal replay sketch appears after this list.)
- Parameter Regularization
Another popular approach for continual learning is parameter regularization. When a new task is learned, parameter regularization discourages large changes to parameters of the network that are thought to be important for previous tasks. From a neuroscience perspective, this approach can be linked to metaplasticity, as it can be interpreted as equipping the network parameters with an internal state that modulates their level of plasticity. Another motivation for parameter regularization comes from a Bayesian perspective, as instances of this approach can often be expressed or interpreted as performing sequential approximate Bayesian inference on the parameters of a neural network.
- Functional Regularization
An inherent difficulty with parameter regularization is that correctly estimating the importance of parameters for past tasks is very hard, which is due to the complex relation between the behaviour of a deep neural network and its parameters. Instead of operating in the parameter space, a more effective approach might be applying regularization in the function space of a neural network. The goal of such functional regularization is to prevent large changes to a network's input-output mapping $f_{\theta}$ at a set of specific inputs, which are termed 'anchor points'.
- Optimization-based Approaches
The three approaches discussed so far – replay, parameter regularization and functional regularization – operate by making changes to the loss function that is optimized. An alternative approach to continual learning is to change how the loss function is optimized. The standard optimization routines that are used in deep learning, such as stochastic gradient descent (SGD) and its variants, have been developed for stationary settings. In non-stationary settings, there are typically no guarantees for their behavior, yet these standard optimization routines are the default choice in most work on continual learning. However, in the last few years there has been increasing attention in the continual learning literature to the role of optimization, and there have been several attempts to develop novel optimization routines specific to continual learning. It has even been argued that a holistic solution for continual learning must consist of both changes to the loss function and changes to how that loss function is optimized.
- Context-dependent Processing
Another popular approach for continual learning is context-dependent processing. The idea behind this approach is to use certain parts of the network only for specific tasks or contexts, in order to reduce the interference that can occur between them. It is worth noting that when taken to the extreme, this approach corresponds to having a completely separate network per task or context. In this case, there would be no interference or forgetting at all, but there would also no longer be any possibility of positive transfer between tasks or contexts. It could therefore be argued that continual learning methods should aim to segregate only information that is unrelated (as there is likely no positive transfer to be gained between unrelated tasks or contexts anyway), while storing related information in the same part of the network.
- Template-based Classification
An approach to class-incremental learning that is often used in continual learning is template-based classification. With this approach, a 'class template' is learned for every class, and classification is performed based on which class template is closest or most suitable for the sample to be classified. In this description, a class template can be thought of as a representation or a model of that particular class. In the context of class-incremental learning, an important advantage of template-based classification is that it avoids the need to make comparisons between classes during training. Standard softmax-based classifiers have to learn decision boundaries between all classes during their training, but this is challenging with class-incremental learning because not all classes are observed together. Template-based classifiers instead only have to learn a template per class during their training, and the comparison between classes is deferred to test time. Importantly, while the original problem is a class-incremental learning problem, learning these class templates is a task-incremental learning problem, whereby each ‘task’ is to learn a template for a specific class. This means that with this approach it is possible to use ‘template-specific components’, or other context-dependent processing approaches.
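To make the replay approach above concrete (as noted in that item), here is a generic continual-learning sketch in TypeScript. It is not a WebNN API and not code from van de Ven et al.; the `ReplayBuffer` class, `buildReplayBatch()`, and `replayRatio` are names invented here. A small buffer keeps a bounded, reservoir-style sample of past examples, and each incremental batch interleaves new examples with replayed ones.

```typescript
// Generic replay sketch (assumed, not a WebNN API): interleave new examples
// with a random sample of stored past examples before each training step.

interface Example {
  inputs: Float32Array;
  targets: Float32Array;
}

class ReplayBuffer {
  private buffer: Example[] = [];
  constructor(private capacity: number) {}

  // Reservoir-style insertion keeps a bounded, roughly uniform sample of the
  // past; `seen` is the total number of examples observed so far, including
  // this one.
  add(example: Example, seen: number): void {
    if (this.buffer.length < this.capacity) {
      this.buffer.push(example);
    } else {
      const slot = Math.floor(Math.random() * seen);
      if (slot < this.capacity) this.buffer[slot] = example;
    }
  }

  // Random sample (with replacement) of stored examples.
  sample(count: number): Example[] {
    const picks: Example[] = [];
    for (let i = 0; i < Math.min(count, this.buffer.length); i++) {
      picks.push(this.buffer[Math.floor(Math.random() * this.buffer.length)]);
    }
    return picks;
  }
}

// Approximate interleaved learning: mix replayed examples into the new batch.
function buildReplayBatch(
  newExamples: Example[],
  buffer: ReplayBuffer,
  replayRatio = 1.0
): Example[] {
  const replayed = buffer.sample(Math.round(newExamples.length * replayRatio));
  return [...newExamples, ...replayed];
}
```

The `replayRatio` parameter controls how many stored examples are mixed in per new example; values near 1.0 more closely approximate interleaved learning, at the cost of extra client-side storage for the buffer.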
Fine-tuning
Exploration into memory and continual learning for language models via fine-tuning (Lin, Zettlemoyer, Ghosh, Yih, Markosyan, Berges, and Oğuz, 2025) is nascent.
From https://jessylin.com/2025/10/20/continual-learning/:
Continual learning and memory is a huge, rich space to explore, and at this point in 2025 we're only just starting to reach the point where we can imagine what it might look like to have models that learn online in the real world — the research is only just beginning.
Conclusion
I'm excited about WebNN potentially supporting incremental batch learning and fine-tuning, possibilities that would build on the potential capability to save and load graphs (#807). As considered above, a new background process, service, or daemon could enable incremental batch learning and/or fine-tuning by scheduling the training of models on enqueued batches of data, including after end-users had closed or exited their browsers.
Thank you.
Bibliography
Clearwater, Scott H., Tze-Pin Cheng, Haym Hirsh, and Bruce G. Buchanan. "Incremental batch learning." In Proceedings of the Sixth International Workshop on Machine Learning, pp. 366-370. Morgan Kaufmann, 1989.
Lin, Jessy, Luke Zettlemoyer, Gargi Ghosh, Wen-Tau Yih, Aram Markosyan, Vincent-Pierre Berges, and Barlas Oğuz. "Continual learning via sparse memory finetuning." arXiv preprint arXiv:2510.15103 (2025).
van de Ven, Gido M., Nicholas Soures, and Dhireesha Kudithipudi. "Continual learning and catastrophic forgetting." arXiv preprint arXiv:2403.05175 (2024).