Learning Universal Predictors

Fork Notice: This is a fork of google-deepmind/neural_networks_solomonoff_induction maintained at Strong-AI-Lab/neural_networks_solomonoff_induction. See Changes from Original for details.

This repository provides an implementation of the ICML 2024 paper Learning Universal Predictors.

Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks by leveraging meta-learning to its limits. We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns. We provide theoretical analysis of the UTM data generation processes and meta-training protocols. We conduct comprehensive experiments with neural architectures (e.g., LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.

It is based on JAX and Haiku and contains all code, datasets, and models necessary to reproduce the paper's results.
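For readers unfamiliar with SI: Solomonoff's universal mixture assigns to a string $x$ the probability

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$

a sum over all programs $p$ (of length $\ell(p)$ bits) whose output on a universal Turing machine $U$ begins with $x$. The paper's goal is to amortize this predictor into neural networks by meta-training on UTM-generated data.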

Content

.
├── data
|   ├── chomsky_data_generator.py - Chomsky Task Source, for a single task.
|   ├── ctw_data_generator.py     - Variable-order Markov Source.
|   ├── data_generator.py         - Main abstract class for our data generators.
|   ├── meta_data_generator.py    - Sampling from multiple generators.
|   ├── utm_data_generator.py     - BrainPhoque UTM Source, from randomly sampled programs.
|   └── utms.py                   - UTM interface and implementation of BrainPhoque.
├── models
|   ├── ctw.py                    - CTW (Willems, 1995)
|   └── transformer.py            - Decoder-only Transformer (Vaswani, 2017). [modified]
├── evaluate.py                   - Script to evaluate a trained model. [added]
├── README.md
├── requirements.txt              - Dependencies
└── train.py                      - Script to train a neural model. [modified]

Installation

Clone the source code into a local directory:

git clone https://github.com/Strong-AI-Lab/neural_networks_solomonoff_induction.git
cd neural_networks_solomonoff_induction

Installing the dependencies with pip install -r requirements.txt is best done inside a conda environment. To that end, install Anaconda, then create and activate the conda environment:

conda create --name nnsi
conda activate nnsi

Install pip and use it to install all the dependencies:

conda install pip
pip install -r requirements.txt

If you have a GPU available (highly recommended for fast training), you can install JAX with CUDA support:

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Note that the jax version must correspond to the existing CUDA installation you wish to use (CUDA 12 in the example above). Please see the JAX documentation for more details.

Chomsky Tasks

This repository relies on code from https://github.com/google-deepmind/neural_networks_chomsky_hierarchy. Clone that repository and install its dependencies:

git clone https://github.com/google-deepmind/neural_networks_chomsky_hierarchy.git
cd neural_networks_chomsky_hierarchy
pip install -r requirements.txt
cd ..

Usage

Before running any code, make sure to activate the conda environment and set the PYTHONPATH:

conda activate nnsi
export PYTHONPATH=$(pwd)/..

We provide an example training script at train.py, which can be run with

python train.py

To switch positional encoding while keeping the rest of the setup the same:

# Baseline from the original code/paper setup (absolute sinusoidal encoding)
python train.py --position_encoding_type=sinusoidal

# Relative positional version (causal T5-style relative attention bias)
python train.py --position_encoding_type=relative_bias

Useful relative-bias tuning flags (a sketch of the bucketed bias computation follows below):

--relative_attention_num_buckets=32
--relative_attention_max_distance=128
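For reference, here is a minimal, self-contained sketch of how a causal T5-style relative attention bias can be computed; function and variable names are illustrative and need not match those in models/transformer.py. The bucketing follows the standard T5 scheme, with the number of buckets and the distance cutoff controlled by the two flags above.

```python
import jax
import jax.numpy as jnp


def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
  """Maps causal relative positions (key_pos - query_pos <= 0) to bucket ids."""
  rel = -jnp.minimum(relative_position, 0)  # distances >= 0; future keys fall into bucket 0
  max_exact = num_buckets // 2
  is_small = rel < max_exact  # nearby positions get one bucket each
  # Farther positions share logarithmically spaced buckets up to max_distance.
  rel_large = max_exact + (
      jnp.log(jnp.maximum(rel, 1).astype(jnp.float32) / max_exact)
      / jnp.log(max_distance / max_exact)
      * (num_buckets - max_exact)
  ).astype(jnp.int32)
  rel_large = jnp.minimum(rel_large, num_buckets - 1)
  return jnp.where(is_small, rel, rel_large)


def t5_relative_bias(bias_table, q_len, k_len):
  """bias_table: (num_buckets, num_heads); returns a (num_heads, q_len, k_len) logit bias."""
  rel_pos = jnp.arange(k_len)[None, :] - jnp.arange(q_len)[:, None]
  buckets = relative_position_bucket(rel_pos, num_buckets=bias_table.shape[0])
  bias = bias_table[buckets]              # (q_len, k_len, num_heads)
  return jnp.transpose(bias, (2, 0, 1))   # added to the attention logits of each head


# Example with a random table standing in for the learned parameter.
table = jax.random.normal(jax.random.PRNGKey(0), (32, 8))
bias = t5_relative_bias(table, q_len=16, k_len=16)  # shape (8, 16, 16)
```

In the actual model the bias table is a learned parameter (a Haiku parameter in this codebase); future positions receive the same bucket as the current one but are removed by the causal mask.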

The exact hyperparameters used to reproduce our results can be found in Table 1 and Appendix D of the paper.

To evaluate a trained model, use evaluate.py:

python evaluate.py --params_path=params.npz

Key flags:

--data_source          # utm (default), ctw, or chomsky
--eval_seq_lengths     # comma-separated sequence lengths, e.g. 256,1024
--num_eval_sequences   # total sequences per length (default: 6000)
--position_encoding_type  # sinusoidal (default) or relative_bias
--compute_ctw_regret   # if True, also evaluate CTW and report regret
--position_metrics_csv # optional path to write per-position metrics CSV

The model architecture flags (--embedding_dim, --num_layers, --num_heads, etc.) must match those used during training.
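For intuition, the reported metrics can be summarized as follows (a rough sketch with illustrative array names; see evaluate.py for the actual computation): per-position NLL is the negative log-probability the model assigns to the observed symbol at each position, and regret is the gap to the CTW baseline's NLL on the same sequences.

```python
import jax.numpy as jnp


def per_position_nll(log_probs, targets):
  """log_probs: (num_seqs, seq_len, vocab) model log-probs; targets: (num_seqs, seq_len) ints."""
  token_log_probs = jnp.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
  return -jnp.mean(token_log_probs, axis=0)  # (seq_len,) average NLL at each position


# Regret relative to a CTW baseline evaluated on the same sequences:
# regret = per_position_nll(model_log_probs, targets) - per_position_nll(ctw_log_probs, targets)
```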

Changes from Original

The following changes have been made relative to google-deepmind/neural_networks_solomonoff_induction:

  • models/transformer.py: Added support for causal T5-style relative attention bias as an alternative to the original sinusoidal positional encoding, selectable via --position_encoding_type=relative_bias.
  • train.py: Refactored with additional flags to support the new positional encoding options.
  • evaluate.py (new): Script to evaluate a trained model on UTM, CTW, or Chomsky sequences. Reports per-position NLL and optionally computes regret against a CTW baseline.
  • data/solomonoff.py (new): SolomonoffInductor class that estimates the Solomonoff mixture distribution over next symbols using the BrainPhoque UTM (a rough sketch of the idea follows below).
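As a rough illustration of that idea (not the actual SolomonoffInductor implementation): sample random programs, run each on the UTM with a step limit, keep those whose output extends the observed prefix, and weight each surviving program by 2^-|p|. The run_utm helper below is hypothetical; the real class works directly with the BrainPhoque UTM in data/utms.py.

```python
import collections
import random


def estimate_next_symbol_mixture(prefix, run_utm, alphabet=(0, 1),
                                 num_programs=10_000, max_program_len=32):
  """Monte-Carlo sketch of the Solomonoff mixture over the next symbol.

  run_utm(program) is a hypothetical stand-in that executes a binary program
  on the UTM with a step limit and returns the symbols it output.
  """
  weights = collections.defaultdict(float)
  for _ in range(num_programs):
    length = random.randint(1, max_program_len)
    program = tuple(random.getrandbits(1) for _ in range(length))
    output = run_utm(program)
    # Keep programs whose output extends the observed prefix by at least one symbol.
    if len(output) > len(prefix) and tuple(output[: len(prefix)]) == tuple(prefix):
      weights[output[len(prefix)]] += 2.0 ** (-length)  # prior weight 2^{-|p|}
  total = sum(weights.values()) or 1.0
  return {s: weights[s] / total for s in alphabet}
```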

Citing This Work

@inproceedings{grau2024learning,
  author       = {Jordi Grau{-}Moya and
                  Tim Genewein and
                  Marcus Hutter and
                  Laurent Orseau and
                  Gr{\'{e}}goire Del{\'{e}}tang and
                  Elliot Catt and
                  Anian Ruoss and
                  Li Kevin Wenliang and
                  Christopher Mattern and
                  Matthew Aitchison and
                  Joel Veness},
  title        = {Learning Universal Predictors},
  booktitle    = {International Conference on Machine Learning},
  year         = {2024},
}

License and Disclaimer

Copyright 2023 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.
