This folder is a standalone EmbBERT bundle centered on the verified
checkpoint-616000 pretraining artifact.
It includes:
- checkpoints/pretraining/checkpoint-616000/
- tokenizers/bpe_book_corpus_8192.json
- configs/EmbBERT_config.json
- manifest.json and loaders.py
- lib/
- runnable scripts such as pretrain.py, finetune.py, quantize.py, evaluate_embbert_bundle.py, and embbert_semantic_search_test.py
- pyproject.toml and uv.lock
- datasets/embbert_semantic_search_benchmark.json
Visualization utilities are intentionally excluded from this bundle. In
particular, plotting helpers such as graphing.py and loss_plotter.py are
not included.
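Before loading anything, it can help to confirm that the bundle was copied intact. The following is a minimal sketch, not part of the bundle: the helper name missing_bundle_paths and the EXPECTED_PATHS list are hypothetical, and the sketch assumes every path named in this README is relative to the bundle root. The runnable scripts are left out of the check because this README does not state where they live.

```python
from pathlib import Path

# Paths copied from this README's listing; assumed relative to the bundle root.
# A trailing "/" marks an entry that should be a directory.
EXPECTED_PATHS = [
    "checkpoints/pretraining/checkpoint-616000/",
    "tokenizers/bpe_book_corpus_8192.json",
    "configs/EmbBERT_config.json",
    "manifest.json",
    "loaders.py",
    "lib/",
    "pyproject.toml",
    "uv.lock",
    "datasets/embbert_semantic_search_benchmark.json",
]


def missing_bundle_paths(root="."):
    """Return the expected paths that are absent under root (hypothetical helper)."""
    root = Path(root)
    missing = []
    for rel in EXPECTED_PATHS:
        target = root / rel
        # Directories and files are checked separately so a file cannot
        # stand in for a directory of the same name, or vice versa.
        present = target.is_dir() if rel.endswith("/") else target.is_file()
        if not present:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_bundle_paths()
    if missing:
        print("Missing from bundle:", ", ".join(missing))
    else:
        print("Bundle layout looks complete.")
```

Saved under any file name and run from the bundle root, this only checks that the paths exist; it does not validate the checkpoint contents or the manifest.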
Run Python from inside this directory so that loaders.py is importable, then load the checkpoint:
from loaders import load_latest_pretraining_checkpoint
model, tokenizer = load_latest_pretraining_checkpoint()