Granite Switch Tutorials

Granite Switch consolidates multiple LoRA adapters into a single, unified checkpoint, giving one model a modular set of capabilities. The tutorials below cover both the underlying mechanics and practical usage: adapter invocation, multi-step pipelines with guardrails, and checkpoint composition.

Notebooks

Step-by-step walkthroughs covering adapter invocation, pipeline construction, and model composition.

Notebook	Topics	Duration
00_hello_adapter.ipynb	Minimal adapter invocation with HuggingFace	5 min
01_hello_mellea.ipynb	Mellea intrinsics intro with vLLM	5 min
02_granite_switch_with_hf.ipynb	Compose + HuggingFace backend, `adapter_name=` invocation, Core + Guardian adapters in a multi-turn conversation	10 min
03_01_govt_rag_pipeline_simple.ipynb	Simple RAG pipeline without guardians (rewrite, answerability, citations)	30 min
03_02_govt_rag_pipeline_sequential.ipynb	Full RAG pipeline with guardian checks (harm + scope)	30 min
03_03_govt_rag_pipeline_loops.ipynb	Complex RAG pipeline with retry loops for scope and answerability	30 min
04_compose_granite_switch.ipynb	Compose a checkpoint from adapter libraries	15 min
05_alora_vs_lora_race.ipynb	aLoRA vs LoRA race: side-by-side throughput comparison on a multi-step RAG pipeline	20 min

Guides

Guide	Description
Using Mellea with Granite Switch	Connect Mellea to a Granite Switch model
Bring Your Own Adapter	Train, compose, and use custom adapters
Compare Inference Throughput	Compare LoRA-based and aLoRA-based models in an inference race setup

Learning Paths

Path 1: Low-Level Understanding (HuggingFace)

Best for: Understanding how Granite Switch works at the control-token level

The HuggingFace inference examples show how adapters are activated via control tokens, which gives insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Path 2).

Prerequisites
Hello Adapter — see control tokens in action
Granite Switch with HuggingFace — detailed walkthrough
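
The idea that a control token switches an adapter on can be pictured with a toy model. The sketch below is not Granite Switch's implementation: the token spellings and the `active_adapter_per_token` helper are hypothetical, and it assumes that an adapter applies from its control token onward, which is how activated LoRA (aLoRA) adapters are generally described. The real control tokens are shown in the Hello Adapter notebook.

```python
from typing import List, Optional

# Hypothetical control-token spellings, for illustration only. The real tokens
# are defined by the composed checkpoint and its tokenizer.
CONTROL_TOKENS = {
    "<|adapter:answerability|>": "answerability",
    "<|adapter:harm|>": "harm",
}


def active_adapter_per_token(tokens: List[str]) -> List[Optional[str]]:
    """Return, for each position, the adapter assumed to be active there.

    Positions before any control token use the base model (None). A control
    token switches its adapter on for itself and for every later position.
    """
    active: Optional[str] = None
    assignment: List[Optional[str]] = []
    for token in tokens:
        if token in CONTROL_TOKENS:
            active = CONTROL_TOKENS[token]
        assignment.append(active)
    return assignment


tokens = ["Is", "this", "answerable", "?", "<|adapter:answerability|>", "yes"]
assignment = active_adapter_per_token(tokens)
print(assignment)
```

In this toy model the prompt tokens are processed by the base model and only the positions from the control token onward are attributed to the adapter.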

Path 2: Inference with Mellea (Recommended)

Best for: All inference use cases — development through production

Mellea is the recommended way to invoke Granite Switch capabilities. It handles constrained decoding, prompt rewriting, and input/output processing automatically. Mellea currently supports vLLM; HuggingFace support is coming soon.

Prerequisites
Hello Mellea
RAG Pipeline — full RAG with ChromaDB

Composing Models

Before running inference, you need a composed Granite Switch model. Options:

Use pre-composed models from HuggingFace (recommended for getting started)
Compose your own — see Compose Your Checkpoint

Path 3: Bring Your Own Adapter

Best for: Custom adapter development

Bring Your Own Adapter Guide

Path 4: Real-World Pipelines (Usability)

Best for: Seeing how adapters compose into multi-step applications

Simple RAG Pipeline — rewrite, answerability, citations
Sequential RAG with Guardians — harm + scope checks
RAG with Retry Loops — scope and answerability retries
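
The control flow these pipelines share can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebooks' code: every callable passed in (`rewrite`, `retrieve`, `is_harmful`, `is_answerable`, `generate`) is a hypothetical stand-in for an adapter call or retriever, and the step order and retry policy are assumptions. The scope check is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    answer: Optional[str]
    attempts: int
    reason: str


def run_rag_pipeline(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    is_harmful: Callable[[str], bool],
    is_answerable: Callable[[str, List[str]], bool],
    generate: Callable[[str, List[str]], str],
    max_attempts: int = 3,
) -> PipelineResult:
    """Guardian-style check first, then rewrite/retrieve/answerability with retries."""
    if is_harmful(question):
        return PipelineResult(None, 0, "blocked: harm check")
    query = question
    for attempt in range(1, max_attempts + 1):
        query = rewrite(query)
        docs = retrieve(query)
        if is_answerable(query, docs):
            return PipelineResult(generate(query, docs), attempt, "ok")
    return PipelineResult(None, max_attempts, "unanswerable after retries")


# Stub components so the sketch runs without a model or a vector store.
CORPUS = {"passport": "Passports can be renewed by mail."}
stubs = dict(
    rewrite=lambda q: q.lower().rstrip("?"),
    retrieve=lambda q: [doc for key, doc in CORPUS.items() if key in q],
    is_harmful=lambda q: "weapon" in q.lower(),
    is_answerable=lambda q, docs: len(docs) > 0,
    generate=lambda q, docs: f"{docs[0]} [1]",
)

answered = run_rag_pipeline("How do I renew my passport?", **stubs)
unanswered = run_rag_pipeline("What is the tax rate on Mars?", **stubs)
print(answered)
print(unanswered)
```

Passing the steps in as callables keeps the loop independent of the backend, so each stub could be swapped for a real adapter invocation without changing the control flow.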

Reference Scripts

Runnable scripts in scripts/ for common tasks:

Script	Description
run_adapter_generation_direct.py	Direct adapter invocation via control tokens
run_adapter_generation_mellea.py	Adapter invocation through Mellea

Adapter Libraries

Granite Switch checkpoints embed adapters drawn from IBM's granitelib libraries. The three libraries below are featured throughout these tutorials:

Library	Purpose	Where used in tutorials	HuggingFace repo
Core	Foundational post-generation intrinsics: certainty scoring, requirement checking, and response attribution.	02, 04	ibm-granite/granitelib-core-r1.0
RAG	Retrieval-augmented generation intrinsics: query rewrite, answerability, hallucination detection, and citation generation.	01, 03_01, 03_02, 04	ibm-granite/granitelib-rag-r1.0
Guardian	Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks.	00, 01, 02, 03_02, 03_03, 04	ibm-granite/granitelib-guardian-r1.0
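
To inspect these libraries locally, the repositories listed above can be fetched with `huggingface_hub`. The sketch below assumes the repos are publicly downloadable and that `huggingface_hub` is installed; the `download_library` helper and the `adapters/` target directory are hypothetical names chosen for illustration. How the compose step consumes the downloaded files is covered in the compose notebook and is not shown here.

```python
from pathlib import Path

# Repo IDs copied from the table above.
ADAPTER_LIBRARIES = {
    "core": "ibm-granite/granitelib-core-r1.0",
    "rag": "ibm-granite/granitelib-rag-r1.0",
    "guardian": "ibm-granite/granitelib-guardian-r1.0",
}


def download_library(name: str, target_dir: str = "adapters") -> Path:
    """Download one adapter library snapshot and return its local path."""
    # Imported lazily so the mapping above is usable without the package.
    from huggingface_hub import snapshot_download

    repo_id = ADAPTER_LIBRARIES[name]
    local_dir = Path(target_dir) / name
    return Path(snapshot_download(repo_id=repo_id, local_dir=str(local_dir)))


if __name__ == "__main__":
    for library in ADAPTER_LIBRARIES:
        print(library, "->", download_library(library))
```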

External Resources

Resource	Description
Mellea	IBM's library for writing Generative Programs
Granite aLoRA Adapters	Official adapter libraries on HuggingFace
vLLM Documentation	High-performance inference
Granite Models	Base Granite models

Reference Documentation

For technical details, see docs/:

Supported Models — Model compatibility
Git Workflow — Contribution guidelines
