Granite Switch supports a modular architecture by consolidating multiple LoRA adapters into a single, unified checkpoint. The tutorials below show how it works and how to use it, covering adapter invocation, multi-step pipelines with guardrails, and checkpoint composition.
Step-by-step walkthroughs covering adapter invocation, pipeline construction, and model composition.
| Notebook | Topics | Duration | Colab |
|---|---|---|---|
| 00_hello_adapter.ipynb | Minimal adapter invocation with HuggingFace | 5 min | |
| 01_hello_mellea.ipynb | Mellea intrinsics intro with vLLM | 5 min | |
| 02_granite_switch_with_hf.ipynb | Compose + HuggingFace backend, adapter_name= invocation, Core + Guardian adapters in a multi-turn conversation | 10 min | |
| 03_01_govt_rag_pipeline_simple.ipynb | Simple RAG pipeline without guardians (rewrite, answerability, citations) | 30 min | |
| 03_02_govt_rag_pipeline_sequential.ipynb | Full RAG pipeline with guardian checks (harm + scope) | 30 min | |
| 03_03_govt_rag_pipeline_loops.ipynb | Complex RAG pipeline with retry loops for scope and answerability | 30 min | |
| 04_compose_granite_switch.ipynb | Compose a checkpoint from adapter libraries | 15 min | |
| 05_alora_vs_lora_race.ipynb | aLoRA vs LoRA race: side-by-side throughput comparison on a multi-step RAG pipeline | 20 min | |
| Guide | Description |
|---|---|
| Using Mellea with Granite Switch | Connect Mellea to a Granite Switch model |
| Bring Your Own Adapter | Train, compose, and use custom adapters |
| Compare Inference Throughput | Compare LoRA-based and aLoRA-based models in an inference race |
Best for: Understanding how Granite Switch works at the control-token level
HuggingFace inference examples demonstrate how adapters are activated via control tokens, providing insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Part 2).
- Prerequisites
- Hello Adapter — see control tokens in action
- Granite Switch with HuggingFace — detailed walkthrough
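
As a rough mental model of control-token activation, the toy sketch below appends a marker token to a prompt to select an adapter. It does not use the real Granite Switch vocabulary: the token strings, the adapter names, the token placement, and the `build_prompt` helper are all hypothetical and chosen only for illustration. The Hello Adapter notebook shows the actual tokens.

```python
from typing import Optional

# HYPOTHETICAL token strings, for illustration only. These are not the
# real Granite Switch control tokens.
HYPOTHETICAL_CONTROL_TOKENS = {
    "answerability": "<|adapter:answerability|>",
    "harm": "<|adapter:harm|>",
}


def build_prompt(user_text: str, adapter_name: Optional[str] = None) -> str:
    """Return the prompt, with a control token appended when an adapter is requested.

    Appending (rather than prepending) the token is an assumption of this sketch.
    """
    if adapter_name is None:
        # No adapter requested: the prompt goes to the base model unchanged.
        return user_text
    if adapter_name not in HYPOTHETICAL_CONTROL_TOKENS:
        raise ValueError(f"unknown adapter: {adapter_name!r}")
    return user_text + HYPOTHETICAL_CONTROL_TOKENS[adapter_name]
```

The point of the sketch is that adapter selection travels with the prompt text itself, so a single checkpoint can serve requests for different adapters without reloading weights.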
Best for: All inference use cases — development through production
Mellea is the recommended way to invoke Granite Switch capabilities. It handles constrained decoding, prompt rewriting, and input/output processing automatically. It currently supports vLLM; HuggingFace support is coming soon.
- Prerequisites
- Hello Mellea
- RAG Pipeline — full RAG with ChromaDB
Before running inference, you need a composed Granite Switch model. Options:
- Use pre-composed models from HuggingFace (recommended for getting started)
- Compose your own — see Compose Your Checkpoint
Best for: Custom adapter development
Best for: Seeing how adapters compose into multi-step applications
- Simple RAG Pipeline — rewrite, answerability, citations
- Sequential RAG with Guardians — harm + scope checks
- RAG with Retry Loops — scope and answerability retries
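
The control flow behind these three pipelines can be sketched without any model at all. The example below is a minimal outline, assuming each step (rewrite, harm check, scope check, retrieval, answerability, generation) is supplied as a plain Python callable. The function name `run_rag_with_retries`, the `PipelineResult` type, and the retry policy are hypothetical and are not taken from the notebooks, which implement these steps with adapters.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    answer: Optional[str]  # None when the pipeline refuses or gives up
    attempts: int          # number of rewrite attempts consumed
    reason: str            # "ok", "blocked: harm", or "retries exhausted"


def run_rag_with_retries(
    question: str,
    *,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    is_harmful: Callable[[str], bool],
    in_scope: Callable[[str], bool],
    answerable: Callable[[str, List[str]], bool],
    generate: Callable[[str, List[str]], str],
    max_attempts: int = 3,
) -> PipelineResult:
    # Guardian-style harm check runs once, before any other work.
    if is_harmful(question):
        return PipelineResult(None, 0, "blocked: harm")

    query = question
    for attempt in range(1, max_attempts + 1):
        query = rewrite(query)
        if not in_scope(query):
            continue  # scope check failed: retry with a further rewrite
        docs = retrieve(query)
        if not answerable(query, docs):
            continue  # retrieved documents cannot answer: retry
        return PipelineResult(generate(query, docs), attempt, "ok")

    return PipelineResult(None, max_attempts, "retries exhausted")
```

Dropping the `is_harmful` and `in_scope` checks gives the shape of the simple pipeline; setting `max_attempts=1` gives the sequential variant with no retries.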
Runnable scripts in scripts/ for common tasks:
| Script | Description |
|---|---|
| run_adapter_generation_direct.py | Direct adapter invocation via control tokens |
| run_adapter_generation_mellea.py | Adapter invocation through Mellea |
Granite Switch checkpoints embed adapters drawn from IBM's granitelib libraries. The three libraries below are featured throughout these tutorials:
| Library | Purpose | Where used in tutorials | HF repo |
|---|---|---|---|
| Core | Foundational post-generation intrinsics: certainty scoring, requirement checking, and response attribution. | 02, 04 | ibm-granite/granitelib-core-r1.0 |
| RAG | Retrieval-augmented generation intrinsics: query rewrite, answerability, hallucination detection, and citation generation. | 01, 03_01, 03_02, 04 | ibm-granite/granitelib-rag-r1.0 |
| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | 00, 01, 02, 03_02, 03_03, 04 | ibm-granite/granitelib-guardian-r1.0 |
| Resource | Description |
|---|---|
| Mellea | IBM's library for writing Generative Programs |
| Granite aLoRA Adapters | Official adapter libraries on HuggingFace |
| vLLM Documentation | High-performance inference |
| Granite Models | Base Granite models |
For technical details, see docs/:
- Supported Models — Model compatibility
- Git Workflow — Contribution guidelines