Granite Switch Tutorials

Granite Switch facilitates a modular architecture by consolidating multiple LoRA adapters into a single, unified checkpoint. The following tutorials explore the underlying mechanics and usability, detailing adapter invocation, multi-step pipelines with guardrails, and checkpoint composition.

Notebooks

Step-by-step walkthroughs covering adapter invocation, pipeline construction, and model composition.

| Notebook | Topics | Duration |
| --- | --- | --- |
| 00_hello_adapter.ipynb | Minimal adapter invocation with HuggingFace | 5 min |
| 01_hello_mellea.ipynb | Mellea intrinsics intro with vLLM | 5 min |
| 02_granite_switch_with_hf.ipynb | Compose + HuggingFace backend, adapter_name= invocation, Core + Guardian adapters in a multi-turn conversation | 10 min |
| 03_01_govt_rag_pipeline_simple.ipynb | Simple RAG pipeline without guardians (rewrite, answerability, citations) | 30 min |
| 03_02_govt_rag_pipeline_sequential.ipynb | Full RAG pipeline with guardian checks (harm + scope) | 30 min |
| 03_03_govt_rag_pipeline_loops.ipynb | Complex RAG pipeline with retry loops for scope and answerability | 30 min |
| 04_compose_granite_switch.ipynb | Compose a checkpoint from adapter libraries | 15 min |
| 05_alora_vs_lora_race.ipynb | aLoRA vs LoRA race: side-by-side throughput comparison on a multi-step RAG pipeline | 20 min |

Guides

| Guide | Description |
| --- | --- |
| Using Mellea with Granite Switch | Connect Mellea to a Granite Switch model |
| Bring Your Own Adapter | Train, compose, and use custom adapters |
| Compare Inference Throughput | Compare LoRA-based and aLoRA-based models in an inference race |

Learning Paths

Path 1: Low-Level Understanding (HuggingFace)

Best for: Understanding how Granite Switch works at the control-token level

The HuggingFace inference examples show how adapters are activated via control tokens, which exposes the underlying mechanics. For most applications, we recommend running inference with Mellea (Path 2).

  1. Prerequisites
  2. Hello Adapter: see control tokens in action
  3. Granite Switch with HuggingFace: detailed walkthrough
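
To make the control-token idea concrete, here is a minimal sketch of what "activating an adapter" can look like at the prompt level. The token format `<|adapter:NAME|>`, its placement at the end of the prompt, and the adapter name `answerability` are all assumptions made for illustration. The actual token strings and their placement are defined by the composed checkpoint and are shown in the Hello Adapter notebook.

```python
from typing import Optional

# Hypothetical control-token format; the real token strings are defined by the
# composed checkpoint's tokenizer and may differ.
CONTROL_TOKEN_TEMPLATE = "<|adapter:{name}|>"


def build_adapter_prompt(prompt: str, adapter_name: Optional[str] = None) -> str:
    """Return the prompt, followed by a control token when an adapter is requested.

    Appending the token (rather than prepending it) is an assumption of this sketch.
    """
    if adapter_name is None:
        return prompt  # no control token: plain base-model prompt
    return prompt + CONTROL_TOKEN_TEMPLATE.format(name=adapter_name)


if __name__ == "__main__":
    # Same input text, with and without a (hypothetical) adapter selected.
    print(build_adapter_prompt("Can this question be answered from the documents?", "answerability"))
    print(build_adapter_prompt("Tell me about Granite."))
```

The point of the sketch is only that adapter selection travels with the input text, so one checkpoint can serve several adapters without reloading weights.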

Path 2: Inference with Mellea (Recommended)

Best for: All inference use cases — development through production

Mellea is the recommended way to invoke Granite Switch capabilities. It handles constrained decoding, prompt rewriting, and input/output processing automatically. Mellea currently supports vLLM; HuggingFace support is coming soon.

  1. Prerequisites
  2. Hello Mellea
  3. RAG Pipeline: full RAG with ChromaDB

Composing Models

Before running inference, you need a composed Granite Switch model. Options:

  1. Use pre-composed models from HuggingFace (recommended for getting started)
  2. Compose your own: see Compose Your Checkpoint

Path 3: Bring Your Own Adapter

Best for: Custom adapter development

  1. Bring Your Own Adapter Guide

Path 4: Real-World Pipelines (Usability)

Best for: Seeing how adapters compose into multi-step applications

  1. Simple RAG Pipeline: rewrite, answerability, citations
  2. Sequential RAG with Guardians: harm + scope checks
  3. RAG with Retry Loops: scope and answerability retries
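
The retry-loop pattern in the third notebook can be sketched in plain Python as follows. This is not the notebook's code: the function `answer_with_retries` and its callable parameters (`rewrite`, `in_scope`, `answerable`, `generate`) are hypothetical stand-ins for the adapter calls, and the stub lambdas in the demo replace real model invocations.

```python
from typing import Callable, Optional


def answer_with_retries(
    question: str,
    rewrite: Callable[[str], str],
    in_scope: Callable[[str], bool],
    answerable: Callable[[str], bool],
    generate: Callable[[str], str],
    max_retries: int = 2,
) -> Optional[str]:
    """Generate an answer only once the scope and answerability checks pass.

    After each failed check the query is rewritten and checked again, up to
    max_retries rewrites. Returns None if the checks never pass.
    """
    query = question
    for _ in range(max_retries + 1):
        if in_scope(query) and answerable(query):
            return generate(query)
        query = rewrite(query)  # a check failed: rewrite and try again
    return None


if __name__ == "__main__":
    # Stub checks standing in for adapter calls (hypothetical behavior).
    result = answer_with_retries(
        "How do I renew it?",
        rewrite=lambda q: q + " (passport)",
        in_scope=lambda q: "passport" in q,
        answerable=lambda q: True,
        generate=lambda q: "ANSWER: " + q,
    )
    print(result)
```

Passing the checks in as callables keeps the control flow separate from how each check is implemented, so the same loop works whether a check is a stub, a guardian adapter, or a RAG intrinsic.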

Reference Scripts

Runnable scripts in scripts/ for common tasks:

| Script | Description |
| --- | --- |
| run_adapter_generation_direct.py | Direct adapter invocation via control tokens |
| run_adapter_generation_mellea.py | Adapter invocation through Mellea |

Adapter Libraries

Granite Switch checkpoints embed adapters drawn from IBM's granitelib libraries. The three libraries below are featured throughout these tutorials:

| Adapter | Purpose | Where used in tutorials | HF repo |
| --- | --- | --- | --- |
| Core | Foundational post-generation intrinsics: certainty scoring, requirement checking, and response attribution. | 02, 04 | ibm-granite/granitelib-core-r1.0 |
| RAG | Retrieval-augmented generation intrinsics: query rewrite, answerability, hallucination detection, and citation generation. | 01, 03_01, 03_02, 04 | ibm-granite/granitelib-rag-r1.0 |
| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | 00, 01, 02, 03_02, 03_03, 04 | ibm-granite/granitelib-guardian-r1.0 |

External Resources

| Resource | Description |
| --- | --- |
| Mellea | IBM's library for writing Generative Programs |
| Granite aLoRA Adapters | Official adapter libraries on HuggingFace |
| vLLM Documentation | High-performance inference |
| Granite Models | Base Granite models |

Reference Documentation

For technical details, see docs/: