
Custom 400M SLM with Function-Calling and ONNX Deployment

A Complete End-to-End Tutorial for Building, Training, and Deploying a Production-Ready Small Language Model


🎯 Project Overview

This repository contains a complete, end-to-end blueprint for building, training, and deploying a custom ~400M parameter Small Language Model (SLM) from scratch. The model is trained to handle natural language instructions and execute Python function calls, with the final artifact being a production-ready ONNX model run by a lightweight Python agent.

✨ Core Features

  • 🏗️ Build from Scratch: Define and initialize a ~400M parameter Llama-style model with random weights using transformers
  • 📚 Two-Phase Training: Complete foundational pre-training (TinyStories) + Supervised Fine-Tuning (Oasst, Alpaca, Hermes-Function-Calling)
  • 🔧 Function-Calling: Trained to use the Hermes/ChatML format for tool use (<tools>, <function_call>, <tool_response>)
  • ⚡ ONNX Export: Convert final trained PyTorch model to ONNX with proper KV cache handling using Hugging Face Optimum
  • 🤖 Local Agent: Production-ready 05_run_agent.py script using onnxruntime-genai for ReAct-style function execution
  • 🚀 End-to-End: Complete pipeline from random weights to local, deployable function-calling AI

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended for training)
  • 16GB+ RAM
  • At least 50GB free disk space

Installation

# Clone the repository
git clone <repository-url>
cd folder-name

# Install dependencies
pip install -r requirements.txt

📋 Complete Tutorial Guide

This tutorial is divided into 5 sequential phases. Each phase builds upon the previous one and includes complete code implementations:

Phase 1: Initialize the Model

Duration: 5-10 minutes
Output: ./slm_from_scratch/ directory

Build the 400M parameter model configuration and initialize with random weights.

# Run the initialization script
python 01_initialize_model.py

What you'll learn:

  • Custom Llama-style architecture design
  • Parameter count calculation and optimization
  • Random weight initialization using AutoConfig
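A minimal sketch of this step, using the dimensions from the model-configuration table further below. The tutorial script works through AutoConfig; LlamaConfig shown here is the equivalent concrete class. The 32k vocab_size is an assumption — set it to match your tokenizer, and note that the exact parameter count depends on it and on whether the input and output embeddings are tied:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Dimensions taken from the model-configuration table below.
config = LlamaConfig(
    vocab_size=32000,          # assumption: match your tokenizer
    hidden_size=1280,
    intermediate_size=3584,    # SwiGLU feed-forward dimension
    num_hidden_layers=20,
    num_attention_heads=16,
)

# Instantiating from a config (not from_pretrained) gives random weights.
model = LlamaForCausalLM(config)
print(f"Parameters: {model.num_parameters():,}")
# model.save_pretrained("./slm_from_scratch")  # uncomment to write the checkpoint
```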

Phase 2: Create the SFT Dataset

Duration: 10-15 minutes
Output: ./unified_sft_dataset/ directory

Download, merge, and format the Supervised Fine-Tuning datasets.

# Create unified SFT dataset
python 02_create_sft_dataset.py

What you'll learn:

  • Dataset curation and merging strategies
  • Hermes function-calling format implementation
  • Data preprocessing for conversational AI
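As a sketch of the target format, each conversation is flattened into ChatML-style training text. The `to_chatml` helper and the `<|im_start|>`/`<|im_end|>` tokens below are illustrative — use whatever special tokens your tokenizer actually defines:

```python
# Illustrative helper: render one conversation into ChatML-style training text.
def to_chatml(messages):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    return "\n".join(parts)

sample = [
    {"role": "system", "content": "<tools>get_weather(location) -> str</tools>"},
    {"role": "user", "content": "What's the weather like in Boston?"},
]
print(to_chatml(sample))
```

The same rendering is applied uniformly to all three source datasets, which is what lets them be merged into one unified SFT corpus.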

Phase 3: Train the Model

Duration: 6-24 hours (depending on hardware)
Output: ./slm_final_trained/ directory

Execute the complete training pipeline with both pre-training and SFT phases.

# Launch Jupyter notebook for training
jupyter notebook notebook/the_SLM.ipynb

What you'll learn:

  • Two-phase training methodology
  • PyTorch Trainer API usage
  • Data collator and tokenization strategies
  • Training optimization and monitoring
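The tuned hyperparameters live in the notebook; the TrainingArguments sketch below is only illustrative of the shape of the configuration, with placeholder values you would adjust to your GPU:

```python
from transformers import TrainingArguments

# Illustrative settings only; the notebook contains the actual tuned values.
args = TrainingArguments(
    output_dir="./slm_final_trained",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size of 32
    learning_rate=3e-4,
    warmup_steps=500,
    num_train_epochs=1,
    bf16=True,                       # assumes an Ampere-or-newer GPU
    logging_steps=50,
    save_strategy="epoch",
)
```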

Phase 4: Export to ONNX

Duration: 10-20 minutes
Output: ./slm_onnx/ directory

Convert the trained PyTorch model to production-ready ONNX format.

# Export model to ONNX
python 04_export_to_onnx.py

What you'll learn:

  • Hugging Face Optimum integration
  • KV cache handling for generative models
  • Production deployment strategies
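Under the hood, the export script relies on Optimum; the equivalent one-liner with the optimum-cli tool looks like this (the `text-generation-with-past` task is what bakes KV cache inputs/outputs into the graph):

```shell
# Export with KV cache ("with-past") so autoregressive generation stays efficient
optimum-cli export onnx \
  --model ./slm_final_trained \
  --task text-generation-with-past \
  ./slm_onnx
```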

Phase 5: Run the Agent

Duration: 5 minutes
Output: Interactive function-calling agent

Start the local, ONNX-powered agent and interact with function calls.

# Launch the agent
python 05_run_agent.py

Example Interaction:

User: What's the weather like in Boston?
Agent: Let me check the weather for you.
<function_call>get_weather("Boston")</function_call>
Tool Response: Current weather in Boston: 72°F, sunny
Agent: The current weather in Boston is 72°F with sunny skies.
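The agent's first job is to spot and parse the `<function_call>` tag in the model's output. A minimal illustrative parser for the format shown above (the real script may handle richer argument syntax):

```python
import re

# Illustrative parser for the <function_call> tag used in the example above.
CALL_RE = re.compile(r"<function_call>(\w+)\((.*?)\)</function_call>", re.S)

def extract_call(text):
    m = CALL_RE.search(text)
    if not m:
        return None
    name, raw_args = m.groups()
    # Split positional args and strip surrounding quotes.
    args = [a.strip().strip('"\'') for a in raw_args.split(",") if a.strip()]
    return name, args

name, args = extract_call(
    'Let me check.\n<function_call>get_weather("Boston")</function_call>'
)
print(name, args)  # → get_weather ['Boston']
```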

🏗️ Architecture & Design Decisions

Strategic Foundation

This project follows a "golden path" of technical decisions ensuring compatibility across all stages:

  1. Architecture: Custom Llama-style config via AutoConfig (ecosystem compatibility)
  2. Training: Two-phase approach (foundation → alignment)
  3. Export: Hugging Face Optimum (proper KV cache handling)
  4. Inference: onnxruntime-genai (specialized generative AI runtime)

Model Configuration

| Parameter | Value | Rationale |
|---|---|---|
| Model Type | Llama-style Transformer | Modern architecture with RoPE and SwiGLU |
| Parameters | ~400M | Balance of capability vs. resource requirements |
| Layers | 20 | Depth/width trade-off suited to this scale |
| Hidden Size | 1280 | Embedding and attention dimension |
| Attention Heads | 16 | Multi-head attention (head dim 1280/16 = 80) |
| FFN Size | 3584 | SwiGLU feed-forward dimension |
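The dimensions above let you sanity-check the parameter budget with back-of-the-envelope arithmetic. The 32k vocabulary and tied output head are assumptions; a smaller vocabulary or different tying brings the total nearer the quoted ~400M:

```python
# Back-of-the-envelope parameter count from the table (vocab size assumed 32k).
V, h, f, L = 32000, 1280, 3584, 20

embeddings = V * h                  # token embedding matrix
attention  = 4 * h * h              # Q, K, V, O projections per layer
swiglu     = 3 * h * f              # gate, up, down projections per layer
per_layer  = attention + swiglu

total = embeddings + L * per_layer  # tied output head assumed
print(f"{total/1e6:.0f}M parameters")  # → 447M parameters
```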

Training Strategy

Phase 1: Foundational Pre-training

  • Dataset: TinyStories (clean, simple language)
  • Objective: Learn grammar, syntax, basic world knowledge
  • Outcome: The model learns to form coherent text

Phase 2: Supervised Fine-Tuning

  • Datasets:
    • OpenAssistant/oasst1 (conversational)
    • tatsu-lab/alpaca (instruction-following)
    • NousResearch/hermes-function-calling-v1 (tool use)
  • Objective: Align for assistance + function calling
  • Format: Hermes/ChatML with structured tool calls

Function-Calling Format

The model learns to use this structured format:

System: <tools>get_weather(location) -> str</tools>
User: What's the weather like in Boston?

Agent: I'll check the weather for you.
<function_call>get_weather("Boston")</function_call>

User: <tool_response>Current weather in Boston: 72°F, sunny</tool_response>

Agent: The current weather in Boston is 72°F with sunny skies.
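The exchange above is driven by a simple ReAct-style loop: generate, parse any function call, execute it, feed the result back as a `<tool_response>`, and repeat until the model answers in plain language. A self-contained sketch (the `generate` callable stands in for the real onnxruntime-genai invocation, and `TOOLS` is a hypothetical registry):

```python
import re

# Hypothetical tool registry; the real agent maps names to actual functions.
TOOLS = {"get_weather": lambda location: f"Current weather in {location}: 72°F, sunny"}

def run_agent(prompt, generate, max_turns=4):
    transcript = prompt
    for _ in range(max_turns):
        reply = generate(transcript)
        transcript += reply
        m = re.search(r'<function_call>(\w+)\("(.*?)"\)</function_call>', reply)
        if not m:
            return reply  # final natural-language answer
        result = TOOLS[m.group(1)](m.group(2))
        transcript += f"\n<tool_response>{result}</tool_response>\n"
    return transcript

# Demo with a scripted stand-in for the model.
replies = iter([
    'Let me check.\n<function_call>get_weather("Boston")</function_call>',
    "The current weather in Boston is 72°F with sunny skies.",
])
answer = run_agent("What's the weather like in Boston?", lambda t: next(replies))
print(answer)
```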

📁 Project Structure

SLM-training-ONNX-and-functional-calling/
├── README.md                           # This file
├── requirements.txt                    # Dependencies
├── 01_initialize_model.py             # Phase 1: Model initialization
├── 02_create_sft_dataset.py           # Phase 2: Dataset preparation
├── 03_training_pipeline.ipynb         # Phase 3: Training notebook
├── 04_export_to_onnx.py               # Phase 4: ONNX export
├── 05_run_agent.py                    # Phase 5: Agent deployment
├── readme info.md                     # Original technical documentation
└── notebook/
    └── the_SLM.ipynb                  # Interactive training notebook

🎓 Learning Objectives

By completing this tutorial, you will understand:

  • Model Architecture: How transformer models are constructed and parameterized
  • Training Methodology: Two-phase training from scratch to specialized assistant
  • Data Engineering: Curating and formatting datasets for different training phases
  • Production Deployment: Converting models to portable, high-performance formats
  • Function Calling: Implementing tool use in language models
  • Agent Design: Building reasoning loops for autonomous function execution

🛠️ Technical Implementation Details

Why This Architecture Works

  1. Ecosystem Compatibility: Using AutoConfig/AutoModel ensures compatibility with Trainer API and Optimum
  2. Proper Training Order: Foundation training before instruction following is essential
  3. ONNX Optimization: Optimum handles the complex KV cache export automatically
  4. Agent Efficiency: onnxruntime-genai provides high-performance generative inference

Key Technical Insights

  • Parameter Calculation: Total params ≈ embeddings + num_layers × (attention + feed-forward)
  • KV Cache: Critical for efficient autoregressive generation
  • Format Consistency: Hermes format provides structured tool interaction
  • Export Complexity: Dynamic computation graphs require specialized tools

🔧 Troubleshooting

Common Issues

Training Issues

# Out of memory errors
- Reduce batch size in training config
- Use gradient accumulation
- Enable model parallelism if available

# Slow training
- Verify CUDA is available: torch.cuda.is_available()
- Check GPU utilization: nvidia-smi
- Optimize data loading with num_workers

ONNX Export Issues

# Unsupported operations
- Verify all operations are exportable to ONNX
- Check for dynamic shapes issues
- Ensure proper model configuration

# KV Cache errors
- Verify Optimum version compatibility
- Check model architecture compatibility
- Validate input/output shapes

Agent Runtime Issues

# Function call parsing errors
- Verify Hermes format compliance
- Check JSON structure in function calls
- Ensure proper tool definitions

# Performance issues
- Enable ONNX optimizations
- Check memory usage
- Verify GPU acceleration if available

🚀 Future Enhancements

Short-term Improvements

  • Constrained Decoding: Implement schema-constrained generation for reliable function calling
  • Quantization: Add INT8 quantization for faster inference
  • Model Scaling: Extend to 1B+ parameters using the provided architecture tables

Long-term Developments

  • Multi-modal Support: Extend to vision-language tasks
  • Distributed Training: Implement multi-GPU/multi-node training
  • Web Interface: Create Gradio/Streamlit UI for the agent
  • API Server: Build REST API for model serving

Advanced Features

  • Memory Optimization: Implement attention optimization techniques
  • Custom Functions: Add domain-specific tool libraries
  • Safety Filters: Implement content filtering and safety measures
  • Evaluation Suite: Comprehensive benchmarking framework


🤝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Hugging Face Team - For the transformers and Optimum libraries
  • Microsoft - For ONNX Runtime and GenAI
  • EleutherAI - For training methodologies and datasets
  • The SLM Community - For insights and feedback

📞 Support

If you encounter any issues or have questions:

  1. Check the troubleshooting section above
  2. Search existing issues in the repository
  3. Create a new issue with detailed information
  4. Join the community discussions

Happy Learning! 🎉


This tutorial represents a complete, production-ready workflow for building custom language models. Each phase has been tested and optimized for reliability and educational value.
