LoRA-CodeExplainer

🚀 Project Overview

LoRA-CodeExplainer is a lightweight fine-tuning project that uses LoRA (Low-Rank Adaptation) to adapt a pretrained CodeT5-small model for code explanation tasks. The model takes a Python code snippet as input and generates a human-readable explanation in English.

This project is designed as an entry-level LoRA fine-tuning experiment with ~900 simple Python examples, making it well suited for learning LoRA-based adaptation of LLMs.


🧩 Dataset

  • Format: JSON
  • Each entry:
{
  "input": "def add(a, b):\n    return a + b",
  "output": "A function that adds two numbers."
}
  • Language: English
  • Examples: ~900 simple Python functions (math, strings, loops, lists, recursion, etc.)
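
The json loader used below accepts either a JSON Lines file or a file containing a top-level list of such objects. For example, as JSON Lines (the second entry is illustrative, not from the dataset):

{"input": "def add(a, b):\n    return a + b", "output": "A function that adds two numbers."}
{"input": "def is_even(n):\n    return n % 2 == 0", "output": "A function that checks whether a number is even."}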

⚙️ Usage

1. Install Requirements

pip install torch transformers peft datasets accelerate

2. Load Dataset

from datasets import load_dataset

dataset = load_dataset("json", data_files="dataset_en.json")
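
A quick sanity check on what was loaded (the "train" split is created automatically by load_dataset):

print(dataset)              # DatasetDict with a single "train" split
print(dataset["train"][0])  # first input/output pair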

3. Load Pretrained Model and Tokenizer

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Salesforce/codet5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
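
A quick round trip confirms the tokenizer handles code text (the snippet here is arbitrary):

example = "def add(a, b):\n    return a + b"
ids = tokenizer(example).input_ids
print(tokenizer.decode(ids, skip_special_tokens=True))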

4. Apply LoRA

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor for the LoRA updates
    target_modules=["q", "v"],  # T5 attention query/value projections
    lora_dropout=0.05,          # dropout applied to the LoRA layers
    bias="none",                # leave bias terms frozen
    task_type="SEQ_2_SEQ_LM"    # encoder-decoder language modeling
)

model = get_peft_model(model, lora_config)
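
To confirm that only the adapter weights will be trained, PEFT provides a helper:

model.print_trainable_parameters()  # reports trainable vs. total parameter counts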

5. Train Model

Before training, tokenize the raw input/output pairs so the Trainer receives token IDs rather than raw strings.
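
A minimal preprocessing sketch (the max_length values are illustrative assumptions, not tuned settings):

def preprocess(batch):
    # Encode the code snippets as encoder inputs
    model_inputs = tokenizer(batch["input"], max_length=256, truncation=True)
    # Encode the explanations as decoder labels
    labels = tokenizer(text_target=batch["output"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["input", "output"])

Then train with the Hugging Face Trainer or your preferred training loop: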

from transformers import DataCollatorForSeq2Seq, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora_codet5",
    per_device_train_batch_size=8,
    num_train_epochs=12,
    save_steps=500,
    logging_steps=50,
    learning_rate=3e-4,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # pads inputs and labels per batch
)

trainer.train()
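
After training, save the adapter and run a quick smoke test. The adapter path and example snippet below are illustrative choices, not from the original project:

model.save_pretrained("./lora_adapter")  # saves only the small LoRA adapter weights

code = "def square(x):\n    return x * x"
inputs = tokenizer(code, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))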

🎯 Goals

  • Learn LoRA fine-tuning on a small dataset
  • Understand how to adapt pretrained LLMs to code tasks
  • Produce human-readable explanations of code snippets

📌 Notes

  • The dataset is intentionally simple to keep training time short
  • The project can be extended to larger datasets and more complex code
  • The fine-tuned model can be deployed for code summarization or code documentation generation (see the sketch below)
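
For deployment, the saved adapter can be reloaded on top of the base checkpoint. A minimal sketch, assuming the adapter was saved to ./lora_adapter as above:

from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
base = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-small")
model = PeftModel.from_pretrained(base, "./lora_adapter")  # attaches the LoRA weights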

📝 License

MIT
