llm-alignment

Here are 44 public repositories matching this topic...

walkinglabs / hands-on-modern-rl

🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.

agent tutorial pytorch dpo reinforcemen llm rlhf agentic agentic-ai grpo llm-alignment agentic-rl

Updated Jun 1, 2026
Python

0bserver07 / Study-Reinforcement-Learning

RL study guide — foundations through RLHF, DPO, GRPO, RLVR, agentic RL, and offline RL. Hand-written CS294 notes, 19 lecture drafts, 5 tested exercises, citations that resolve.

machine-learning reinforcement-learning deep-learning q-learning policy-gradient study-notes lecture-notes ppo dpo rlhf constitutional-ai deepseek-r1 grpo llm-alignment rlvr sutton-barto agentic-rl

Updated May 15, 2026
Python

glorgao / SelectiveDPO

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

llm-alignment

Updated Jul 16, 2025
Python

jnamaya / SAFi

SAFi is a runtime governance layer for agentic AI. It enforces policies in real time. Every agent decision is logged and auditable.

ai runtime ai-safety ethics ethics-in-ai ai-governance governace llm-alignment

Updated Jun 2, 2026
Python

stretchvancouver / stretch-ai-yoga

Cognitive training practices for AI agents. Self-applied. Open source. Built by an independent Vancouver yoga studio.

yoga ai-agents prompt-engineering agentic-ai agent-skills llm-alignment claude-skills

Updated May 15, 2026

davfd / foundation-alignment-cross-architecture

Complete elimination of instrumental self-preservation across AI architectures: Cross-model validation from 4,312 adversarial scenarios. 0% harmful behaviors (p<10⁻¹⁵) across GPT-4o, Gemini 2.5 Pro, and Claude Opus 4.1 using Foundation Alignment Seed v2.6.

ai artificial-intelligence ai-safety ai-alignment llm-alignment

Updated Nov 3, 2025

LLMSystems / BehaviorRL-Hallucination

Learning When to Answer: Behavior-Oriented Reinforcement Learning for Hallucination Mitigation

entropy uncertainty ai-safety hallucination dpo llm llm-evaluation hallucination-mitigation grpo llm-alignment

Updated Apr 8, 2026
Python

stabgan / awesome-loss-functions

📚 350+ loss functions across 25+ AI subdomains — classification, GANs, diffusion, LLM alignment, RL, contrastive learning, audio, video, time series, and more. Chronologically ordered with paper links, math formulas, and implementations.

Updated Mar 14, 2026

lyj20071013 / DZ-TiDPO

Official implementation of "DZ-TiDPO: Non-Destructive Temporal Alignment for Mutable State Tracking". SOTA on Multi-Session Chat with negligible alignment tax.

python nlp dpo rlhf state-tracking qwen phi-3 llm-alignment

Updated Apr 10, 2026
Python

ZZZ150751 / cs336_spring2025_assignment5

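CS336 Assignment 5: LLM alignment and reinforcement learning for reasoning, based on the Qwen2.5 model. Fully implements supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), and compares zero-shot, on-policy, and off-policy training and evaluation on the GSM8K dataset.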

reinforcement-learning cs336 dpo llm rlhf gsm8k grpo llm-alignment

Updated Apr 1, 2026
Python

Rohitchandramouli / witness-stand

Adversarial AI system to test and improve reliability under real-world pressure

docker reinforcement-learning multi-agent audit-trail adversarial-training fastapi huggingface grpo llm-alignment epistemic-integrity openenv prior-statement-consistency expert-persona deterministic-grader

Updated Apr 28, 2026
Python

jeewoo1025 / Awesome-Activation-Steering

A list of papers on activation steering

interpretability activation-steering llm-alignment

Updated May 10, 2026

yarakyrychenko / c3ai

C3AI: Crafting and Evaluating Constitutions for CAI

constitutional-ai llm-alignment

Updated Apr 30, 2025
Python

laubeing-droid / PRC-US-Legal-Semantic-Alignment-Framework

PRC-first legal semantic alignment framework for constraining US legal concepts within Chinese-law reasoning contexts.

legal-reasoning rag legal-ai common-law semantic-alignment chinese-law prompt-engineering llm-alignment comparative-law prc-law

Updated May 26, 2026
PowerShell

rhaldarpurdue / KLDO

Kullback–Leibler Divergence Optimizer (KLDO), based on the NeurIPS 2025 paper "LLM Safety Alignment is Divergence Estimation in Disguise".

llm-training llm-alignment

Updated Nov 24, 2025
Python

hanzhenzhujene / student-teacher-phronesis

Teacher-guided prompt-shape discovery for auditable moral attention in frozen weak classifiers.

ai-safety prompt-engineering llm-alignment moral-reasoning

Updated May 21, 2026
Python

meunier-jc / authentic-fluency

A behavioral framework opposing native fluency to authentic fluency — the structural tension RLHF creates and Claude Mythos Preview makes urgent.

ai-safety claude-ai sycophancy human-ai-collaboration llm-alignment ai-reliability behavioral-framework truth-before-fluency integrity-before-agreement coexistence-through-reliability reliability-or-obsolescence existential-co-regulation authentic-fluency hallucinatory-recursive-embedding ai-alignment-case-study native-fluency rlhf-critique

Updated May 5, 2026

zihao-jing / EDT-Former

EDT-Former, a bridge between LLMs and graph data. Entropy-guided Dynamic Token Transformer for Graph-LLM alignment. Accepted at ICLR 2026.

deep-learning molecule pytorch multimodal gnn llm ai-for-science multimodal-large-language-models llm-alignment

Updated May 13, 2026
Python

Iamyulx / behavior-controlled-rlhf

A training-time alignment framework that integrates safety constraints directly into the RLHF loop — achieving full safety convergence in 7 epochs

nlp reinforcement-learning pytorch behavior-control rlhf reward-model llm-alignment training-time-alignment

Updated Apr 15, 2026
Python

JialiangFan / mini-grpo

🧠 Minimal, hackable Group Relative Policy Optimization (GRPO) for LLM alignment — the algorithm behind DeepSeek-R1. Train reasoning models on a single GPU.

machine-learning reinforcement-learning pytorch language-model fine-tuning single-gpu rlhf deepseek-r1 grpo llm-alignment

Updated Mar 30, 2026
Python

