Skip to content

kdmsit/crysllmgen

Repository files navigation

LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation. (NeurIPS 2025)

arXiv Code

This repository contains the official code release for our ICLR 2025 paper "LLM Meets Diffusion - A Hybrid Framework for Crystal Material Generation", by Subhojyoti Khastagir*, Kishalay Das*, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly.

CrysLLMGen introduces a hybrid approach to generating 3D structure of crystal materials. Key contributions of CrysLLMGen are:

  • Hybrid LLM + Diffusion Framework: Integrates LLMs for discrete predictions with equivariant diffusion models for continuous structural refinement.
  • Two-Stage Generation: LLM proposes atom types, coordinates, and lattice; diffusion model refines them for stability and physical validity.
  • Constraint-Aware Design: Supports conditional generation based on user-defined composition, space group, and natural-language prompts.
  • Balanced Validity & Novelty: Achieves superior stability, structural correctness, and compositional validity compared to standalone LLMs or diffusion models.
  • Architecture-Agnostic: Framework can seamlessly incorporate future LLMs and denoising architectures.

Installation

The list of dependencies is provided in the requirements.txt file, generated using pipreqs. You can install through the following commands:

pip install -r requirements.txt

However, there may be some ad-hoc dependencies that were not captured. If you encounter any missing packages, feel free to install them manually using pip install.

Usage Guide

Training Pipeline

Step 1: Fine-tune LLaMA-2 Model

For Perov-5

python -W ignore llm_finetune.py \
--run-name 7b-perov \
--model 7b \
--num-epochs 1 \
--data-path data/perov_5

For MP-20

python -W ignore llm_finetune.py \
--run-name 7b-mp \
--model 7b \
--num-epochs 1 \
--data-path data/mp_20

Output: The fine-tuned LLM will be saved in:

  • exp/7b-perov/ (Perov-5)
  • exp/7b-mp/ (MP-20)

Step 2: Train the Diffusion Model

For Perov-5

python -W ignore diff_train.py \
--dataset perov_5 \
--batch_size 512 \
--epochs 500 \
--timesteps 1000 \
--run-type train

For MP-20

python -W ignore diff_train.py \
--dataset mp_20 \
--batch_size 512 \
--epochs 500 \
--timesteps 1000 \
--run-type train

Output: The trained diffusion model will be saved at:

out/<Dataset>/<expt_date>/<expt_time>/

Where <Dataset> is either perov_5 or mp_20.


Unconditional Sampling (Batch-wise) from CrysLLMGen

Use the correct --model_path and --diff_steps depending on the dataset.

python -W ignore crysllmgen_sample.py \
--model_name 7b \
--model_path <LLM_CHECKPOINT_PATH> \
--chkpt_name <DIFFUSION_CHECKPOINT_PATH> \
--num_samples 10000 \
--dataset < mp | perov> \
--temperature 1.0 \
--top_p 0.7 \
--diff_steps <700 | 800> \
--run-type sample \
--out-prefix "Crysllmgen_sample" \
--batch_size 128

Replace the Following Based on Dataset

For MP-20

  • --model_path exp/7b-mp/checkpoint-27136
  • --dataset mp
  • --diff_steps 800

For Perov-5

  • --model_path exp/7b-perov/checkpoint-11356
  • --dataset perov
  • --diff_steps 700

Output Files

Generated samples are saved as .pt files:

  • Crysllmgen_sample_mp_10000.pt
  • Crysllmgen_sample_perov_10000.pt

Important Notes

  • <DIFFUSION_CHECKPOINT_PATH> should point to:

    out/<Dataset>/<expt_date>/<expt_time>/
    
  • You can adjust --temperature and --top_p to balance diversity and generation quality.


Evaluation for Unconditional Generation

After sampling, evaluate the generated structures using:

For Perov-5

python -W ignore compute_metrics.py \
--root_path <PT_FILE_PATH> \
--tasks gen \
--eval_model_name perovskite \
--gt_file data/perov_5/test.csv

For MP-20

python -W ignore compute_metrics.py \
--root_path <PT_FILE_PATH> \
--tasks gen \
--eval_model_name mp20 \
--gt_file data/mp_20/test.csv

<PT_FILE_PATH> should be the directory containing:

  • Crysllmgen_sample_mp_10000.pt
  • Crysllmgen_sample_perov_10000.pt

Contact

For any questions, please contact: Kishalay Das kishalaydas@kgpian.iitkgp.ac.in


How to Cite

If you use CrysLLMGen or our textual dataset, please cite:

@article{khastagir2025llm,
  title={LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation},
  author={Khastagir, Subhojyoti and Das, Kishalay and Goyal, Pawan and Lee, Seung-Cheol and Bhattacharjee, Satadeep and Ganguly, Niloy},
  journal={arXiv preprint arXiv:2510.23040},
  year={2025}
}

About

CrysLLMGen - LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation. (NeurIPS 2025)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors