This repository contains the official PyTorch implementation of SAGE (Semantic Alignment with Global Embedding). SAGE is a framework designed to enhance Sequential Recommendation Systems (SRS) by alleviating the long-tail problem through global semantic alignment using Large Language Models (LLMs).
SAGE integrates LLM-derived semantic information with collaborative signals. Key components include:
- Fuzzy-Membership Prototypes: Uses UMAP for dimensionality reduction and HDBSCAN for density-based clustering of items, then assigns fuzzy memberships so that tail items can inherit features from semantically related head items.
- Alignment-Based User Distillation: Retrieves semantically similar users to enrich sparse user representations.
- Dual-View Modeling: Aligns Semantic (LLM) and Collaborative (ID-based) spaces.
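To make the fuzzy-prototype idea concrete, here is a rough sketch of how tail-item embeddings can be enriched with fuzzy-weighted cluster centers. This is illustrative only; the function and variable names (`enrich_tail_items`, `fuzzy_U`, `centers`) are assumptions, not this repository's API.

```python
import numpy as np

def enrich_tail_items(item_emb, fuzzy_U, centers, tail_mask, blend=0.5):
    """Blend tail-item embeddings with their fuzzy-weighted cluster centers.

    item_emb:  (n_items, d) collaborative item embeddings
    fuzzy_U:   (n_items, n_clusters) fuzzy membership matrix (rows sum to 1)
    centers:   (n_clusters, d) semantic cluster centers
    tail_mask: (n_items,) boolean, True for tail items
    """
    prototypes = fuzzy_U @ centers  # per-item fuzzy semantic prototype
    out = item_emb.copy()
    # Head items keep their embeddings; tail items move toward their prototype.
    out[tail_mask] = (1 - blend) * item_emb[tail_mask] + blend * prototypes[tail_mask]
    return out
```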
Before training, you must generate LLM embeddings and perform the fuzzy clustering process.
Process the raw datasets (Beauty, Fashion, Yelp) into interaction sequences.
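Conceptually, this step groups each user's interactions into a time-ordered item sequence. A minimal sketch, assuming simple (user, item, timestamp) records (the actual script may differ):

```python
from collections import defaultdict

def build_sequences(interactions):
    """Group (user, item, timestamp) records into time-ordered item sequences."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    # Sort each user's records by timestamp, then keep only the item IDs.
    return {u: [item for _, item in sorted(recs)] for u, recs in by_user.items()}
```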
```bash
python data/data_process.py
```

Use the notebooks in data/{dataset}/ to generate semantic embeddings for items and users (e.g., using the OpenAI API or open-source LLMs):
- get_item_embedding.ipynb
- get_user_embedding.ipynb
- pca.ipynb (reduces the embedding dimension to 64 for efficiency)
Run the notebooks in the Clustering/ folder (e.g., Clustering/fashion.ipynb).
This step utilizes UMAP for reduction and HDBSCAN for density-based clustering to generate the following required artifacts in data/{dataset}/handled/:
- hdbscan_best_labels.pkl: Cluster assignments.
- hdbscan_cluster_centers.pkl: Weighted semantic centers.
- hdbscan_core_probs.pkl: Core-point probabilities.
- hdbscan_fuzzy_U.pkl: The Global Fuzzy Membership Matrix.
Note: The clustering logic ensures noise points (tail items) are assigned soft memberships based on distance to valid semantic clusters.
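The soft assignment can be sketched in the familiar fuzzy-c-means form, where the exponent corresponds to the --fuzzy_m argument. This is a sketch assuming Euclidean distances; `fuzzy_memberships` is a hypothetical helper, not the notebook's code.

```python
import numpy as np

def fuzzy_memberships(points, centers, m=2.0, eps=1e-12):
    """Fuzzy-c-means-style soft memberships of points to cluster centers.

    u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)); rows sum to 1, so noise/tail
    points receive graded membership based on distance to each center.
    """
    # (n_points, n_clusters) pairwise Euclidean distances, guarded against 0.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)
```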
Scripts to reproduce the experiments are located in experiments_sage/. We support three backbones: SASRec, Bert4Rec, and GRU4Rec.
```bash
cd experiments_sage
bash beauty.bash
bash yelp.bash
bash fashion.bash
```

| Argument | Description | Default |
|---|---|---|
| --model_name | Backbone selection (llmesr_sasrec, llmesr_bert4rec, llmesr_gru4rec) | llmesr_sasrec |
| --alpha | Weight for the User Alignment (Distillation) loss | 0.1 |
| --gamma | Weight for the Item Prototype (Fuzzy) loss | 0.05 |
| --beta | Weight for the Global Semantic-Collaborative Alignment loss | 0.1 |
| --fuzzy_m | Fuzzy exponent (controls fuzziness of membership) | 2 |
| --ts_user | Threshold defining tail users (interactions < threshold) | 9 |
| --ts_item | Threshold defining tail items | 4 |
| --user_sim_func | Similarity function for user retrieval (kd for distillation) | kd |
| --freeze | Freeze the pre-computed LLM Semantic Embeddings | True |
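Conceptually, the three weighted terms combine with the recommendation loss as follows. This is a sketch of how --alpha, --gamma, and --beta enter the objective; the actual loss terms are computed in the training code.

```python
def total_loss(rec_loss, user_align_loss, fuzzy_proto_loss, global_align_loss,
               alpha=0.1, gamma=0.05, beta=0.1):
    """Combine the recommendation loss with the three auxiliary losses
    using the --alpha / --gamma / --beta weights."""
    return (rec_loss
            + alpha * user_align_loss    # user alignment (distillation)
            + gamma * fuzzy_proto_loss   # item prototype (fuzzy)
            + beta * global_align_loss)  # global semantic-collaborative alignment
```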
Training logs are saved to the logs/ directory. The primary evaluation metrics are Hit Rate@10 (HR@10) and NDCG@10.
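For reference, both metrics follow their standard definitions given the 1-based rank of each user's held-out target item (a sketch; `hr_and_ndcg_at_k` is a hypothetical helper, not the repository's evaluator):

```python
import math

def hr_and_ndcg_at_k(ranks, k=10):
    """Hit Rate@k and NDCG@k from 1-based ranks of each user's target item."""
    n = len(ranks)
    # HR@k: fraction of users whose target item appears in the top k.
    hr = sum(r <= k for r in ranks) / n
    # NDCG@k: discounted gain 1/log2(rank + 1) for hits, 0 otherwise.
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return hr, ndcg
```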
If you use this extension, please cite the original LLM-ESR work along with our paper (to be added).