Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models
S2S-Arena is the official repository for our ACL 2026 main conference paper:
S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models
Recent speech-to-speech (S2S) systems are becoming increasingly natural spoken agents, but existing benchmarks still rely heavily on text-based evaluation. They often miss key paralinguistic cues such as prosody, emotion, speaking style, and speaker traits, which are essential for expressive and human-like communication.
S2S-Arena is a speech-native benchmark for evaluating instruction-following S2S models. It explicitly assesses both semantic understanding and paralinguistic expression through a multi-level interaction protocol and an arena-style pairwise evaluation framework directly in the speech modality.
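As a rough illustration of how arena-style pairwise judgments can be turned into a model ranking, the sketch below aggregates hypothetical comparison records into per-model win rates. The record format, the model names, and the choice of win rate (with a tie counted as half a win) are all assumptions made for illustration; this is not the scoring method or toolkit API of S2S-Arena, which has not been released yet.

```python
from collections import defaultdict

# Hypothetical judgment records: (model_a, model_b, winner), where winner is
# "a", "b", or "tie". Model names are placeholders, not real results.
judgments = [
    ("model_x", "model_y", "a"),
    ("model_x", "model_z", "tie"),
    ("model_y", "model_z", "b"),
    ("model_x", "model_y", "a"),
]


def win_rates(records):
    """Return each model's win rate, counting a tie as half a win."""
    score = defaultdict(float)  # accumulated wins (ties add 0.5)
    games = defaultdict(int)    # number of comparisons each model took part in
    for model_a, model_b, winner in records:
        games[model_a] += 1
        games[model_b] += 1
        if winner == "a":
            score[model_a] += 1.0
        elif winner == "b":
            score[model_b] += 1.0
        elif winner == "tie":
            score[model_a] += 0.5
            score[model_b] += 0.5
        else:
            raise ValueError(f"unknown winner label: {winner!r}")
    return {model: score[model] / games[model] for model in games}


rates = win_rates(judgments)
# Sort models from highest to lowest win rate to obtain a simple ranking.
ranking = sorted(rates, key=rates.get, reverse=True)
```

Win rate is only one possible aggregation; rating schemes such as Elo or Bradley-Terry are common alternatives when models face opponents of uneven strength.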
This repository is under active development. We are currently preparing the public release of the dataset, automatic evaluation scripts, and a continuously updated leaderboard.
- 2026-05: S2S-Arena was accepted to the ACL 2026 main conference.
- Coming soon: Seed and Augment dataset release.
- Coming soon: Automatic evaluation toolkit.
- Coming soon: Live leaderboard for S2S model comparison.
For questions, feedback, or collaboration, please contact:
- Feng Jiang: jiangfeng@suat-sz.edu.cn
- Benyou Wang: wangbenyou@cuhk.edu.cn
Issues and pull requests are also welcome.
If you find S2S-Arena useful, please cite our paper. The citation will be updated once the final ACL proceedings version is available.
@inproceedings{jiang2026s2sarena,
title = {S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models},
author = {Jiang, Feng and Lin, Zhiyu and Bu, Fan and Du, Yuhao and Wang, Benyou and Li, Haizhou},
booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
year = {2026}
}
The license will be updated soon. Please contact the authors before using unreleased dataset or evaluation resources for commercial purposes.
We thank the open-source speech and language model communities for their work. S2S-Arena builds on the progress of speech-to-speech modeling and aims to support more reliable, expressive, and human-aligned spoken interaction.