MultiAPI-Spoof is a multi-source audio anti-spoofing dataset that contains approximately 230 hours of spoofed audio. It includes synthetic audio generated by commercial TTS services, open-source models, and Chinese TTS websites. An equal amount of bonafide speech from CommonVoice is included for a 1:1 balance between genuine and spoofed samples. This dataset is designed to support research and model training for audio anti-spoofing.
Our new dataset, MultiAPI Spoof, contains speech samples generated from a variety of API sources.
The dataset is organized into 30 API groups, labeled A0–A29, with each group corresponding to a unique speech generation API source. The duration of speech in each group ranges from 0.2 to 12 hours.
| API NO. | train | dev | eval |
|---|---|---|---|
| A0-A20 | 70% | 10% | 20% |
| A21-A23 | / | 100% | / |
| A24-A29 | / | / | 100% |
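The split rule in the table can be sketched as a small lookup. This is only an illustration, not part of the released tooling: the function name `allowed_splits` is hypothetical, and it assumes that `/` in the table means the API group contributes no data to that split.

```python
def allowed_splits(api: str) -> set[str]:
    """Return the splits an API label such as 'A7' may appear in.

    Assumption: '/' in the split table means "no data in this split".
    """
    if not api.startswith("A") or not api[1:].isdigit():
        raise ValueError(f"not an API label: {api!r}")
    number = int(api[1:])
    if 0 <= number <= 20:
        # Seen APIs: divided 70% / 10% / 20% across train, dev, and eval.
        return {"train", "dev", "eval"}
    if 21 <= number <= 23:
        # APIs held out for the dev set only.
        return {"dev"}
    if 24 <= number <= 29:
        # APIs held out for the eval set only.
        return {"eval"}
    raise ValueError(f"API label out of range A0-A29: {api!r}")
```

Under this reading, A21–A29 never appear in training, so the dev and eval sets can be used to measure how well a detector handles APIs it has not seen.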
The dataset includes three metadata files: MultiAPI_train.txt, MultiAPI_dev.txt, and MultiAPI_eval.txt.
Each line has three fields:
audio_path api class_label
XXX.mp3 A0 spoofed
XXX.mp3 - bonafide
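A metadata file in this format could be read along the following lines. This is a minimal sketch rather than an official loader: the names `Entry`, `parse_metadata`, and `load_metadata` are hypothetical, and the code assumes fields are separated by whitespace, that `-` marks a missing API for bonafide audio, and that a literal `audio_path api class_label` line, if present in the file, is a header to skip.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Entry:
    audio_path: str
    api: Optional[str]  # None when the api field is "-" (bonafide speech)
    label: str          # "spoofed" or "bonafide"


def parse_metadata(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per non-empty metadata line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split from the right so a path containing spaces stays in one piece.
        parts = line.rsplit(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        if parts == ["audio_path", "api", "class_label"]:
            continue  # assumed header line
        path, api, label = parts
        if label not in ("spoofed", "bonafide"):
            raise ValueError(f"unknown class label in: {line!r}")
        yield Entry(path, None if api == "-" else api, label)


def load_metadata(path: str) -> List[Entry]:
    """Read a file such as MultiAPI_train.txt into a list of entries."""
    with open(path, encoding="utf-8") as handle:
        return list(parse_metadata(handle))
```

For example, `load_metadata("MultiAPI_train.txt")` would return the training entries, which can then be grouped by `api` or counted by `label` to check the balance between bonafide and spoofed samples.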
If you use this code or dataset, please cite:
@misc{zhang2025multiapispoofmultiapidataset,
title={MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection},
author={Xueping Zhang and Zhenshan Zhang and Yechen Wang and Linxi Li and Liwei Jin and Ming Li},
year={2025},
eprint={2512.07352},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2512.07352},
}
For questions or suggestions, please contact: xueping.zhang@dukekunshan.edu.cn