The CompSpoof dataset is designed for studying component-level anti-spoofing, where either the speech or the environmental sound component (or both) may be spoofed.
You can download the dataset from Hugging Face:
🤗 CompSpoof Download Link
Below are audio samples from the CompSpoof dataset. For each class, we provide the mixed/original audio, along with the speech and environment sources.
Label: original
Description: Original bona fide speech and corresponding environment audio without mixing
Original |
---|
Label: bonafide_bonafide
Description: Bona fide speech mixed with another bona fide environmental audio
Mixed | Speech | Environment |
---|---|---|
Label: spoof_bonafide
Description: Spoof speech mixed with bona fide environmental audio
Mixed | Speech | Environment |
---|---|---|
Label: bonafide_spoof
Description: Bona fide speech mixed with spoof environmental audio
Mixed | Speech | Environment |
---|---|---|
Label: spoof_spoof
Description: Spoof speech mixed with spoof environmental audio
Mixed | Speech | Environment |
---|---|---|
ID | Mixed | Speech | Environment | Class Label | Description |
---|---|---|---|---|---|
0 | ❌ | Bona fide | Bona fide | original | Original bona fide speech and corresponding environment audio without mixing |
1 | ✅ | Bona fide | Bona fide | bonafide_bonafide | Bona fide speech mixed with another bona fide environmental audio |
2 | ✅ | Spoofed | Bona fide | spoof_bonafide | Spoof speech mixed with bona fide environmental audio |
3 | ✅ | Bona fide | Spoofed | bonafide_spoof | Bona fide speech mixed with spoof environmental audio |
4 | ✅ | Spoofed | Spoofed | spoof_spoof | Spoof speech mixed with spoof environmental audio |
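For component-level training it can help to break each class label into separate targets for the speech and environment components. The following is a minimal sketch that simply transcribes the table above into a lookup; the names `CLASS_TABLE`, `CLASS_IDS`, and `component_labels` are illustrative and not part of the dataset or any released code.

```python
# Lookup transcribed from the class table above.
# Value: (is_mixed, speech_is_spoofed, env_is_spoofed)
CLASS_TABLE = {
    "original": (False, False, False),
    "bonafide_bonafide": (True, False, False),
    "spoof_bonafide": (True, True, False),
    "bonafide_spoof": (True, False, True),
    "spoof_spoof": (True, True, True),
}

# Integer IDs follow the row order of the table (0 to 4).
CLASS_IDS = {label: idx for idx, label in enumerate(CLASS_TABLE)}


def component_labels(class_label):
    """Return per-component targets for one class label (hypothetical helper)."""
    if class_label not in CLASS_TABLE:
        raise KeyError(f"unknown class label: {class_label!r}")
    is_mixed, speech_spoofed, env_spoofed = CLASS_TABLE[class_label]
    return {
        "class_id": CLASS_IDS[class_label],
        "is_mixed": is_mixed,
        "speech_spoofed": speech_spoofed,
        "env_spoofed": env_spoofed,
    }
```

With this lookup, `component_labels("spoof_bonafide")` reports a mixed sample with spoofed speech and a bona fide environment, matching row 2 of the table.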
The dataset includes three metadata files: `CompSpoof_train.txt`, `CompSpoof_dev.txt`, and `CompSpoof_eval.txt`.
Each line has four fields:
mixed_audio speech_source env_source class_label
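A metadata file in this four-field layout can be read with a few lines of Python. This is a sketch under two assumptions that are not confirmed here: that fields are separated by whitespace, and that paths contain no spaces. The `Entry` type and the `parse_metadata` function are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    """One metadata line; field names follow the layout described above."""
    mixed_audio: str
    speech_source: str
    env_source: str
    class_label: str


def parse_metadata(lines: Iterable[str]) -> List[Entry]:
    """Parse metadata lines, assuming whitespace-separated fields."""
    entries = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected 4 fields, got {len(fields)}"
            )
        entries.append(Entry(*fields))
    return entries
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `with open("CompSpoof_train.txt") as f: entries = parse_metadata(f)`.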
Environmental sounds cover indoor, street, and natural settings, ensuring acoustic diversity.
During processing:
If you use this dataset in your research, please cite:
@misc{zhang2025compspoofdatasetjointlearning,
title={CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures},
author={Xueping Zhang and Liwei Jin and Yechen Wang and Linxi Li and Ming Li},
year={2025},
eprint={2509.15804},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2509.15804},
}