The CompSpoof dataset is designed for studying component-level anti-spoofing, where either the speech or the environmental sound component (or both) may be spoofed.
We have extended the CompSpoof dataset to CompSpoofV2, which significantly broadens the diversity of attack sources, environmental sounds, and mixing strategies. ✨ In addition, newly generated audio samples are distributed across the test set and are specifically designed to serve as detection data under unseen conditions.
🤗 CompSpoofV2 Details & Download Link: https://xuepingzhang.github.io/CompSpoof-V2-Dataset/
Building upon the CompSpoofV2 dataset and a separation-enhanced joint learning framework, we launched the ICME 2026 Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2). We warmly invite researchers from both academia and industry to participate in this challenge and explore robust, effective solutions for these critical deepfake detection tasks.
🖥️ ESDD2 Challenge website: https://sites.google.com/view/esdd-challenge/esdd-challenges/esdd-2/description
You can download the dataset from Hugging Face:
🤗 CompSpoof Download Link
Below are audio samples from the CompSpoof dataset. For each class, we provide the mixed/original audio, along with the speech and environment sources.
Label: original
Description: Original bona fide speech and corresponding environment audio without mixing
| Original |
|---|
Label: bonafide_bonafide
Description: Bona fide speech mixed with another bona fide environmental audio
| Mixed | Speech | Environment |
|---|---|---|
Label: spoof_bonafide
Description: Spoof speech mixed with bona fide environmental audio
| Mixed | Speech | Environment |
|---|---|---|
Label: bonafide_spoof
Description: Bona fide speech mixed with spoof environmental audio
| Mixed | Speech | Environment |
|---|---|---|
Label: spoof_spoof
Description: Spoof speech mixed with spoof environmental audio
| Mixed | Speech | Environment |
|---|---|---|
| ID | Mixed | Speech | Environment | Class Label | Description |
|---|---|---|---|---|---|
| 0 | ❌ | Bona fide | Bona fide | original | Original bona fide speech and corresponding environment audio without mixing |
| 1 | ✅ | Bona fide | Bona fide | bonafide_bonafide | Bona fide speech mixed with another bona fide environmental audio |
| 2 | ✅ | Spoofed | Bona fide | spoof_bonafide | Spoof speech mixed with bona fide environmental audio |
| 3 | ✅ | Bona fide | Spoofed | bonafide_spoof | Bona fide speech mixed with spoof environmental audio |
| 4 | ✅ | Spoofed | Spoofed | spoof_spoof | Spoof speech mixed with spoof environmental audio |
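The class labels in the table above can be decoded into per-component targets for component-level training or evaluation. The following is a minimal sketch, not part of the dataset's official tooling: the names `CLASS_LABELS` and `component_targets` are hypothetical, and the mapping simply mirrors the table rows.

```python
# IDs and class labels copied from the table above.
CLASS_LABELS = {
    0: "original",
    1: "bonafide_bonafide",
    2: "spoof_bonafide",
    3: "bonafide_spoof",
    4: "spoof_spoof",
}


def component_targets(class_label: str) -> tuple[bool, bool]:
    """Return (speech_is_spoofed, env_is_spoofed) for a class label.

    Hypothetical helper: "original" is treated as bona fide for both
    components, as described in the table.
    """
    if class_label not in CLASS_LABELS.values():
        raise ValueError(f"unknown class label: {class_label!r}")
    if class_label == "original":
        return (False, False)
    # Mixed labels follow the pattern "<speech>_<environment>".
    speech, env = class_label.split("_")
    return (speech == "spoof", env == "spoof")
```

For example, `component_targets("bonafide_spoof")` marks only the environmental component as spoofed.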
The dataset includes three metadata files: CompSpoof_train.txt, CompSpoof_dev.txt, and CompSpoof_eval.txt.
Each line has four fields:
mixed_audio speech_source env_source class_label
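A metadata file in this four-field layout could be read roughly as follows. This is a sketch under assumptions: it assumes the fields are separated by whitespace and that paths contain no spaces, neither of which is confirmed here. `MetadataEntry`, `parse_metadata_line`, and `load_metadata` are hypothetical names.

```python
from typing import NamedTuple


class MetadataEntry(NamedTuple):
    mixed_audio: str
    speech_source: str
    env_source: str
    class_label: str


def parse_metadata_line(line: str) -> MetadataEntry:
    """Parse one metadata line into its four fields.

    Assumption: fields are whitespace-separated and contain no spaces.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return MetadataEntry(*fields)


def load_metadata(path: str) -> list[MetadataEntry]:
    """Read a metadata file such as CompSpoof_train.txt, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_metadata_line(line) for line in f if line.strip()]
```

Calling `load_metadata("CompSpoof_train.txt")` would then return one `MetadataEntry` per line, with the class label available as `entry.class_label`.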
Environmental sounds cover indoor, street, and natural settings, ensuring acoustic diversity.
During processing:
If you use this dataset in your research, please cite:
@misc{zhang2025compspoofdatasetjointlearning,
title={CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures},
author={Xueping Zhang and Liwei Jin and Yechen Wang and Linxi Li and Ming Li},
year={2025},
eprint={2509.15804},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2509.15804},
}
This dataset is a derived dataset constructed by combining and mixing audio samples from multiple publicly available datasets.
Users must comply with the license terms of each original dataset. The authors do not claim ownership of the original audio content. Because some of the source datasets are licensed under CC BY-NC 4.0, this dataset is also released under the CC BY-NC 4.0 license.