1M+ speech utterances from 7K+ celebrities extracted from YouTube — the standard speaker verification benchmark.
VoxCeleb from the Oxford Visual Geometry Group (VGG) is the canonical benchmark for speaker verification and identification at scale. VoxCeleb1 contains 153,516 utterances from 1,251 celebrities; VoxCeleb2 adds another 1,128,246 utterances from 6,112 celebrities — 2,000+ hours total. Audio is extracted from interview and talk-show YouTube clips using an automatic audio-visual pipeline, with speaker identity verified via face recognition. Widely used for training speaker embeddings (x-vectors, ECAPA-TDNN) and for ASR domain adaptation on celebrity speech.
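To show how VoxCeleb-style trials are typically scored, here is a minimal sketch of speaker verification: cosine similarity between two embeddings (e.g. from an x-vector or ECAPA-TDNN model), plus an equal-error-rate (EER) computation over a set of trial scores. The embeddings here are placeholders; this is illustrative NumPy, not any particular toolkit's API.

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings."""
    a = np.asarray(emb_a) / np.linalg.norm(emb_a)
    b = np.asarray(emb_b) / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

def equal_error_rate(scores, labels):
    """EER: the operating point where false-accept rate equals
    false-reject rate. labels: 1 = same speaker, 0 = different."""
    order = np.argsort(scores)[::-1]          # accept highest scores first
    labels = np.asarray(labels)[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # Sweeping the threshold down the sorted scores:
    far = np.cumsum(1 - labels) / n_nontarget  # non-targets accepted so far
    frr = 1 - np.cumsum(labels) / n_target     # targets still rejected
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

# Toy trials: two same-speaker pairs score high, two impostor pairs low.
scores = [0.91, 0.84, 0.22, 0.05]
labels = [1, 1, 0, 0]
print(equal_error_rate(scores, labels))  # → 0.0 (perfect separation)
```

On real VoxCeleb1 trial lists (e.g. veri_test.txt), the same EER sweep is applied to hundreds of thousands of pairs; state-of-the-art embedding models report EERs well under 1%.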
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
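As a sketch of how a composite like this can be assembled, the snippet below averages the seven named dimensions. The equal weighting and 0–100 scale are assumptions for illustration, not the published LQS methodology.

```python
# Dimension names come from the page; equal weighting is an assumption.
DIMENSIONS = [
    "completeness", "uniqueness", "validation_health", "size_adequacy",
    "format_compliance", "label_density", "class_balance",
]

def composite_score(scores: dict) -> float:
    """Unweighted mean of the seven per-dimension scores (assumed 0-100)."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

example = {d: 80.0 for d in DIMENSIONS}
print(composite_score(example))  # → 80.0
```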
Common tasks and benchmarks where VoxCeleb is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
VoxCeleb is distributed under CC BY-SA 4.0 (VoxCeleb1) / Research (VoxCeleb2). This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid audio datasets with guarantees that public datasets often can't provide:
Other entries in the Audio catalog.