136M narrated video clips from 1.2M instructional YouTube videos.
HowTo100M from Inria is a large-scale instructional video dataset: 136M video-narration clips extracted from 1.22M YouTube instructional videos (cooking, DIY, fitness, etc.). Narrations come from automatic subtitles, and each clip-narration pair is aligned at roughly 4-second granularity.
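To make the pairing concrete, here is a minimal sketch of one clip-narration record. The field names and layout are assumptions for illustration; HowTo100M's actual distribution format (its CSV/JSON schema) may differ, so check the maintainer's files before relying on this shape.

```python
from dataclasses import dataclass

# Hypothetical record layout for one clip-narration pair; the real
# HowTo100M schema may use different field names.
@dataclass
class ClipNarration:
    video_id: str   # YouTube video ID
    start: float    # clip start time, in seconds
    end: float      # clip end time, in seconds
    text: str       # ASR-derived narration aligned to this clip

def clip_duration(pair: ClipNarration) -> float:
    """Duration of one aligned clip (~4 s on average in HowTo100M)."""
    return pair.end - pair.start

pair = ClipNarration("abc123", 12.0, 16.1, "now whisk the eggs")
print(round(clip_duration(pair), 1))  # 4.1
```

Because narrations come from automatic subtitles, the text is noisy and only loosely aligned to what is on screen, which is worth accounting for in any downstream filtering.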
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
Common tasks and benchmarks where HowTo100M is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
HowTo100M is distributed under Apache 2.0 (metadata). This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid multimodal datasets with guarantees that public datasets often can't provide:
Other entries in the Multimodal catalog.