136M narrated video clips from 1.2M instructional YouTube videos.
HowTo100M from Inria is a large-scale instructional video dataset: 136M video-narration clips extracted from 1.22M YouTube instructional videos (cooking, DIY, fitness, etc.). Narrations come from automatic subtitles, and each clip-narration pair is aligned at roughly 4-second granularity.
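To make the pairing concrete, here is a minimal sketch of one clip-narration record. The field names and layout are assumptions for illustration; HowTo100M's actual distribution format (its CSV/JSON schema) may differ, so check the maintainer's files before relying on this shape.

```python
from dataclasses import dataclass

# Hypothetical record layout for one clip-narration pair; the real
# HowTo100M schema may use different field names.
@dataclass
class ClipNarration:
    video_id: str   # YouTube video ID
    start: float    # clip start time, in seconds
    end: float      # clip end time, in seconds
    text: str       # ASR-derived narration aligned to this clip

def clip_duration(pair: ClipNarration) -> float:
    """Duration of one aligned clip (~4 s on average in HowTo100M)."""
    return pair.end - pair.start

pair = ClipNarration("abc123", 12.0, 16.1, "now whisk the eggs")
print(round(clip_duration(pair), 1))  # 4.1
```

Because narrations come from automatic subtitles, the text is noisy and only loosely aligned to what is on screen, which is worth accounting for in any downstream filtering.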
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
Common tasks and benchmarks where HowTo100M is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
HowTo100M is distributed under Apache 2.0 (metadata). This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid multimodal datasets with guarantees that public datasets often can't provide:
Other entries in the Multimodal catalog.