8.8M passages and 1M anonymized Bing queries — the go-to benchmark for passage ranking.
MS MARCO (Microsoft MAchine Reading COmprehension) is a collection of real, anonymized Bing user queries paired with relevant passages. The passage ranking task has 8.8M passages and 1M queries; there are also QA, generation, and document ranking variants. It is widely used for training dense retrievers and rerankers.
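As a rough illustration of how query and passage pairs like these are typically consumed, here is a minimal sketch that parses tab-separated passages, queries, and relevance judgments. The layouts assumed here (two-column `id<TAB>text` files and four-column `qid unused pid label` judgment rows) are assumptions rather than something stated on this page, the function names `read_tsv_pairs` and `read_qrels` are hypothetical, and the sample rows are made up for the example. Check the maintainer's documentation for the actual file formats.

```python
import csv
import io


def read_tsv_pairs(handle):
    """Return {id: text} from a two-column, tab-separated file (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def read_qrels(handle):
    """Return {qid: set of relevant pids} from 'qid unused pid label' rows (assumed layout)."""
    relevant = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        qid, _unused, pid, label = parts
        if int(label) > 0:
            relevant.setdefault(qid, set()).add(pid)
    return relevant


# Made-up in-memory stand-ins for the real files.
passages = read_tsv_pairs(io.StringIO(
    "10\tParis is the capital of France.\n"
    "11\tBread is made from flour and water.\n"
))
queries = read_tsv_pairs(io.StringIO("500\twhat is the capital of france\n"))
qrels = read_qrels(io.StringIO("500 0 10 1\n"))

# Join each query with the text of its relevant passages.
training_pairs = [
    (queries[qid], passages[pid])
    for qid, pids in qrels.items()
    for pid in sorted(pids)
]
```

Pairs built this way are the usual starting point for training a retriever or reranker; the full collection is large enough that streaming the files is generally preferable to loading everything into a dictionary.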
LQS is our 7-dimension quality score, computed from the dataset's published statistics.
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
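The page does not say how the seven dimensions are combined, so the following is only a sketch of one plausible aggregation: a weighted mean that defaults to equal weights. The 0 to 100 scale, the equal default weights, and the function name `composite_score` are all assumptions made for illustration; they are not the published LQS methodology.

```python
# Dimension names taken from the description above; everything else is assumed.
DIMENSIONS = (
    "completeness",
    "uniqueness",
    "validation_health",
    "size_adequacy",
    "format_compliance",
    "label_density",
    "class_balance",
)


def composite_score(scores, weights=None):
    """Weighted mean of the seven dimension scores (hypothetical aggregation)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    if weights is None:
        # Assumption: every dimension counts equally.
        weights = {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


# Made-up scores on an assumed 0-100 scale.
example = {d: 80.0 for d in DIMENSIONS}
example_score = composite_score(example)
```

Passing an explicit `weights` dictionary lets one dimension dominate the result, which is one way a scoring scheme like this could emphasize, say, label density over format compliance.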
Common tasks and benchmarks where MS MARCO is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
MS MARCO is distributed under the MS MARCO License (research only). It is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify the current license terms with the maintainer before any commercial use.