Labeled text datasets for sentiment analysis, NER, classification, LLM fine-tuning, and more. PII-scanned, quality-verified, ready to train on.
From classic classification tasks to modern LLM instruction datasets, find exactly what your model needs.
Text labeled positive, negative, or neutral at the sentence and aspect level, drawn from reviews, social media, support tickets, and more.
Token-level span annotations for people, organizations, locations, dates, and custom entity types.
Instruction-following, chat, and RLHF preference datasets formatted for GPT, LLaMA, Mistral, and Falcon.
Extractive and abstractive Q&A pairs with context passages. SQuAD-style and conversational formats.
Single-label and multi-label text datasets for topic categorization, spam detection, and intent detection.
Parallel corpora for machine translation and reference summaries for abstractive summarization training.
Browse verified NLP datasets — or monetize your text data today. Looking for public alternatives? See our curated catalog of NLP datasets (SQuAD, The Pile, C4, MS MARCO, GLUE, Wikipedia dumps) with LQS scores.