
NLP & Text Datasets for AI Training

Labeled text datasets for sentiment analysis, NER, classification, LLM fine-tuning, and more. PII-scanned, quality-verified, ready to train on.

Browse NLP Datasets → Sell Your Dataset

NLP Tasks Covered

From classic classification tasks to modern LLM instruction datasets, find exactly what your model needs.

💬

Sentiment Analysis

Positive/negative/neutral labeled text at sentence and aspect level. Reviews, social media, support tickets, and more.

🏷️

Named Entity Recognition

Token-level span annotations for people, organizations, locations, dates, and custom entity types.

🤖

LLM Fine-Tuning

Instruction-following, chat, and RLHF preference datasets formatted for GPT, LLaMA, Mistral, and Falcon.

Question Answering

Extractive and abstractive Q&A pairs with context passages. SQuAD-style and conversational formats.

📋

Text Classification

Single-label and multi-label text datasets for topic categorization, spam detection, and intent detection.

🌐

Translation & Summarization

Parallel corpora for machine translation and reference summaries for abstractive summarization training.
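
To make the "token-level span annotations" in the Named Entity Recognition card more concrete, here is a minimal sketch of how such spans are commonly turned into per-token BIO tags for training. The record layout (token index pairs with an exclusive end), the `spans_to_bio` helper, and the sample sentence are all hypothetical illustrations, not the schema of any particular listing.

```python
def spans_to_bio(num_tokens, spans):
    """Convert (start, end, label) token spans into BIO tags.

    `start` is inclusive and `end` is exclusive, both counted in tokens.
    This layout is an assumption made for illustration only.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Hypothetical annotated sentence.
tokens = ["Jane", "Doe", "joined", "Acme", "Corp", "in", "Paris"]
spans = [(0, 2, "PER"), (3, 5, "ORG"), (6, 7, "LOC")]

tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Checking that every span stays inside the token range before training is a cheap way to catch annotation or tokenization mismatches early.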

Frequently Asked Questions

What types of NLP datasets are available?

Sentiment analysis, NER, text classification, Q&A pairs, summarization datasets, intent detection, dialogue, machine translation pairs, and LLM instruction fine-tuning datasets.

What file formats are supported?

CSV, JSONL (newline-delimited JSON), Parquet, Arrow, and plain JSON. JSONL is the most common format and is natively supported by Hugging Face Datasets, pandas, and most fine-tuning frameworks.

Are datasets scanned for PII?

Yes. Every dataset goes through automated PII scanning before publication. Datasets that pass display a "PII Scanned" badge. Sellers are required to remove or anonymize personal information before uploading.

Are there datasets for LLM fine-tuning?

Yes. Many sellers offer instruction-following, chat, and domain-specific datasets formatted for fine-tuning open-weight models like LLaMA 3, Mistral, and Phi.

How do I sell my dataset?

Upload your CSV or JSONL file. Our pipeline validates the structure and scans for PII, then you set a price and buyers can purchase instantly. Sellers keep 85% of every sale, with no listing fees.
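
As a rough illustration of the JSONL format mentioned in the FAQ, the sketch below parses newline-delimited JSON using only Python's standard library. The `read_jsonl` helper and the `text` and `label` field names are hypothetical; each dataset defines its own schema.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is malformed instead of failing silently.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


# Hypothetical sentiment records; real files would be opened with open(path).
SAMPLE = (
    '{"text": "Great battery life.", "label": "positive"}\n'
    '{"text": "Arrived broken.", "label": "negative"}\n'
)

records = list(read_jsonl(io.StringIO(SAMPLE)))
print(len(records), records[0]["label"])
```

With pandas installed, `pandas.read_json(path, lines=True)` reads the same format into a DataFrame, and Hugging Face Datasets can load it with `load_dataset("json", data_files=path)`.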

Ready to build better NLP models?

Browse verified NLP datasets, or monetize your text data today. Looking for public alternatives? See our curated catalog of NLP datasets (SQuAD, The Pile, C4, MS MARCO, GLUE, Wikipedia dumps) with LQS scores.
