Upload a sample CSV, JSON, JSONL, or ZIP and get a full LabelSets Quality Score (LQS) report in your inbox.
List this dataset on LabelSets — buyers pay premium prices for quality data. Keep 85% of every sale.
List this dataset and start earning →
Drop any CSV, JSON, JSONL, or ZIP. We only need a representative sample, up to 20 MB.
Our pipeline evaluates 14 dimensions across 5 pillars — structural integrity, annotation quality, statistical health, training fitness, and provenance — including real ML model runs.
A full LQS breakdown with actionable tips to improve your score before listing.
Platinum-tier datasets sell faster and command premium prices, with 3× more revenue on average.
Null/missing value rates per column; orphaned record detection.
Exact duplicate detection via hash comparison across all rows.
Type violations, range errors, and row-level schema drift detection.
Spec compliance — YOLO/COCO/VOC/CSV/JSONL encoding and structure.
Malformed annotation rate — invalid coords, missing fields, parse failures.
Annotations per image / response length / label coverage per sample.
Coefficient of variation of annotation density — catches mixed labeling standards.
Shannon entropy normalized to [0,1]; imbalance ratio; rare-class coverage.
Bbox area spread, text length variance, null-rate distribution across features.
Composite estimate of label error rate from invalid annotations, duplicates, and missing labels.
Class separability proxy — class count × entropy × vocabulary diversity.
Sample count against task-specific minimums, ranging from 200 (audio) to 10K (tabular).
Type-token vocabulary ratio; class spread; semantic variety across samples.
Description length, tag coverage, license type, and data source documentation.
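Several of the checks listed above are standard statistics that can be sketched in a few lines of Python. The following is a minimal illustration, not the LabelSets pipeline itself: the function names are hypothetical, SHA-256 stands in for whichever hash is actually used, and dividing entropy by log2 of the observed class count is one common way to normalize it to [0,1].

```python
import hashlib
import math
from collections import Counter
from statistics import mean, pstdev


def null_rates(rows, columns):
    """Fraction of rows in which each column is missing, None, or empty."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for r in rows if r.get(col) in (None, "")) / total
        for col in columns
    }


def exact_duplicate_count(rows):
    """Count rows whose hash matches a row seen earlier."""
    seen = set()
    duplicates = 0
    for r in rows:
        # Sort fields by name so key order does not change the hash.
        key = repr(sorted(r.items())).encode("utf-8")
        digest = hashlib.sha256(key).hexdigest()  # assumption: SHA-256
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates


def normalized_entropy(labels):
    """Shannon entropy of the label distribution, scaled to [0, 1]."""
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0  # zero or one class carries no balance information
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(k)


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

For example, `coefficient_of_variation` applied to the number of annotations per image stays near zero when every image is labeled to the same standard and grows when densely and sparsely labeled batches are mixed.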
Composite = weighted average of all 14 dimensions. Scores are computed from live file analysis and are never self-reported. Full methodology →
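A weighted average of this kind can be sketched as follows. The dimension names and weights in the usage example are placeholders chosen for illustration; the actual LQS weights are not stated here.

```python
def composite_score(scores, weights):
    """Weighted average of per-dimension scores.

    `scores` and `weights` are dicts keyed by dimension name. Weights
    do not need to sum to 1; they are normalized by their total.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical dimensions and weights, for illustration only.
example_scores = {"completeness": 80, "class_balance": 100}
example_weights = {"completeness": 1, "class_balance": 3}
lqs = composite_score(example_scores, example_weights)
```

Normalizing by the total weight keeps the composite on the same scale as the individual dimension scores regardless of how the weights are expressed.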