Free · No credit card

Get a free quality score for your dataset

Upload a sample CSV, TSV, JSON, JSONL, or ZIP file and get a full LabelSets Quality Score (LQS) report in your inbox.

60 seconds · No account needed · PII-safe analysis

📂
Drop your dataset here, or click to browse
Max 20 MB — we analyze the first 5 MB for speed
Accepted formats: CSV · TSV · JSON · JSONL · ZIP

We never store your file contents. Unsubscribe any time.


Ready to start selling?

List this dataset on LabelSets — buyers pay premium prices for quality data. Keep 85% of every sale.

List this dataset and start earning →

Free to list · Instant payouts · 85% revenue share

How it works
From upload to report in 60 seconds
1

Upload a sample

Drop any CSV, TSV, JSON, JSONL, or ZIP file. We only need a representative sample of up to 20 MB.

2

We analyze it instantly

Our pipeline evaluates 14 dimensions across 5 tiers (structural integrity, annotation quality, statistical health, training fitness, and provenance), including real ML model runs.

3

Report in your inbox

A full LQS breakdown with actionable tips to improve your score before listing.

4

List and earn

Platinum-tier datasets sell faster, command premium prices, and earn 3× more revenue on average.

14 dimensions across 5 tiers
Tier 1 — Structural Integrity · 35%

Completeness

Null/missing value rates per column; orphaned record detection.

12% of score
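
To make the per-column null rate concrete, here is a minimal sketch in Python. It is an illustration only, not the LabelSets pipeline: the function name `null_rates` and the list of strings treated as missing are assumptions, and orphaned-record detection is not covered.

```python
import csv
import io

def null_rates(csv_text, null_tokens=("", "na", "n/a", "null", "none", "nan")):
    """Return the fraction of missing values for each column of a CSV string.

    The set of tokens treated as "missing" is an assumption for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {col: 0 for col in columns}
    total = 0
    for row in reader:
        total += 1
        for col in columns:
            value = row.get(col)  # None when a row has fewer fields than the header
            if value is None or value.strip().lower() in null_tokens:
                missing[col] += 1
    return {col: (missing[col] / total if total else 0.0) for col in columns}

sample = "id,label,text\n1,cat,hello\n2,,world\n3,dog,\n4,N/A,hi\n"
rates = null_rates(sample)
print(rates)
```

In this sample, `label` is missing in two of four rows and `text` in one of four.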
🔁

Uniqueness

Exact duplicate detection via hash comparison across all rows.

8% of score
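
Hash-based duplicate detection can be sketched as follows. This is an assumed, simplified version: the choice of SHA-256, the field separator, and the function name `duplicate_rate` are illustrative and not taken from the LabelSets methodology.

```python
import hashlib

def duplicate_rate(rows):
    """Return the fraction of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        # Join fields with a unit separator so ("ab", "c") and ("a", "bc")
        # do not collapse to the same string before hashing.
        digest = hashlib.sha256("\x1f".join(row).encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates / len(rows) if rows else 0.0

rows = [("1", "cat"), ("2", "dog"), ("1", "cat"), ("3", "cat"), ("2", "dog")]
rate = duplicate_rate(rows)
print(rate)
```

Storing digests instead of full rows keeps memory use roughly constant per row, which matters when rows are long.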
🗂

Schema Validity

Type violations, range errors, and row-level schema drift detection.

8% of score
📄

Format Integrity

Spec compliance — YOLO/COCO/VOC/CSV/JSONL encoding and structure.

7% of score
Tier 2 — Annotation Quality · 30%
🎯

Label Accuracy

Malformed annotation rate — invalid coords, missing fields, parse failures.

10% of score
🏷

Label Density

Annotations per image / response length / label coverage per sample.

8% of score
📐

Annotation Consistency

Coefficient of variation of annotation density — catches mixed labeling standards.

7% of score
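
The coefficient of variation (standard deviation divided by mean) of annotation counts can be sketched like this. The function name and the use of the population standard deviation are assumptions; how the CV is turned into a 0–100 score is not stated above and is left out.

```python
import statistics

def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean of per-sample counts."""
    if not counts:
        return 0.0
    mean = statistics.fmean(counts)
    if mean == 0:
        return 0.0
    return statistics.pstdev(counts) / mean

uniform = [5, 5, 5, 5]    # every image labeled with the same density
mixed = [1, 1, 12, 14]    # sparse and dense labeling mixed together

print(coefficient_of_variation(uniform))
print(coefficient_of_variation(mixed))
```

A dataset assembled from sources with different labeling standards tends to look like `mixed`: a high CV relative to a uniformly labeled set.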

Class Distribution

Shannon entropy normalized to [0,1]; imbalance ratio; rare-class coverage.

5% of score
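
Normalized Shannon entropy divides the entropy of the class frequencies by its maximum, log(k) for k classes, so a perfectly balanced dataset scores 1 and a single-class dataset scores 0. A minimal sketch, with function names chosen for illustration and the imbalance ratio assumed to be largest class count over smallest:

```python
import math
from collections import Counter

def normalized_entropy(labels):
    """Shannon entropy of the class distribution, scaled to [0, 1]."""
    counts = Counter(labels)
    k = len(counts)
    if k <= 1:
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)

def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (assumed definition)."""
    counts = Counter(labels)
    if not counts:
        return 0.0
    return max(counts.values()) / min(counts.values())

balanced = ["cat", "dog", "cat", "dog"]
skewed = ["cat"] * 9 + ["dog"]

print(normalized_entropy(balanced), imbalance_ratio(balanced))
print(normalized_entropy(skewed), imbalance_ratio(skewed))
```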
Tier 3 — Statistical Health · 20%
📈

Distribution Health

Bbox area spread, text length variance, null-rate distribution across features.

8% of score
🔇

Label Error Estimate

Composite estimate of label error rate from invalid annotations, duplicates, and missing labels.

7% of score

Signal Strength

Class separability proxy — class count × entropy × vocabulary diversity.

5% of score
Tier 4 — Training Fitness · 10%
📊

Size Adequacy

Sample count against task-specific minimums, ranging from 200 (audio) to 10K (tabular).

5% of score
🌐

Diversity Score

Type-token vocabulary ratio; class spread; semantic variety across samples.

5% of score
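
The type-token ratio is the number of distinct tokens divided by the total number of tokens. A minimal sketch, assuming lowercase word tokenization with a regular expression; the tokenizer and function name are illustrative, and class spread and semantic variety are not covered:

```python
import re

def type_token_ratio(texts):
    """Distinct tokens divided by total tokens across all texts."""
    tokens = [tok for text in texts for tok in re.findall(r"\w+", text.lower())]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

texts = ["The cat sat", "the cat ran"]
ttr = type_token_ratio(texts)  # 4 distinct tokens out of 6
print(ttr)
```

The ratio tends to fall as a corpus grows, so comparing values across samples of very different sizes can mislead.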
Tier 5 — Provenance · 5%
🔍

Provenance Quality

Description length, tag coverage, license type, and data source documentation.

5% of score
90–100 · Platinum
75–89 · Gold
60–74 · Silver
0–59 · Bronze

Composite = weighted average of all 14 dimensions. Scores are computed from live file analysis, never self-reported. Full methodology →
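
The composite can be sketched as a weighted average using the weights listed above. This is an illustration under assumptions: each dimension is assumed to be scored from 0 to 100, the dictionary keys are invented names, and the handling of fractional composites at band boundaries (for example 89.5) is a guess.

```python
# Weights in percent, taken from the dimension list above; key names are illustrative.
WEIGHTS = {
    "completeness": 12, "uniqueness": 8, "schema_validity": 8, "format_integrity": 7,
    "label_accuracy": 10, "label_density": 8, "annotation_consistency": 7,
    "class_distribution": 5,
    "distribution_health": 8, "label_error_estimate": 7, "signal_strength": 5,
    "size_adequacy": 5, "diversity_score": 5,
    "provenance_quality": 5,
}

def composite_score(scores):
    """Weighted average of per-dimension scores, each assumed to be in 0..100."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight

def band(score):
    """Map a composite score to a band; boundary handling is an assumption."""
    if score >= 90:
        return "Platinum"
    if score >= 75:
        return "Gold"
    if score >= 60:
        return "Silver"
    return "Bronze"

scores = {name: 90 for name in WEIGHTS}
scores["completeness"] = 40  # one weak dimension pulls the composite down

result = composite_score(scores)
print(result, band(result))
```

Here a single low score on the most heavily weighted dimension drops an otherwise 90-point dataset to 84, from Platinum to Gold.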