CheXpert

224,316 chest X-rays from Stanford with automated + expert labels for 14 observations.

LQS 79 (gold) · ⚠ Research-only · 224K X-ray images · 440 GB · JPG/CSV · Released 2019

Source: stanfordmlgroup.github.io · maintained by Stanford ML Group

About this dataset

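CheXpert is Stanford ML Group's chest radiograph dataset: 224,316 chest X-rays of 65,240 patients, with rule-based training labels for 14 observations extracted automatically from radiology reports. Evaluation uses expert labels: the validation set (200 studies) is labeled by three board-certified radiologists, and the test set (500 studies) was annotated by eight radiologists, five of whom supply the consensus ground truth while the other three serve as a human benchmark.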

Maintainer

Stanford ML Group

License

Stanford Research Use License

Formats

JPG · CSV

Paper

Read on arxiv.org →

LabelSets Quality Score

LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →

79 out of 100 · gold tier

Solid dataset with some trade-offs

Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.

Completeness 85

No public completeness metric; using prior for 'automated' datasets.

Uniqueness 93

Exact-hash deduplication documented by maintainer.

Validation 68

Labels auto-extracted from free-text reports — typical ~90% accuracy.

Size adequacy 93

224,316 images, well above the 20,000-image adequacy target for Medical Imaging.

Format compliance 95

Industry-standard format — drop-in compatible with mainstream tooling.

Label density 52

Average 1.0 labels per item (sparse).

Class balance 58

Long-tail distribution — dominant classes overrepresented.

What it's used for

Common tasks and benchmarks where CheXpert is the default or competitive choice.

Multi-label classification
Uncertainty-aware learning
Disease detection
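
For uncertainty-aware learning in particular, a minimal sketch may help. It assumes the label encoding described in the CheXpert paper (1.0 positive, 0.0 negative, -1.0 uncertain, blank for not mentioned) and shows the two simplest policies from that paper, U-Zeros and U-Ones. The rows in `SAMPLE_CSV` are made up for illustration, only three of the 14 observation columns are shown, and treating blank cells as negative is an assumption of this sketch rather than a rule stated on this page.

```python
import csv
import io

# Hypothetical two-row excerpt in the layout of CheXpert's train.csv.
# 1.0 = positive, 0.0 = negative, -1.0 = uncertain, blank = not mentioned.
SAMPLE_CSV = """Path,Cardiomegaly,Edema,Pleural Effusion
CheXpert-v1.0-small/train/patient00001/study1/view1_frontal.jpg,1.0,-1.0,
CheXpert-v1.0-small/train/patient00002/study1/view1_frontal.jpg,,0.0,-1.0
"""

LABEL_COLUMNS = ["Cardiomegaly", "Edema", "Pleural Effusion"]


def resolve_label(raw, policy):
    """Map one raw CSV cell to a binary target.

    policy "zeros": uncertain (-1.0) becomes 0 (U-Zeros).
    policy "ones":  uncertain (-1.0) becomes 1 (U-Ones).
    Blank cells become 0, which is an assumption of this sketch.
    """
    if policy not in ("zeros", "ones"):
        raise ValueError(f"unknown policy: {policy!r}")
    if raw.strip() == "":
        return 0
    value = float(raw)
    if value == -1.0:
        return 1 if policy == "ones" else 0
    return int(value)


def load_targets(csv_text, policy):
    """Return {image path: [binary targets in LABEL_COLUMNS order]}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Path"]: [resolve_label(row[col], policy) for col in LABEL_COLUMNS]
        for row in reader
    }


zeros = load_targets(SAMPLE_CSV, "zeros")
ones = load_targets(SAMPLE_CSV, "ones")
```

The two policies only differ on cells marked uncertain, so comparing `zeros` and `ones` for the same path shows exactly which targets depend on that choice.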

Sample statistics

What's actually in the dataset — from the maintainer's published stats.

224,316 images from 65,240 patients, each labeled for 14 observations. The test set was annotated by 8 radiologists and the validation set by 3.
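
Because the image count exceeds the patient count, many patients contribute several images, and a random image-level split can leak a patient across splits. The sketch below groups images by patient before splitting. It assumes the path layout used in the CheXpert CSV files (`.../patientNNNNN/studyN/viewN_*.jpg`); the example paths and the `holdout_every` parameter are hypothetical.

```python
import re
from collections import defaultdict

# Average images per patient, from the counts quoted above.
IMAGES_PER_PATIENT = 224_316 / 65_240  # roughly 3.44

# Assumed path layout: .../patient00001/study1/view1_frontal.jpg
PATIENT_RE = re.compile(r"(patient\d+)")


def patient_id(path):
    """Extract the patient directory name from an image path."""
    match = PATIENT_RE.search(path)
    if match is None:
        raise ValueError(f"no patient ID in path: {path}")
    return match.group(1)


def split_by_patient(paths, holdout_every=5):
    """Send every `holdout_every`-th patient (in sorted order) to the holdout.

    All images of a patient land in the same split, so no patient
    appears on both sides.
    """
    by_patient = defaultdict(list)
    for path in paths:
        by_patient[patient_id(path)].append(path)

    train, holdout = [], []
    for index, pid in enumerate(sorted(by_patient)):
        target = holdout if index % holdout_every == 0 else train
        target.extend(by_patient[pid])
    return train, holdout


# Hypothetical paths for illustration only.
example_paths = [
    "train/patient00001/study1/view1_frontal.jpg",
    "train/patient00001/study2/view1_frontal.jpg",
    "train/patient00002/study1/view1_frontal.jpg",
    "train/patient00002/study1/view2_lateral.jpg",
    "train/patient00003/study1/view1_frontal.jpg",
]
train_paths, holdout_paths = split_by_patient(example_paths, holdout_every=2)
```

A deterministic every-Nth rule keeps the sketch reproducible without a random seed; a hashed or stratified assignment would be a reasonable alternative in practice.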

License

CheXpert is distributed under the Stanford Research Use License. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.

Heads up: this dataset's license restricts commercial use. If you need medical imaging data for production, check LabelSets' paid datasets below — every listing has an explicit commercial license.

Need commercial-licensed Medical Imaging data?

LabelSets sellers offer paid medical imaging datasets with what public datasets often can't give you:

Explicit commercial license in writing
LQS-verified quality in your specific use-case
Instant download — no DUA, credentialed access, or research gating
PII scanned, deduplicated, and production-ready

Frequently Asked Questions

CheXpert is distributed under the Stanford Research Use License, which restricts commercial use. For a commercially licensed alternative in medical imaging, see LabelSets' paid datasets.

CheXpert contains 224,316 X-ray images from 65,240 patients, each labeled for 14 observations. The test set was annotated by 8 radiologists and the validation set by 3.

CheXpert is maintained by Stanford ML Group and is available at https://stanfordmlgroup.github.io/competitions/chexpert/. LabelSets indexes and scores this dataset for discoverability but does not redistribute it.

LQS is a 7-dimension quality score (completeness, uniqueness, validation, size adequacy, format compliance, label density, class balance) computed from the dataset's published statistics. Composite scores map to tiers: platinum (≥90), gold (≥75), silver (≥60), bronze (<60). Read the full methodology.
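
The tier thresholds above can be written as a small lookup. This is a sketch only: the function name `lqs_tier` is hypothetical, and because the per-dimension weights are not given on this page, the unweighted mean below is a sanity check on the listed dimension scores, not the LQS formula.

```python
def lqs_tier(score):
    """Map a composite score (0-100) to the tier names quoted above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "platinum"
    if score >= 75:
        return "gold"
    if score >= 60:
        return "silver"
    return "bronze"


# Dimension scores as listed for CheXpert on this page.
DIMENSIONS = {
    "completeness": 85,
    "uniqueness": 93,
    "validation": 68,
    "size_adequacy": 93,
    "format_compliance": 95,
    "label_density": 52,
    "class_balance": 58,
}

# Unweighted mean of the seven dimensions (an assumption, see lead-in).
unweighted_mean = sum(DIMENSIONS.values()) / len(DIMENSIONS)

published_tier = lqs_tier(79)                # the score shown on this page
unweighted_tier = lqs_tier(unweighted_mean)  # sanity check only
```

The unweighted mean comes out a little below the published 79, which suggests the real composite weights the dimensions unevenly; both values fall in the gold band.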
