🏥 Curated Catalog · Medical Imaging

NIH ChestX-ray14

112,120 frontal chest X-rays from 30,805 patients with 14 disease labels.

LQS 80 · gold · ✓ Commercial OK · 112K X-ray images · 45 GB · PNG, CSV · Released 2017
Source: nihcc.app.box.com · maintained by NIH Clinical Center

About this dataset

NIH ChestX-ray14 is one of the largest publicly available chest X-ray datasets. It contains 112,120 frontal-view X-ray images from 30,805 patients at the NIH Clinical Center. Each image is automatically labeled with up to 14 thoracic disease categories (Atelectasis, Cardiomegaly, Effusion, etc.), extracted from the accompanying radiology reports via NLP.

Maintainer: NIH Clinical Center
License: CC0 1.0
Formats: PNG · CSV

LabelSets Quality Score

LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →

80 out of 100 · gold tier

Solid dataset with some trade-offs

Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.

Completeness 93
Published by maintainer: 93% completeness across annotated fields.
Uniqueness 93
Exact-hash deduplication documented by maintainer.
Validation 68
Labels auto-extracted from free-text reports — typical ~90% accuracy.
Size adequacy 92
112,120 images — exceeds 20,000 adequacy target for Medical Imaging.
Format compliance 95
Industry-standard format — drop-in compatible with mainstream tooling.
Label density 52
Average 1.0 labels per item (sparse).
Class balance 58
Long-tail distribution — dominant classes overrepresented.
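
As a rough sanity check on the seven dimension scores above, the sketch below computes their plain unweighted average and lists the weakest dimensions. The uniform weighting is an assumption made purely for illustration: this page does not state how LQS combines the dimensions, and the published composite (80) is slightly higher than the unweighted mean, so the actual formula presumably differs.

```python
# Dimension scores as listed on this page for NIH ChestX-ray14.
scores = {
    "completeness": 93,
    "uniqueness": 93,
    "validation": 68,
    "size_adequacy": 92,
    "format_compliance": 95,
    "label_density": 52,
    "class_balance": 58,
}


def unweighted_mean(dimension_scores):
    """Plain average of the scores (an assumption, not the LQS formula)."""
    return sum(dimension_scores.values()) / len(dimension_scores)


mean_score = unweighted_mean(scores)

# The three lowest-scoring dimensions, weakest first.
weakest = sorted(scores, key=scores.get)[:3]

print(f"unweighted mean: {mean_score:.1f}")
print("weakest dimensions:", ", ".join(weakest))
```

The three lowest scores (label density, class balance, validation) are the trade-offs behind the "Solid dataset with some trade-offs" summary.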

Sample statistics

What's actually in the dataset — from the maintainer's published stats.

The dataset contains 112,120 images from 30,805 unique patients, annotated with 14 disease labels. The labels were extracted via NLP from radiology reports and are noisy (roughly 90% accuracy).
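
The statistics above (image count, patient count, labels per image, class frequencies) can be recomputed from the label CSV. The following is a minimal sketch using only the Python standard library. It assumes a CSV with `Image Index`, `Finding Labels`, and `Patient ID` columns, pipe-separated labels, and a `No Finding` marker for images without any of the 14 labels; verify these names against the file you actually download. The sample rows are hypothetical and exist only to make the example runnable.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the assumed layout; not real records from the dataset.
SAMPLE_CSV = """Image Index,Finding Labels,Patient ID
img_0001.png,Cardiomegaly,1
img_0002.png,Cardiomegaly|Effusion,1
img_0003.png,No Finding,2
img_0004.png,Atelectasis,3
"""


def parse_labels(field, no_finding="No Finding", sep="|"):
    """Split a label field into disease labels, dropping the no-finding marker."""
    labels = [part.strip() for part in field.split(sep) if part.strip()]
    return [label for label in labels if label != no_finding]


def summarize(csv_text):
    """Count images, unique patients, labels per image, and per-class frequency."""
    class_counts = Counter()
    patients = set()
    n_images = 0
    n_labels = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels = parse_labels(row["Finding Labels"])
        class_counts.update(labels)
        patients.add(row["Patient ID"])
        n_images += 1
        n_labels += len(labels)
    return {
        "images": n_images,
        "patients": len(patients),
        "labels_per_image": n_labels / n_images if n_images else 0.0,
        "class_counts": class_counts,
    }


summary = summarize(SAMPLE_CSV)
print(summary["images"], "images from", summary["patients"], "patients")
print("labels per image:", summary["labels_per_image"])
print(summary["class_counts"].most_common())
```

To run it on the real file, read the file's text and pass it to `summarize`. Because each patient contributes several images on average, grouping by the patient column is also the natural starting point for patient-level train/test splits.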

License

NIH ChestX-ray14 is distributed under CC0 1.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.

Frequently Asked Questions

NIH ChestX-ray14 is distributed under CC0 1.0, which generally permits commercial use. Always verify the current license terms with the maintainer (NIH Clinical Center) before using in a commercial product.
NIH ChestX-ray14 contains 112,120 X-ray images from 30,805 unique patients, annotated with 14 disease labels. The labels were extracted via NLP from radiology reports and are noisy (roughly 90% accuracy).
NIH ChestX-ray14 is maintained by NIH Clinical Center and is available at https://nihcc.app.box.com/v/ChestXray-NIHCC. LabelSets indexes and scores this dataset for discoverability but does not redistribute it.
LQS is a 7-dimension quality score (completeness, uniqueness, validation, size adequacy, format compliance, label density, class balance) computed from the dataset's published statistics. Composite scores map to tiers: platinum (≥90), gold (≥75), silver (≥60), bronze (<60). Read the full methodology.
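
The tier thresholds stated above can be expressed as a small lookup. This is an illustrative sketch only: the function name `lqs_tier` is hypothetical and not part of any LabelSets API.

```python
def lqs_tier(score):
    """Map a composite LQS score (0-100) to its tier, per the thresholds above."""
    if score >= 90:
        return "platinum"
    if score >= 75:
        return "gold"
    if score >= 60:
        return "silver"
    return "bronze"


# NIH ChestX-ray14 has a composite score of 80.
print(lqs_tier(80))
```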