
Caselaw Access Project

6.7M US court decisions spanning 360 years — fully digitized by Harvard Law.

LQS 84 · gold · ✓ Commercial OK · 6.7M court cases · 420 GB on disk · JSON, XML · First released 2018
Source: static.case.law · maintained by Harvard Law School Library

About this dataset

The Caselaw Access Project (CAP) is Harvard Law School's digitization of every US federal and state court opinion in the Harvard Law Library: 6.7M individual cases spanning 360 years. Since March 2024, CAP has been fully open-access with no case volume limits. Each case includes OCR'd text, citations, parties, and court metadata.

Formats
JSON · XML

LabelSets Quality Score

LQS is our 7-dimension quality score, computed from the dataset's published statistics.

84 out of 100 · gold tier

Solid dataset with some trade-offs

Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.

Completeness: 90. No public completeness metric; a prior for 'governmental' datasets is used instead.
Uniqueness: 93. Exact-hash deduplication is documented by the maintainer.
Validation: 70. Unlabeled corpus, so validation is limited to format integrity.
Size adequacy: 99. 6,700,000 cases, which exceeds the 5,000-case adequacy target for Legal.
Format compliance: 88. Custom format with published schema documentation.
Label density: 0. Unlabeled corpus, so label density is not applicable.
Class balance: 60. Unlabeled corpus, so class balance is not applicable.

Sample statistics

What's actually in the dataset — from the maintainer's published stats.

6.7M cases decided between 1658 and 2020, covering every federal and state appellate case in the Harvard Law Library. Each case carries full text, citations, party information, and court metadata.
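
For readers working with the JSON format, the fields listed above (text, citations, court metadata) might be pulled out of a single case roughly as follows. This is a minimal sketch, not the maintainer's documented schema: the field names (`name`, `decision_date`, `citations`, `court`, `casebody.opinions`) are assumptions, and the sample record is a made-up placeholder rather than a real case. Check the published schema documentation before relying on any of these keys.

```python
import json

# Hypothetical record for illustration only. The field names are assumptions
# and the values are placeholders, not data taken from the dataset.
SAMPLE_CASE = """
{
  "name": "Example v. Example",
  "decision_date": "1900-01-01",
  "citations": [{"type": "official", "cite": "1 Ex. 1"}],
  "court": {"name": "Example Supreme Court"},
  "casebody": {"opinions": [{"type": "majority", "text": "Opinion text goes here."}]}
}
"""


def summarize_case(raw: str) -> dict:
    """Extract a few headline fields from one case record (assumed layout)."""
    case = json.loads(raw)
    opinions = case.get("casebody", {}).get("opinions", [])
    return {
        "name": case.get("name"),
        "decided": case.get("decision_date"),
        "court": case.get("court", {}).get("name"),
        "citations": [c.get("cite") for c in case.get("citations", [])],
        # Total characters of opinion text across all opinions in the case.
        "text_chars": sum(len(o.get("text", "")) for o in opinions),
    }


summary = summarize_case(SAMPLE_CASE)
print(summary)
```

Using `.get()` with defaults keeps the sketch tolerant of records that lack a field, which seems prudent for OCR-derived data spanning several centuries.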

License

Caselaw Access Project is distributed under CC0 (post-2024 release). This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.

Frequently Asked Questions

Caselaw Access Project is distributed under CC0 (post-2024 release), which generally permits commercial use. Always verify the current license terms with the maintainer (Harvard Law School Library) before using in a commercial product.
Caselaw Access Project contains 6,700,000 court cases decided between 1658 and 2020: every federal and state appellate case in the Harvard Law Library, with full text, citations, party information, and court metadata.
Caselaw Access Project is maintained by Harvard Law School Library and is available at https://static.case.law/. LabelSets indexes and scores this dataset for discoverability but does not redistribute it.
LQS is a 7-dimension quality score (completeness, uniqueness, validation, size adequacy, format compliance, label density, class balance) computed from the dataset's published statistics. Composite scores map to tiers: platinum (≥90), gold (≥75), silver (≥60), bronze (<60).
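
The tier thresholds quoted above can be expressed as a small lookup. This is an illustrative sketch only: the function name `lqs_tier` is hypothetical, and it covers just the score-to-tier mapping, not how the composite score itself is computed from the seven dimensions.

```python
def lqs_tier(score: float) -> str:
    """Map a composite LQS score (0-100) to its tier name.

    Thresholds follow the page text: platinum >= 90, gold >= 75,
    silver >= 60, bronze below 60.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"LQS score must be between 0 and 100, got {score}")
    if score >= 90:
        return "platinum"
    if score >= 75:
        return "gold"
    if score >= 60:
        return "silver"
    return "bronze"


# This dataset's composite score of 84 falls in the gold band.
print(lqs_tier(84))
```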