CUAD — Contract Understanding Atticus Dataset

510 commercial contracts with 13,101 expert-labeled clauses across 41 legal categories.

LQS 87 (gold) · Commercial use OK · 13.1K clause annotations · 20 MB · JSON, CSV · Released 2021
Source: github.com · maintained by The Atticus Project

About this dataset

CUAD is The Atticus Project's legal contract review benchmark. It contains 510 commercial contracts (M&A, licensing, supply, consulting, etc.), manually labeled by law students supervised by attorneys, with 13,101 clause-level annotations across 41 legal categories (e.g., Governing Law, Change of Control, Non-Compete, IP Assignment). It is widely used for training legal NLP models.
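As a rough illustration of what "clause-level annotations across categories" means in practice, the sketch below tallies labeled spans per category. It assumes a SQuAD-style JSON layout (`data`, `paragraphs`, `qas`, `answers`) and a `contract__Category` id convention; neither is stated on this page, and the two-contract sample is made up, so check the maintainer's repository for the actual schema.

```python
from collections import Counter

# Hypothetical sample in an ASSUMED SQuAD-style layout (not real CUAD data).
sample = {
    "data": [
        {
            "title": "ExampleSupplyAgreement",
            "paragraphs": [{
                "context": "This Agreement is governed by the laws of Delaware.",
                "qas": [
                    {"id": "ExampleSupplyAgreement__Governing Law",
                     "answers": [{"text": "governed by the laws of Delaware",
                                  "answer_start": 18}]},
                    {"id": "ExampleSupplyAgreement__Non-Compete", "answers": []},
                ],
            }],
        },
        {
            "title": "ExampleLicense",
            "paragraphs": [{
                "context": "New York law governs this License.",
                "qas": [
                    {"id": "ExampleLicense__Governing Law",
                     "answers": [{"text": "New York law governs",
                                  "answer_start": 0}]},
                    {"id": "ExampleLicense__Non-Compete", "answers": []},
                ],
            }],
        },
    ]
}


def count_annotations(dataset):
    """Count labeled spans per category; categories with no span get 0."""
    counts = Counter()
    for doc in dataset["data"]:
        for para in doc["paragraphs"]:
            for qa in para["qas"]:
                # Assumed id convention: "<contract title>__<category>".
                category = qa["id"].split("__", 1)[1]
                counts[category] += len(qa["answers"])
    return counts


counts = count_annotations(sample)
print(dict(counts))
```

A category can appear in a contract with zero labeled spans, which is why the counter records empty answer lists as 0 rather than skipping them.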

Maintainer: The Atticus Project
License: CC BY 4.0
Formats: JSON · CSV

LabelSets Quality Score

LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →

87 out of 100 · gold tier

High-quality dataset across most dimensions

Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.

Completeness 95
No public completeness metric; using prior for 'expert_curated' datasets.
Uniqueness 95
Manually vetted for uniqueness by maintainer.
Validation 90
Expert annotators with documented QC protocol.
Size adequacy 91
13,101 items — exceeds 5,000 adequacy target for Legal.
Format compliance 95
Industry-standard format — drop-in compatible with mainstream tooling.
Label density 52
Average 1.0 labels per item (sparse).
Class balance 75
Moderate class skew — realistic production distribution.
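This page does not say how the seven sub-scores are combined into the composite. As a sanity check only, the sketch below takes a plain unweighted average of the values listed above; the uniform weighting is an assumption, not the LQS methodology.

```python
# Sub-scores exactly as listed on this page.
dimension_scores = {
    "Completeness": 95,
    "Uniqueness": 95,
    "Validation": 90,
    "Size adequacy": 91,
    "Format compliance": 95,
    "Label density": 52,
    "Class balance": 75,
}

published_composite = 87  # composite shown on this page

# ASSUMPTION: equal weights. The real LQS weighting is not given here.
unweighted_mean = sum(dimension_scores.values()) / len(dimension_scores)
weakest = min(dimension_scores, key=dimension_scores.get)

print(f"unweighted mean: {unweighted_mean:.1f}")   # 84.7
print(f"published composite: {published_composite}")
print(f"weakest dimension: {weakest}")             # Label density
```

The plain average (84.7) is lower than the published 87, which suggests the composite uses non-uniform weights or some other adjustment; the methodology page is the place to confirm that.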

Sample statistics

What's actually in the dataset — from the maintainer's published stats.

510 contracts, 13,101 annotations, 41 categories, about 25.7 annotations per contract on average (13,101 / 510). 2:1 train/test split.
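The headline figures above can be cross-checked with a few lines of arithmetic. The sketch assumes the 2:1 split is applied at the contract level, which this page does not specify.

```python
contracts = 510
annotations = 13_101
categories = 41

per_contract = annotations / contracts    # average annotations per contract
per_category = annotations / categories   # average annotations per category

# ASSUMPTION: the 2:1 train/test ratio is applied to whole contracts.
train_contracts = contracts * 2 // 3
test_contracts = contracts - train_contracts

print(f"{per_contract:.2f} annotations per contract")   # 25.69
print(f"{per_category:.2f} annotations per category")   # 319.54
print(f"{train_contracts} train / {test_contracts} test contracts")  # 340 / 170
```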

License

CUAD — Contract Understanding Atticus Dataset is distributed under CC BY 4.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.

Frequently Asked Questions

Can CUAD be used commercially?
CUAD is distributed under CC BY 4.0, which generally permits commercial use. Always verify the current license terms with the maintainer (The Atticus Project) before using it in a commercial product.

How large is CUAD?
CUAD contains 13,101 clause annotations across 510 contracts and 41 categories, about 25.7 annotations per contract on average, with a 2:1 train/test split.

Who maintains CUAD, and where is it available?
CUAD is maintained by The Atticus Project and is available at https://github.com/TheAtticusProject/cuad. LabelSets indexes and scores this dataset for discoverability but does not redistribute it.

What is LQS?
LQS is a 7-dimension quality score (completeness, uniqueness, validation, size adequacy, format compliance, label density, class balance) computed from the dataset's published statistics. Composite scores map to tiers: platinum (≥90), gold (≥75), silver (≥60), bronze (<60). Read the full methodology.
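The tier thresholds above translate directly into a small lookup. The function name `lqs_tier` is hypothetical and only illustrates the stated cutoffs.

```python
def lqs_tier(score: float) -> str:
    """Map a composite LQS score (0-100) to the tier names stated above."""
    if score >= 90:
        return "platinum"
    if score >= 75:
        return "gold"
    if score >= 60:
        return "silver"
    return "bronze"


print(lqs_tier(87))  # gold
```

With these cutoffs, CUAD's composite of 87 falls in the gold tier, three points short of platinum.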