Every dataset compliance-certified, jurisdiction-mapped, and scored on 14 general plus 3 legal-specific quality dimensions. Built for legal AI companies, BigLaw innovation teams, and enterprise legal departments.
The same dataset that looks great on a general-purpose leaderboard can silently fail in legal production. Here are three failure modes an aggregate quality score won't catch.
A corpus labeled "US case law" often turns out to be 78% Delaware and Federal Circuit with almost no state court coverage. Your model learns the wrong doctrines and misstates the law for real clients in every other jurisdiction.
Half of public legal corpora have unclear copyright provenance — scraped, redistributed, or built on top of data with ambiguous licensing. For enterprise buyers this isn't a quality issue, it's a compliance exposure.
About 1 in 6 public legal training datasets contain material from CaseHOLD, LegalBench, or LexGLUE test splits. Training on them silently inflates your eval metrics and breaks real-world generalization.
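Benchmark contamination of this kind can be detected with a simple overlap check. The sketch below is illustrative only (it is not LabelSets' actual pipeline): it normalizes whitespace and case, hashes each example, and reports the fraction of training examples that also appear in a benchmark test split.

```python
import hashlib

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting
    differences don't hide a match."""
    return " ".join(text.lower().split())

def fingerprint(text: str) -> str:
    """Stable hash of the normalized text."""
    return hashlib.sha256(normalize(text).encode()).hexdigest()

def contamination_rate(train_texts, benchmark_texts):
    """Fraction of training examples whose normalized text also
    appears in a benchmark test split."""
    bench = {fingerprint(t) for t in benchmark_texts}
    hits = sum(fingerprint(t) in bench for t in train_texts)
    return hits / len(train_texts) if train_texts else 0.0
```

Exact-match hashing is the cheapest check; production contamination detection would also look for near-duplicates (e.g. n-gram or embedding overlap), which exact hashes miss.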
Every legal dataset is evaluated on three additional dimensions beyond the standard LabelSets Quality Score — designed for the failure modes that matter in legal-AI production.
Statistical fingerprinting of the jurisdictions actually present in the text, compared to the jurisdictions the seller claims. Reports a weighted entropy across the detected jurisdictions and flags datasets where the claimed scope and the real scope disagree.
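The core of such a check can be sketched in a few lines. This is a minimal illustration, not the actual LabelSets implementation: it computes plain Shannon entropy over detected jurisdiction labels (the production score is described as weighted; the weighting scheme is omitted here) and lists claimed jurisdictions whose detected share falls below a threshold. The `min_share` cutoff is a hypothetical parameter.

```python
import math
from collections import Counter

def jurisdiction_entropy(labels):
    """Shannon entropy (bits) of the detected jurisdiction distribution.
    Low entropy means the corpus is concentrated in a few jurisdictions
    despite a possibly broad claimed scope."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def scope_mismatch(detected_labels, claimed, min_share=0.01):
    """Claimed jurisdictions whose detected share is below min_share --
    i.e. claimed on the label but effectively absent from the text."""
    counts = Counter(detected_labels)
    total = sum(counts.values())
    return sorted(j for j in claimed if counts.get(j, 0) / total < min_share)
```

A corpus sold as "US case law" but detected as 78% Delaware would score low on entropy and return most state courts from `scope_mismatch`.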
Dedicated PII pipeline tuned for legal text. Flags personal identifiers, case numbers, attorney-client privileged phrases, and work product markers. Reports a confidence score with exact strings that require review before training.
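A toy version of this kind of flagging can be built from pattern matching. The patterns below are illustrative assumptions, not the tuned pipeline described above (which would combine NER models with curated legal pattern libraries); the point is the output shape: category plus the exact string to review.

```python
import re

# Hypothetical pattern subset -- illustrative, not exhaustive.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "case_number": re.compile(r"\b\d{1,2}:\d{2}-cv-\d{3,5}\b"),
    "privilege_marker": re.compile(
        r"attorney[- ]client privileged?|attorney work product", re.IGNORECASE
    ),
}

def flag_pii(text):
    """Return (category, exact matched string) pairs so a human
    reviewer sees precisely what triggered each flag."""
    return [(name, m.group(0))
            for name, pat in PATTERNS.items()
            for m in pat.finditer(text)]
```

Returning the exact matched substring, rather than just a boolean, is what makes the flag actionable before training.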
Checks annotations against canonical clause and statute taxonomies (ISDA, ACC, CUAD, LexGLUE, EUR-Lex). Flags datasets with inconsistent or undocumented tagging — the single biggest predictor of poor generalization on contract-AI tasks.
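A minimal sketch of such a conformance check, under stated assumptions: the four clause names below are a hypothetical taxonomy subset (the real CUAD taxonomy, for instance, has 41 clause types), and normalization here is just case and whitespace folding.

```python
# Hypothetical canonical subset for illustration only.
CANONICAL_CLAUSES = {
    "governing law",
    "indemnification",
    "limitation of liability",
    "termination for convenience",
}

def taxonomy_report(annotations):
    """Split observed labels into canonical vs. off-taxonomy and report
    the off-taxonomy rate -- a rough proxy for inconsistent tagging."""
    norm = [a.strip().lower() for a in annotations]
    unknown = [a for a in norm if a not in CANONICAL_CLAUSES]
    return {
        "off_taxonomy_labels": sorted(set(unknown)),
        "off_taxonomy_rate": len(unknown) / len(norm) if norm else 0.0,
    }
```

A dataset tagging the same clause as both "Indemnification" and "Indemnity" would surface the stray variant in `off_taxonomy_labels`, which is exactly the undocumented-tagging signal described above.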
445 annotated legal documents, 6K instruction-response pairs, full compliance certificate and jurisdiction coverage report.
Contract law, litigation filings, and regulatory analysis — annotated to a canonical clause taxonomy with jurisdiction tagging and full provenance documentation. Designed for teams building legal drafting, review, and research AI.