How LabelSets works

Upload data. Get a signed quality cert. Defend the model.

Three stages, fully automated, end-to-end in under 10 minutes. Your dataset goes in raw — comes out scored across 19 dimensions, signed with our Ed25519 production key, and ready to cite in your SR 11-7, EU AI Act Art. 10, or §1557 paperwork. Whether you list on the marketplace or score privately, the cert is the deliverable.

Step 1
Ingest
JSONL, CSV, Parquet, or HF dataset. Schema inferred + validated.
Step 2
LQS Scorer
7 oracles run in parallel. 19 dimensions scored with 95% CI. Contamination scan.
Step 3
Signed cert
Ed25519 signature over canonical JSON. Pushed to public revocation registry.
Inside each stage

What actually runs when you hit upload.

01 · Ingest
Parse + validate
Format-detect (JSONL, CSV, Parquet, HF). Schema inferred from first 1k rows, validated across the rest. Malformed rows quarantined, not silently dropped.
typical runtime · < 30s
failure mode · schema report + row offsets
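A minimal sketch of the ingest behavior described above, for JSONL only. The function name, the "most common type wins" inference rule, and the report shape are assumptions for illustration, not the LabelSets implementation.

```python
import json
from collections import Counter, defaultdict

def ingest_jsonl(lines, sample_size=1000):
    """Infer a schema from the first `sample_size` parseable rows,
    validate every row against it, and quarantine rows that fail."""
    parsed, quarantined = [], []
    for offset, line in enumerate(lines):
        try:
            row = json.loads(line)
            if not isinstance(row, dict):
                raise ValueError("row is not a JSON object")
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            quarantined.append((offset, str(exc)))
            continue
        parsed.append((offset, row))

    # Inference (assumed rule): the most common type per field wins.
    votes = defaultdict(Counter)
    for _, row in parsed[:sample_size]:
        for key, value in row.items():
            votes[key][type(value).__name__] += 1
    schema = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    # Validation: a row must carry exactly the inferred fields and types.
    accepted = []
    for offset, row in parsed:
        observed = {key: type(value).__name__ for key, value in row.items()}
        if observed == schema:
            accepted.append(row)
        else:
            quarantined.append((offset, "schema mismatch"))
    quarantined.sort(key=lambda item: item[0])
    return schema, accepted, quarantined
```

The quarantined list keeps the row offset next to each reason, which mirrors the "schema report + row offsets" failure mode: nothing is dropped without a record.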
02 · Score
7 oracles + 19 dims
Parallel oracle execution (schema, label-consistency, MiniLM, BGE, drift, holdout, adversarial). Fleiss κ + per-oracle variance tracked. Contamination scanned against 40+ public evals.
typical runtime · 2-8 min
per-dim CI · Wilson / fold / bootstrap
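For a dimension that reduces to a pass rate (say, the fraction of rows that pass a check), a Wilson interval can be computed as below. This is a sketch of the standard formula; which of the 19 dimensions use Wilson rather than fold or bootstrap intervals is not stated here, and the function name is illustrative.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed rate is close to 0 or 1.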
03 · Sign
Ed25519 + registry
Canonical-JSON serialization (sorted keys, UTF-8). Ed25519 signature over the payload. Cert ID pushed to public revocation registry. Buyer verification endpoint live.
signing time · < 5ms
verify endpoint · /api/verify-lqs-cert
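The serialization step matters because a signature covers bytes, not a JSON object: two parties must produce identical bytes from the same cert. A minimal sketch, assuming compact separators and unescaped non-ASCII output (the page specifies only sorted keys and UTF-8); the cert fields are hypothetical.

```python
import hashlib
import json

def canonical_bytes(payload):
    """Serialize with sorted keys and UTF-8 encoding.
    Compact separators and ensure_ascii=False are assumptions."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

# Hypothetical cert payloads: same content, different key order.
cert_a = {"cert_id": "example-001", "lqs": 87.4, "dims": {"b": 1, "a": 2}}
cert_b = {"dims": {"a": 2, "b": 1}, "lqs": 87.4, "cert_id": "example-001"}

# Both orderings yield the same bytes, so a signature over one verifies the other.
message = canonical_bytes(cert_a)
same = message == canonical_bytes(cert_b)
digest = hashlib.sha256(message).hexdigest()  # handy for logging or comparison
```

The bytes in `message` are what an Ed25519 library would take as the input to sign or verify.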
Two audiences

Whether you're buying or selling, the cert is the artifact.

For sellers

Turn your curated dataset into a marketplace listing in ~10 minutes.
  • 01
    Upload via the web or the labelsets upload CLI. Accepts JSONL, CSV, Parquet, Apache Arrow, and Hugging Face datasets.
  • 02
    Score runs automatically. Real-time progress per oracle. Typical job: 2-8 minutes.
  • 03
    Review the cert preview. Decide your price, license terms, listing copy.
  • 04
    Publish. Listing goes live with the signed cert attached. 85% revenue share.
  • 05
    Payouts weekly via Stripe. Per-license audit trail in your dashboard.

For buyers

Procure, verify, and cite — with the same artifact your risk team will file.
  • 01
    Browse or search the signed catalog. Filter by LQS, contamination, compliance, HIPAA.
  • 02
    Preview the cert before purchase. Every dim + CI visible.
  • 03
    Purchase. Commercial license, perpetual use. Stripe checkout.
  • 04
    Verify offline with our public key: no network call, no LabelSets service in the verification path.
  • 05
    Cite in procurement paperwork: SR 11-7, §1557, EU AI Act Art. 10, FDA 21 CFR 11.
85 / 15 revenue share · what a sale actually looks like
Example dataset price · $799.00
Seller receives · $679.15 (85%)
  Direct Stripe transfer · weekly · 7-day chargeback hold
Platform fee · $119.85 (15%)
  Covers scoring, hosting, verification, revocation infra
Buyer pays · $799.00
  One-time · perpetual commercial license
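The split above can be reproduced with exact decimal arithmetic. A small sketch; the round-half-up rule and the choice to give the remainder to the platform fee are assumptions, and the function name is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

def split_sale(price, seller_share=Decimal("0.85")):
    """Return (seller_amount, platform_fee) for a sale price.
    Seller amount is rounded to cents; the fee is the remainder,
    so the two always sum back to the price."""
    price = Decimal(price)
    seller = (price * seller_share).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return seller, price - seller

seller, fee = split_sale("799.00")
```

For the $799.00 example, `seller` is 679.15 and `fee` is 119.85, matching the figures shown.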
FAQ

Questions your risk team will ask.

What if a cert gets revoked after I've purchased?

Revocation is public. If contamination or a labeling issue surfaces post-sale, the cert ID is added to /api/lqs-revocations.json. Your CI pipeline (GitHub Action or SDK) catches it automatically on the next build. You also keep the raw data you paid for — revocation is a quality signal, not a license termination.
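A CI check against the revocation list might look like the sketch below. The JSON shape (a "revoked" array of objects with "cert_id" and "reason") is an assumption; the actual schema of /api/lqs-revocations.json is not documented on this page. Fetching the file is left out so the logic can run offline.

```python
import json

def revoked_certs(revocation_json, pinned_cert_ids):
    """Return {cert_id: reason} for every pinned cert that appears
    in the revocation list. The document shape is assumed."""
    doc = json.loads(revocation_json)
    revoked = {
        entry["cert_id"]: entry.get("reason", "unspecified")
        for entry in doc["revoked"]
    }
    return {cid: revoked[cid] for cid in pinned_cert_ids if cid in revoked}

# Hypothetical data standing in for a downloaded revocation list.
sample = json.dumps({
    "revoked": [{"cert_id": "cert-123", "reason": "contamination"}]
})
hits = revoked_certs(sample, ["cert-123", "cert-456"])
```

A build step could fail when `hits` is non-empty and print the reasons, which keeps the revocation visible without touching the purchased data.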

Is the Ed25519 key rotated?

Yes, on a 24-month cadence. Previous keys remain valid for signatures issued during their validity window. The registry publishes a key history with validity intervals. Offline verification uses the key fingerprint embedded in the cert to pick the matching key from that history.
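Resolving the right key for a cert could be sketched as follows. The field names ("key_fingerprint", "issued_at", "valid_from", "valid_until") and the ISO 8601 timestamps are assumptions for illustration; the published key-history format may differ.

```python
from datetime import datetime

def resolve_key(cert, key_history):
    """Find the key entry whose fingerprint matches the cert, then check
    that the cert was issued inside that key's validity window."""
    for entry in key_history:
        if entry["fingerprint"] != cert["key_fingerprint"]:
            continue
        issued = datetime.fromisoformat(cert["issued_at"])
        start = datetime.fromisoformat(entry["valid_from"])
        end = entry["valid_until"]  # None means the key is still current
        if issued < start or (end is not None and issued >= datetime.fromisoformat(end)):
            raise ValueError("cert issued outside the key's validity window")
        return entry
    raise LookupError("unknown key fingerprint")
```

Checking the window as well as the fingerprint means a retired key cannot vouch for a cert dated after its retirement.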

How do you prevent a seller from gaming the score?

Oracles are independent and some are not public. The holdout classifier runs on a never-revealed test split. The adversarial-stability oracle injects perturbations. The cert records oracle agreement (Fleiss κ): if oracles disagree, the score is flagged brittle, not inflated. Sellers can't tune against oracles they can't see.
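Fleiss κ measures agreement among a fixed number of raters beyond what chance would produce. The sketch below implements the standard formula, treating each oracle as a rater that assigns every item to one category; how LabelSets maps oracle outputs to categories, and the 0.6 "brittle" threshold, are assumptions.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters that put item i in category j.
    Every row must sum to the same number of raters (at least 2)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")
    n_cats = len(counts[0])

    # Share of all assignments that fell into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Observed agreement per item, averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_observed - p_expected) / (1 - p_expected)

def is_brittle(kappa, threshold=0.6):
    """Hypothetical flagging rule: low agreement marks the score brittle."""
    return kappa < threshold
```

With seven raters, four items rated unanimously (two per category) give κ = 1.0, while 4-versus-3 splits on every item give a negative κ and would be flagged.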

Do I need LabelSets to be online to verify?

No. Verification is offline-first. You fetch the public key once per key rotation (/api/lqs-public-key), cache it, and verify any cert against it with Ed25519. The SDK does this in < 5ms with no network call. Only revocation checking requires a network hit, and that endpoint is cacheable.

What formats are supported for upload?

JSONL, CSV, Parquet, Apache Arrow, Hugging Face datasets, and direct S3/GCS URLs. For unstructured data (images, audio), we support manifest + asset-bundle uploads with SHA-256 integrity per asset.
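For manifest + asset-bundle uploads, per-asset integrity can be sketched as a list of paths with SHA-256 digests. The manifest layout and field names here are assumptions, not the documented upload format.

```python
import hashlib

def build_manifest(assets):
    """assets maps a relative path to the asset's bytes.
    Returns a manifest with one SHA-256 digest per asset (layout assumed)."""
    return {
        "assets": [
            {"path": path, "sha256": hashlib.sha256(data).hexdigest(), "bytes": len(data)}
            for path, data in sorted(assets.items())
        ]
    }

def asset_matches(manifest, path, data):
    """True if `data` hashes to the digest recorded for `path`."""
    for entry in manifest["assets"]:
        if entry["path"] == path:
            return entry["sha256"] == hashlib.sha256(data).hexdigest()
    return False  # path not listed in the manifest

manifest = build_manifest({"img/0001.png": b"fake image bytes", "audio/0001.wav": b""})
```

Sorting by path keeps the manifest deterministic, so the same bundle always produces the same manifest.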

Can I run the scorer on my private data without publishing?

Yes — the enterprise tier includes a private-mode scorer. Your data never leaves your VPC; we ship the scorer as a Docker container. You get the same signed cert, same public-key verifiability, and can cite it internally without listing publicly.

Ready to see a cert verify live?

Paste any LabelSets cert hash into the verifier. Watch the Ed25519 signature verify in milliseconds against our public key.