ALETHOR BUILDS ANCHOR

Know which model results are real.

Anchor is Alethor’s product for proving model results. When someone says a model is better, Anchor reruns the benchmark, checks the evidence, and shows whether the result is trustworthy enough to become your team’s baseline.

invite-only private alpha / baseline comparison / signed evidence

Verification detail

Verification state, baseline status, and evidence stay on one record.

Verdict ready / Baseline eligible / Evidence bundled

The decision surface keeps the benchmark claim, pinned protocol fields, compare status, and evidence bundle close to the run.

What Anchor Does

Anchor keeps benchmark decisions readable.

Verify a claimed benchmark result, accept the passing run as the official baseline, compare later candidates against the same protocol, and keep signed evidence attached to the decision.

Current private-alpha scope

verify claimed result / pin official baseline / compare later candidate / export signed evidence

Recompute the claim

Anchor reruns the claimed benchmark path and checks the result against pinned protocol fields.

Accept one baseline

A passing canonical run becomes the official baseline for later candidate comparison.

Carry signed evidence

The verdict, artifacts, manifest, and signature stay attached to the acceptance decision.

Workflow

A fixed path from claim to signed evidence.

Each step stays anchored to one run, one benchmark path, and one accepted baseline.

1. Verify

Recompute the claimed benchmark result on the fixed protocol path.

2. Inspect evidence report

Read the verdict, checks, artifacts, and pinned protocol fields.

3. Pin baseline

Accept a passing canonical run as the official baseline.

4. Compare candidate

Evaluate a later run against the same pinned baseline protocol.

5. Export signed evidence

Package the bundle, archive the record, and export signed evidence.
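
To make the last step concrete, here is a minimal Python sketch of how a manifest can be tied to a verdict and its artifacts and then signed. It is an illustration under stated assumptions, not Anchor's actual format: the function names, the manifest layout, the placeholder artifact, and the use of HMAC-SHA256 from the standard library are all hypothetical stand-ins for whatever signing scheme Anchor uses.

```python
import hashlib
import hmac
import json
from typing import Dict


def build_manifest(verdict: str, artifacts: Dict[str, bytes]) -> dict:
    """Record the verdict plus a SHA-256 digest for each artifact (hypothetical layout)."""
    return {
        "verdict": verdict,
        "artifacts": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())
        },
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a deterministic JSON encoding of the manifest with HMAC-SHA256."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


# Placeholder key and artifact contents, for illustration only.
key = b"demo-key-not-for-production"
manifest = build_manifest("pass", {"results.json": b"placeholder results"})
signature = sign_manifest(manifest, key)

print(verify_manifest(manifest, signature, key))   # True

# Changing the verdict after signing invalidates the signature.
tampered = dict(manifest, verdict="fail")
print(verify_manifest(tampered, signature, key))   # False
```

The point of the sketch is the property the page describes: once the verdict and artifact digests are covered by one signature, the evidence cannot drift away from the decision without the mismatch being detectable.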

Proof

Real product states keep the decision legible.

Verification verdict, baseline readiness, compare validity, and package outputs stay attached to the same run record.

Private Alpha

The current private-alpha contract is deliberately fixed.

Anchor currently runs on one benchmark path with pinned protocol fields for verification, baseline acceptance, compare, and signed evidence export.

Invite-only private alpha. Current scope: verify a claimed result, pin the baseline, compare later candidates, and export signed evidence.

benchmark_id: mmlu_pro
task: leaderboard_mmlu_pro
few-shot: 5-shot
metric: acc
canonical batch_size: 1
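
As a rough illustration of what pinning these fields means in practice, the Python sketch below holds the values listed above in a frozen dataclass and reports any field where a run differs. The class name, the function name, and the `num_fewshot` and `batch_size` keys are hypothetical; only the values come from the fields above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PinnedProtocol:
    """Pinned private-alpha fields; frozen so they cannot be changed after creation."""
    benchmark_id: str = "mmlu_pro"
    task: str = "leaderboard_mmlu_pro"
    num_fewshot: int = 5      # "5-shot"; the key name is an assumption
    metric: str = "acc"
    batch_size: int = 1       # canonical batch_size


def protocol_mismatches(pinned: PinnedProtocol, run_fields: dict) -> dict:
    """Return {field: (pinned_value, run_value)} for every field that differs."""
    mismatches = {}
    for name, pinned_value in asdict(pinned).items():
        run_value = run_fields.get(name)  # a missing field shows up as None
        if run_value != pinned_value:
            mismatches[name] = (pinned_value, run_value)
    return mismatches


pinned = PinnedProtocol()

# A hypothetical candidate run that used a non-canonical batch size.
candidate = {
    "benchmark_id": "mmlu_pro",
    "task": "leaderboard_mmlu_pro",
    "num_fewshot": 5,
    "metric": "acc",
    "batch_size": 8,
}

print(protocol_mismatches(pinned, candidate))  # {'batch_size': (1, 8)}
```

An empty result means the run matches the pinned protocol on every field, which is the precondition for a like-for-like comparison against the baseline.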

Read A Run

Four statuses explain the current state.

These four statuses show where a run stands at a glance, without turning the product into a general dashboard.

Job status

queued / running / succeeded

Where the verification run is in execution.

Verification verdict

pass / fail

Whether the recomputed benchmark result passed the checks.

Baseline eligibility

eligible when canonical

Whether the run can be accepted as the official baseline.

Package status

ready / packaged

Whether the signed evidence bundle is ready or already exported.
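
The four statuses can be read together as one record. The Python sketch below models them with enums, using only the status values listed above. The class and field names are hypothetical, and the eligibility rule (job succeeded, verdict is pass, run is canonical) is an assumption drawn from the page's description of a passing canonical run, not a documented Anchor rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"


class PackageStatus(Enum):
    READY = "ready"
    PACKAGED = "packaged"


@dataclass
class RunRecord:
    job_status: JobStatus
    verdict: Optional[Verdict] = None               # unknown until the job finishes
    canonical: bool = False                         # run used the canonical settings
    package_status: Optional[PackageStatus] = None  # unknown until evidence is bundled

    def baseline_eligible(self) -> bool:
        """Assumed rule: only a finished, passing, canonical run can become the baseline."""
        return (
            self.job_status is JobStatus.SUCCEEDED
            and self.verdict is Verdict.PASS
            and self.canonical
        )

    def summary(self) -> str:
        """One-line view of all four statuses."""
        verdict = self.verdict.value if self.verdict is not None else "pending"
        package = self.package_status.value if self.package_status is not None else "pending"
        baseline = "eligible" if self.baseline_eligible() else "not eligible"
        return (
            f"job={self.job_status.value} verdict={verdict} "
            f"baseline={baseline} package={package}"
        )


record = RunRecord(
    job_status=JobStatus.SUCCEEDED,
    verdict=Verdict.PASS,
    canonical=True,
    package_status=PackageStatus.READY,
)
print(record.summary())
# job=succeeded verdict=pass baseline=eligible package=ready
```

A run that is still queued, failed its checks, or used non-canonical settings would report `baseline=not eligible` under this assumed rule.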

Why Teams Use Anchor

Trust improves when the evidence stays attached.

Anchor keeps the narrow acceptance path readable instead of scattering it across trackers and dashboards.

Explicit checks

Verification steps stay named, visible, and tied to the verdict.

Reproducible record

Artifacts, manifests, and replay inputs stay attached to the accepted run.

Stable comparison

Later candidates are compared against one pinned baseline protocol.

Private Alpha

Request private alpha.

Anchor is currently available as an invite-only private alpha for teams that need to verify model claims before they set a baseline.