Architecture — behind the substrate

The cognitive cascade

A request is resolved at the lowest tier whose grammar covers it. Higher tiers fire only when the lower tier cannot place the request. The shape that survives across implementations is a ladder:

tier	what it does	representative cost
`L0-closed`	Pure deterministic compute — arithmetic, closed-form physics, exact rollouts. Identical inputs produce identical outputs.	~10⁻⁹ J per op (single FMA range)
`L0-retrieved`	Exact citation lookup — named-fact tables, theorem dictionaries, anything keyed by a string and resolved by direct match.	~10⁻⁹ J per op
`L0-identifiable`	Identifiable encoder — HDC item-memory cleanup, perceptual hashes, anything that's replayable given encoder version + input.	~10⁻⁸ J per op (10 K-dim hypervector range)
`L1`	Citation-backed composition — applies retrieved values through closed-form rules. The answer cites its sources.	~10⁻⁹ J per op
`L1.5`	Matryoshka vector tier — short-circuit retrieval at low dimensionality, rerank at full. Lives between L1 and L2.	~10⁻⁸ J per op
`L2`	Model-generated. Not bit-replayable; carries epistemic mode + axes. Fires only when the spine cannot place the request.	~10⁻⁵ J per op (~10⁴× the spine)

The ratios between rows in the right column come from wall-time measured on commodity silicon by the mgai-meter binary; the absolute joules are bracketed by the Landauer floor (lower bound) and a TDP envelope (upper estimate). The point of the table isn't the absolute numbers; it's the spread. Three to four orders of magnitude between the deterministic tiers and the model tier is the whole design.
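The ladder rule (resolve at the lowest tier whose grammar covers the request) can be sketched in a few lines of Rust. This is a minimal illustration, not the substrate's actual router: the `Tier` variants mirror the table, while `Handler`, `Cascade`, and the three toy handlers are hypothetical names introduced here.

```rust
/// Rungs of the ladder, cheapest first. Deriving `Ord` makes the
/// declaration order the routing order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    L0Closed,
    L0Retrieved,
    L0Identifiable,
    L1,
    L1Point5,
    L2,
}

/// A handler returns `Some(answer)` only when its grammar covers the request.
/// (Hypothetical signature, chosen for brevity.)
pub type Handler = fn(&str) -> Option<String>;

pub struct Cascade {
    rungs: Vec<(Tier, Handler)>,
}

impl Cascade {
    /// Sort once so that registration order cannot change routing order.
    pub fn new(mut rungs: Vec<(Tier, Handler)>) -> Self {
        rungs.sort_by_key(|(tier, _)| *tier);
        Cascade { rungs }
    }

    /// Walk the ladder bottom-up; the first rung that places the request wins.
    pub fn resolve(&self, request: &str) -> Option<(Tier, String)> {
        self.rungs
            .iter()
            .find_map(|(tier, handler)| handler(request).map(|answer| (*tier, answer)))
    }
}

/// Toy L0-closed handler: integer addition written as "a + b".
fn add_closed(request: &str) -> Option<String> {
    let (a, b) = request.split_once('+')?;
    let a = a.trim().parse::<i64>().ok()?;
    let b = b.trim().parse::<i64>().ok()?;
    Some(a.checked_add(b)?.to_string())
}

/// Toy L0-retrieved handler: exact string match against a one-row fact table.
fn lookup_retrieved(request: &str) -> Option<String> {
    match request {
        "speed of light in m/s" => Some("299792458".to_string()),
        _ => None,
    }
}

/// Stand-in for the model tier: it always answers, so it must sit last.
fn model_fallback(request: &str) -> Option<String> {
    Some(format!("[model-generated] {}", request))
}

/// Registered out of order on purpose; `Cascade::new` restores the ladder.
pub fn demo_cascade() -> Cascade {
    Cascade::new(vec![
        (Tier::L2, model_fallback as Handler),
        (Tier::L0Retrieved, lookup_retrieved as Handler),
        (Tier::L0Closed, add_closed as Handler),
    ])
}
```

With `demo_cascade()`, the request `"2 + 3"` resolves at `Tier::L0Closed`, the fact lookup resolves at `Tier::L0Retrieved`, and anything else falls through to `Tier::L2`.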

Live: /exhibits/reason runs a small math-domain cascade you can watch walk through the ladder, query by query.

V-class — typed at the grammar

Every claim the substrate emits carries an explicit replayability class:

L0-closed — deterministic compute; identical inputs → identical outputs.
L0-retrieved — exact citation; carries source_uri + retrieval_timestamp.
L0-identifiable — output of an identifiable encoder; replayable given encoder version + input.
L1 — citation-backed composition; carries citations to L0-retrieved sources.
L2 — model-generated; not bit-replayable; carries epistemic mode + axes.

The classes aren't documentation — they're a Rust type. The substrate's Claim<T> wrapper cannot be constructed without a ClaimAttribution that names the V-class, the source, the axes, and (for L2) the epistemic mode. Removing attribution at the audit boundary is a named operation, not an implicit conversion. Unmarked claims cannot escape into downstream code because the type system rejects them at the call site.

A model-generated paragraph cannot be relabelled as L0-closed by formatting; the relabel is a type error. The V-class is structural, not stylistic.
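A rough sketch of how such a wrapper could look, assuming nothing beyond what the text states. `Claim` and `ClaimAttribution` are the names used above; the field layout, the `VClass` enum, the string-typed epistemic mode, and the method names are hypothetical.

```rust
/// Replayability classes from the list above. Class-specific metadata lives
/// in the variant, so an L2 claim cannot exist without an epistemic mode.
#[derive(Debug, Clone, PartialEq)]
pub enum VClass {
    L0Closed,
    L0Retrieved { source_uri: String, retrieval_timestamp: u64 },
    L0Identifiable { encoder_version: String },
    L1 { citations: Vec<String> },
    L2 { epistemic_mode: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ClaimAttribution {
    pub v_class: VClass,
    pub source: String,
    pub axes: Vec<String>,
}

/// Fields are private to the defining module, so outside code can only
/// obtain a `Claim` through `Claim::new`, which demands an attribution.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim<T> {
    value: T,
    attribution: ClaimAttribution,
}

impl<T> Claim<T> {
    pub fn new(value: T, attribution: ClaimAttribution) -> Self {
        Claim { value, attribution }
    }

    pub fn value(&self) -> &T {
        &self.value
    }

    pub fn attribution(&self) -> &ClaimAttribution {
        &self.attribution
    }

    /// The named operation for the audit boundary: it consumes the claim and
    /// hands back both halves, so the attribution is never silently dropped.
    pub fn strip_attribution_at_audit_boundary(self) -> (T, ClaimAttribution) {
        (self.value, self.attribution)
    }
}

/// Downstream code takes `&Claim<T>`; passing a bare `T` is a type error.
pub fn v_class_label<T>(claim: &Claim<T>) -> &'static str {
    match &claim.attribution().v_class {
        VClass::L0Closed => "L0-closed",
        VClass::L0Retrieved { .. } => "L0-retrieved",
        VClass::L0Identifiable { .. } => "L0-identifiable",
        VClass::L1 { .. } => "L1",
        VClass::L2 { .. } => "L2",
    }
}
```

In this sketch there is no `From<T> for Claim<T>` and no `Deref` to the inner value; leaving those out is what keeps the conversion explicit in both directions.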

The routing seam — logged oracle

The orchestrator picks a tier for each query based on live measurements (latency, jitter, cache hit-rate, load). Those measurements drift; the verdict drifts with them. That's bad for replay.

The fix is the seam: when a request shape resolves for the first time, the substrate records the verdict as a constant in an append-only log. Later replays read the verdict from the log, never from live measurements. The result is the logged-oracle pattern — the nondeterministic step happens once, at record time, and is a constant in the log forever after.

The seam exposes three reads: a live convergence rate (repeats / observations), the FNV-1a digest of the command log, and a replay(query) path that bypasses the orchestrator entirely for a previously-seen shape. A cascade that promotes the seam from observational to authoritative routing becomes load-cheap — most queries skip the orchestrator and resolve through the recorded coordinate.
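The seam can be illustrated with a small record-once structure. The sketch below is an assumption-laden toy: the `Seam` type, its method names, the `u8` tier index, and the byte layout fed to the digest are all hypothetical; only the FNV-1a constants and the three reads (convergence rate, digest, replay) come from the description above.

```rust
use std::collections::HashMap;

// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold more bytes into a running FNV-1a state (xor first, then multiply).
pub fn fnv1a_update(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

pub struct Seam {
    log: Vec<(String, u8)>,     // append-only: (shape_key, recorded tier index)
    index: HashMap<String, u8>, // lookup view over the log
    digest: u64,
    observations: u64,
    repeats: u64,
}

impl Seam {
    pub fn new() -> Self {
        Seam {
            log: Vec::new(),
            index: HashMap::new(),
            digest: FNV_OFFSET,
            observations: 0,
            repeats: 0,
        }
    }

    /// `live_verdict` stands in for the orchestrator. It runs only the first
    /// time a shape is seen; afterwards the logged constant is returned.
    pub fn route(&mut self, shape_key: &str, live_verdict: impl FnOnce() -> u8) -> u8 {
        self.observations += 1;
        if let Some(&tier) = self.index.get(shape_key) {
            self.repeats += 1;
            return tier;
        }
        let tier = live_verdict();
        self.log.push((shape_key.to_string(), tier));
        self.index.insert(shape_key.to_string(), tier);
        self.digest = fnv1a_update(self.digest, shape_key.as_bytes());
        // 0xff never occurs in UTF-8, so it is a safe separator before the tier.
        self.digest = fnv1a_update(self.digest, &[0xff, tier]);
        tier
    }

    /// Replay path: reads the log only and never consults live measurements.
    pub fn replay(&self, shape_key: &str) -> Option<u8> {
        self.index.get(shape_key).copied()
    }

    /// repeats / observations, as in the text.
    pub fn convergence_rate(&self) -> f64 {
        if self.observations == 0 {
            0.0
        } else {
            self.repeats as f64 / self.observations as f64
        }
    }

    pub fn digest(&self) -> u64 {
        self.digest
    }

    pub fn log_len(&self) -> usize {
        self.log.len()
    }
}
```

Calling `route` twice with the same shape key but different live verdicts returns the first verdict both times, which is the property replay depends on.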

The transcript digest — trust as comparison

Every successful execute appends one entry to a transcript: (shape_key, horizon_code, response_hash). The transcript carries a rolling FNV-1a-64 digest. Two replicas of the cascade serving the same command stream land on identical digests. A replica that forges, drops, reorders, or mutates a single entry diverges by construction.

This makes trust a comparison, not a separate protocol. To detect tampering you don't need a Byzantine-fault-tolerant consensus layer; you compare digests. The digest is computed deterministically over the same byte stream on every honest replica, so a replica that disagrees has processed a different stream, and that divergence is observable.

The seam's command log carries its own digest with the same property, so the routing verdicts are byte-reproducible across replicas too. The pair (state digest, transcript digest) is the cheap-trust primitive the substrate commits to.
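A rolling transcript digest of this kind could be sketched as follows. The entry fields are the ones named above; their integer widths, the length-prefixed byte encoding, and the `Transcript` API are assumptions made for the example. FNV-1a is a non-cryptographic hash, so the sketch demonstrates divergence detection between replicas, not collision resistance against a crafted input.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub shape_key: String,
    pub horizon_code: u32,  // width is an assumption
    pub response_hash: u64, // width is an assumption
}

#[derive(Debug, Clone)]
pub struct Transcript {
    entries: Vec<Entry>,
    digest: u64,
}

impl Transcript {
    pub fn new() -> Self {
        Transcript { entries: Vec::new(), digest: FNV_OFFSET }
    }

    /// Fold raw bytes into the rolling FNV-1a state.
    fn absorb(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.digest ^= u64::from(b);
            self.digest = self.digest.wrapping_mul(FNV_PRIME);
        }
    }

    /// Append one entry. The key is length-prefixed and every integer uses a
    /// fixed little-endian width, so entry boundaries are unambiguous.
    pub fn append(&mut self, entry: Entry) {
        let key = entry.shape_key.as_bytes();
        self.absorb(&(key.len() as u64).to_le_bytes());
        self.absorb(key);
        self.absorb(&entry.horizon_code.to_le_bytes());
        self.absorb(&entry.response_hash.to_le_bytes());
        self.entries.push(entry);
    }

    pub fn digest(&self) -> u64 {
        self.digest
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

/// Trust as comparison: two replicas agree exactly when their digests match.
pub fn replicas_agree(a: &Transcript, b: &Transcript) -> bool {
    a.digest() == b.digest()
}
```

Because each entry is folded into the state left by the previous one, the digest depends on order as well as content: swapping two entries changes the result even though the set of entries is the same.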

Four pieces, picojoule cost

The substrate lives today in four pieces. Each one stands on its own; each one has an exhibit on this site that runs the kernel and emits its receipt in your browser:

Chain of thought — deterministic. The cascade walks each query through the ladder; the trace IS the reasoning; every hop carries its own receipt. Zero new compute. The transcript is the chain of thought.
Omnimodal routing — one shape-key space. Four deterministic encoders (text, image, audio, video) project into a shared 10 000-dim hypervector space. Cross-modal cosine is meaningful because the algebra is shared, not because a model was trained.
Reasoning + generation split per tier. Every cascade tier is split into a reasoning sub-expert (classify the form) and a generation sub-expert (produce the answer). Generation only fires when reasoning matches. Both sub-hops are individually accountable.
Lazy model tier — separate WASM bundle. The spine pays picojoules; the model tier loads only on escalation. Two bundles visible in the browser's Network panel: the spine on first paint, the leaf only when the spine could not place a request.
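The reasoning/generation split in the third piece lends itself to a small trait. The following is one possible shape, not the substrate's interface: `SplitTier`, `SubHop`, `run_tier`, and the toy `SquareTier` are hypothetical names.

```rust
/// One cascade tier, split into the two sub-experts described above.
pub trait SplitTier {
    /// The classified form that reasoning hands to generation.
    type Form;

    /// Reasoning sub-expert: classify the query, or decline with `None`.
    fn reason(&self, query: &str) -> Option<Self::Form>;

    /// Generation sub-expert: produce the answer for an already-matched form.
    fn generate(&self, form: &Self::Form) -> String;
}

/// One record per sub-hop, so each half is individually accountable.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SubHop {
    ReasonMatched,
    ReasonDeclined,
    Generated,
}

/// Generation fires only when reasoning matched.
pub fn run_tier<T: SplitTier>(tier: &T, query: &str) -> (Option<String>, Vec<SubHop>) {
    match tier.reason(query) {
        None => (None, vec![SubHop::ReasonDeclined]),
        Some(form) => {
            let answer = tier.generate(&form);
            (Some(answer), vec![SubHop::ReasonMatched, SubHop::Generated])
        }
    }
}

/// Toy tier: squares of integers written as "square N".
pub struct SquareTier;

impl SplitTier for SquareTier {
    type Form = i64;

    fn reason(&self, query: &str) -> Option<i64> {
        query.strip_prefix("square ")?.trim().parse().ok()
    }

    fn generate(&self, form: &i64) -> String {
        // Widen to i128 so squaring any i64 cannot overflow.
        let n = i128::from(*form);
        (n * n).to_string()
    }
}
```

The associated `Form` type is the design point: `generate` cannot be called without a value that only `reason` produces, so the ordering of the two sub-hops is enforced by the signature.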

Receipts — the substrate's only non-negotiable

Every metered call returns its joules — computed at the call site against a silicon-specific cost model, the Landauer floor, and a TDP envelope. The receipt is structural, not an afterthought. Three properties the substrate commits to:

Honest method labels. Every receipt declares how the joules were estimated — measured when a live counter (RAPL / powercap / NVML / powermetrics) was read, tdp_estimate + landauer_floor when only wall-clock was available. No silent substitution between methods.
μ is always reported. The apparent impedance μ = E_tdp / E_floor is the ratio between what real silicon spends and physics' lower bound. A receipt that hides μ is a substrate bug.
No fabrication. If a claim cannot be tied to a retrievable source or a deterministic computation, it is published as L2 — never as L0 or L1. A claim that cannot be authored at the right V-class is not authored.
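To make the first two properties concrete, here is a hedged sketch of a receipt that always carries its method label and μ. The Landauer bound (k_B · T · ln 2 per erased bit) is standard physics; the `Receipt` layout, the function names, and the idea of passing a bit-erasure count per call are assumptions for illustration.

```rust
/// Boltzmann constant in J/K (exact in the 2019 SI).
pub const BOLTZMANN_J_PER_K: f64 = 1.380649e-23;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    /// A live counter (RAPL, powercap, NVML, powermetrics) was read.
    Measured,
    /// Only wall-clock time was available.
    TdpEstimate,
}

impl Method {
    /// Labels as written in the text above.
    pub fn label(&self) -> &'static str {
        match self {
            Method::Measured => "measured",
            Method::TdpEstimate => "tdp_estimate + landauer_floor",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Receipt {
    pub joules: f64,    // the figure reported to the caller
    pub method: Method, // how `joules` was obtained
    pub e_floor: f64,   // Landauer floor for this call
    pub e_tdp: f64,     // TDP envelope for this call
    pub mu: f64,        // apparent impedance, E_tdp / E_floor
}

/// Landauer floor: k_B * T * ln 2 joules per erased bit.
pub fn landauer_floor(bits_erased: f64, kelvin: f64) -> f64 {
    bits_erased * BOLTZMANN_J_PER_K * kelvin * std::f64::consts::LN_2
}

/// Build a receipt. `measured_joules` is `Some` only when a live counter was
/// read; otherwise the TDP envelope is reported and labelled as an estimate.
pub fn make_receipt(
    measured_joules: Option<f64>,
    bits_erased: f64,
    kelvin: f64,
    tdp_watts: f64,
    wall_seconds: f64,
) -> Receipt {
    let e_floor = landauer_floor(bits_erased, kelvin);
    let e_tdp = tdp_watts * wall_seconds;
    let (joules, method) = match measured_joules {
        Some(j) => (j, Method::Measured),
        None => (e_tdp, Method::TdpEstimate),
    };
    Receipt { joules, method, e_floor, e_tdp, mu: e_tdp / e_floor }
}
```

Because `method` and `mu` are plain fields of `Receipt`, a receipt cannot be built without them, and the choice between the two methods is made in exactly one `match`.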

The thermometer at /scale shows where your last measured frame sits on a log axis between the Landauer floor and a kilometre in an EV. The session aggregator at /receipts sums per-exhibit totals on this device. Nothing is uploaded.

What's invariant, what's dated

The seven invariants in the substrate charter are commitments — energy as the unit of account, cascade structure (cheap-deterministic before expensive-stochastic), V-class replayability, grammar-level attribution, first-class receipts, method honesty, and hard refusal to emit unmarked synthetic claims. Anything else is dated — current implementation choices that may change without breaking the substrate's identity.

What that means in practice: the cascade tier representatives (which specific Rust function measures L0-closed) are dated; the cascade structure (cheap-deterministic before expensive-stochastic) is invariant. The HDC dimension (10 000 today) is dated; the principle (identifiable encoders project into a shared space) is invariant. The L2-leaf bundle size (41 KB stub today, ~280 MB 1-bit-class target) is dated; the lazy-load discipline (model tier only loads on escalation) is invariant.

The invariants survive future implementation rewrites. The dated sections are expected to be edited as the substrate matures.
