Omnimodal — four modalities, one shape-key space

Text, image, audio, and video each pass through a deterministic encoder: a trigram bag for text, an 8×8 cell bind for images, an FFT-band bag for audio, and per-frame image binding plus temporal permute-and-bundle for video. All four produce a 10 000-dimensional bipolar HDC hypervector in the same space. The 4×4 matrix at the bottom shows the cosine between every pair of encoder outputs. It is recomputed every frame and demonstrates that the cross-modal projection works as a substrate property, with no joint training and no shared encoder. The exhibit rotates through three scenes (circle / square / diagonal); each scene supplies one sample per modality.
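
The operations named above (seeded atoms, bind, bundle, temporal permute) can be sketched with the standard library alone. This is an illustrative sketch, not the exhibit's kernel: `atom`, `seed_of`, `bind`, `bundle`, `permute`, and `encode_sequence` are hypothetical names, FNV-1a plus SplitMix64 stand in for the kernel's BLAKE3-derived seeding, and the sketch assumes that bind is an element-wise product, bundle is a majority sign, and permute is a cyclic rotation.

```rust
/// Hypervector dimension quoted in the text above.
const DIM: usize = 10_000;

/// FNV-1a over the key bytes. ASSUMPTION: a std-only stand-in for the
/// kernel's BLAKE3-derived seed; any stable 64-bit digest serves the sketch.
fn seed_of(key: &str) -> u64 {
    let mut h: u64 = 0xCBF2_9CE4_8422_2325;
    for b in key.bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01B3);
    }
    h
}

/// One SplitMix64 step, used to stretch a seed into DIM pseudo-random bits.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Deterministic bipolar atom: the same key always yields the same vector.
fn atom(key: &str) -> Vec<i8> {
    let mut state = seed_of(key);
    let mut out = Vec::with_capacity(DIM);
    while out.len() < DIM {
        let word = splitmix64(&mut state);
        for bit in 0..64 {
            if out.len() == DIM {
                break;
            }
            out.push(if (word >> bit) & 1 == 1 { 1 } else { -1 });
        }
    }
    out
}

/// Bind (assumed: element-wise product). Self-inverse on bipolar vectors.
fn bind(a: &[i8], b: &[i8]) -> Vec<i8> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

/// Bundle (assumed: element-wise majority sign, ties resolve to +1).
fn bundle(items: &[&[i8]]) -> Vec<i8> {
    (0..DIM)
        .map(|i| {
            let sum: i32 = items.iter().map(|v| i32::from(v[i])).sum();
            if sum >= 0 { 1 } else { -1 }
        })
        .collect()
}

/// Permute (assumed: cyclic rotation by `shift` positions).
fn permute(a: &[i8], shift: usize) -> Vec<i8> {
    let mut out = a.to_vec();
    if !out.is_empty() {
        let k = shift % out.len();
        out.rotate_right(k);
    }
    out
}

/// Temporal permute-and-bundle: frame `t` is rotated by `t`, then bundled.
fn encode_sequence(frames: &[Vec<i8>]) -> Vec<i8> {
    let shifted: Vec<Vec<i8>> = frames
        .iter()
        .enumerate()
        .map(|(t, f)| permute(f, t))
        .collect();
    let refs: Vec<&[i8]> = shifted.iter().map(|v| v.as_slice()).collect();
    bundle(&refs)
}

/// Cosine of two bipolar vectors: every norm is sqrt(len), so dot / len.
fn cosine(a: &[i8], b: &[i8]) -> f32 {
    let dot: i32 = a.iter().zip(b).map(|(x, y)| i32::from(*x) * i32::from(*y)).sum();
    dot as f32 / a.len() as f32
}
```

Under these assumptions two unrelated atoms have a cosine near zero, a bundle stays measurably similar to each of its members, and reversing the frame order changes the sequence encoding.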

View the kernel that wrote this receipt: crates/mathground-view/src/modal.rs
//! Modality routers: four deterministic encoders that all project into
//! the same HDC shape-key space.
//!
//! - **Text**: trigram bag → bundle of seeded random hypervectors.
//! - **Image**: 8×8 intensity grid → cell-level (level ⊗ position) bind,
//!              then bundle.
//! - **Audio**: log-magnitude FFT spectrum → spectrum-band bag (bind
//!              band-id with magnitude level, bundle bands).
//! - **Video**: keyframes → per-frame image-encoding → temporal
//!              permute-and-bundle (Plate-style sequence binding).
//!
//! All four produce an `Hv` of dimension `HDC_DIM`. Cross-modal cosine
//! similarity is then a meaningful number: an image of a square and the
//! text "square" produce hypervectors whose cosine is much higher than
//! a random pair, because both projections share atom hypervectors in the
//! same algebraic space.
//!
//! The atom-hypervector seeds are derived from BLAKE3-truncated keys,
//! so the same atom (e.g. "circle", "low-frequency", grid-cell (3,4))
//! always seeds the same hypervector — making the encoders deterministic
//! and the cross-modal alignment a substrate property, not a learned one.

use blake3;

use crate::fft::fft_in_place;
use crate::hdc::{bitops_bind, bitops_bundle, bitops_permute, Hv, HDC_DIM};

/// Stable seed for an atom identified by a string key.
fn seed(key: &str) -> u64 {
    let h = blake3::hash(key.as_bytes());
    let b = h.as_bytes();
    u64::from_le_bytes([b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]])
}

/// Per-encoding report: the resulting hypervector + the bit-ops cost.
pub struct ModalReport {
    pub hv: Hv,
    pub bit_ops: u64,
}

// ── Text ──────────────────────────────────────────────────────────────

/// Trigram-bag encoding. Each trigram seeds a hypervector; all are
/// bundled. Cheap, order-insensitive (deliberately — the order story
/// is on the temporal side for video).
pub fn encode_text(text: &str) -> ModalReport {
    // Lower-case, whitespace-collapse, then walk trigrams over characters.
    let lower: String = text.to_lowercase();
    let normalized: String = lower
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ");
    let chars: Vec<char> = normalized.chars().collect();
    let mut hv = Hv::zero(HDC_DIM);
    let mut atoms: Vec<Hv> = Vec::with_capacity(chars.len().saturating_sub(2));
    for w in chars.windows(3) {
        let key = format!("t:{}{}{}", w[0], w[1], w[2]);
        atoms.push(Hv::random(seed(&key), HDC_DIM));
    }
    let refs: Vec<&Hv> = atoms.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = bitops_bundle(HDC_DIM, atoms.len().max(1));
    ModalReport { hv, bit_ops }
}

// ── Image ─────────────────────────────────────────────────────────────

/// 8×8 intensity-grid encoding. Each (row, col) position is bound with
/// a quantised-intensity atom; bundle across all cells.
pub fn encode_image(grid: &[u8]) -> ModalReport {
    assert_eq!(grid.len(), 64, "encode_image expects an 8x8 = 64-cell grid");
    let mut hv = Hv::zero(HDC_DIM);
    let mut atoms: Vec<Hv> = Vec::with_capacity(64);
    for (i, &v) in grid.iter().enumerate() {
        let r = i / 8;
        let c = i % 8;
        let level = v / 32; // 8 intensity bins
        let pos = Hv::random(seed(&format!("img-pos:{r}:{c}")), HDC_DIM);
        let lvl = Hv::random(seed(&format!("img-lvl:{level}")), HDC_DIM);
        atoms.push(pos.bind(&lvl));
    }
    let refs: Vec<&Hv> = atoms.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = (64u64 * bitops_bind(HDC_DIM)) + bitops_bundle(HDC_DIM, 64);
    ModalReport { hv, bit_ops }
}

// ── Audio ─────────────────────────────────────────────────────────────

/// Spectrum-bag encoding. Run an FFT on the input, reduce the spectrum
/// to at most 16 frequency bands (peak magnitude per band, expressed in
/// dB relative to the loudest band), then bind each band's id with a
/// magnitude-level atom and bundle.
pub fn encode_audio(samples: &[f32]) -> ModalReport {
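    // Pad to a power of two, with a floor of 2 so that `half` below is
    // never zero (empty or single-sample input would otherwise divide by
    // zero when computing `bin_per_band`).
    let n = samples.len().next_power_of_two().max(2);
    // Interleaved complex layout: real part at 2k, imaginary part at 2k + 1.
    // The buffer starts zeroed, so only the real parts need writing.
    let mut buf = vec![0.0f32; 2 * n];
    for (k, s) in samples.iter().enumerate() {
        buf[2 * k] = *s;
    }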
    // Bit-ops from the FFT (returned by fft_in_place).
    let fft_report = fft_in_place(&mut buf).expect("power-of-two");
    // Reduce to 16 band magnitudes (log + quantise).
    let half = n / 2;
    let band_count = 16usize.min(half);
    let bin_per_band = (half + band_count - 1) / band_count;
    let mut bands = vec![0.0f32; band_count];
    for k in 0..half {
        let re = buf[2 * k];
        let im = buf[2 * k + 1];
        let mag = (re * re + im * im).sqrt();
        let b = (k / bin_per_band).min(band_count - 1);
        if mag > bands[b] { bands[b] = mag; }
    }
    let max_mag = bands.iter().cloned().fold(1e-6f32, f32::max);
    let mut atoms: Vec<Hv> = Vec::with_capacity(band_count);
    // Only bundle bands above a noise-floor threshold. A flat spectrum
    // is encoded by the absence of strong bands, not by bundling 16
    // near-silent "this is quiet" atoms — which would collapse different
    // signals onto the same hypervector (most of the bundle would be
    // identical "quiet" atoms).
    for (b, &m) in bands.iter().enumerate() {
        let level_db = (20.0 * (m / max_mag).max(1e-3).log10()).clamp(-60.0, 0.0);
        if level_db < -20.0 {
            continue;
        }
        let level_bin = ((-level_db / 20.0) * 4.0).floor().clamp(0.0, 3.0) as u32;
        let band = Hv::random(seed(&format!("aud-band:{b}")), HDC_DIM);
        let lvl = Hv::random(seed(&format!("aud-lvl:{level_bin}")), HDC_DIM);
        atoms.push(band.bind(&lvl));
    }
    if atoms.is_empty() {
        // Silent input — return a stable "silence" hypervector so cosine is
        // defined but doesn't collide with any real signal.
        atoms.push(Hv::random(seed("aud-silence"), HDC_DIM));
    }
    let mut hv = Hv::zero(HDC_DIM);
    let refs: Vec<&Hv> = atoms.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = fft_report.bit_ops
        + (atoms.len() as u64 * bitops_bind(HDC_DIM))
        + bitops_bundle(HDC_DIM, atoms.len().max(1));
    ModalReport { hv, bit_ops }
}

// ── Video ─────────────────────────────────────────────────────────────

/// Keyframe sequence → per-frame image-encoding → temporal permute-and-
/// bundle (Plate / Kanerva sequence-encoding). Returns one hypervector
/// for the whole clip.
///
/// `keyframes` is a flat array of 8×8 grids (`keyframes.len() == 64 *
/// frame_count`). Frames are processed in order; frame `t` is permuted
/// by `t` positions before bundling, so videos with the same frames in
/// different order produce different hypervectors.
pub fn encode_video(keyframes: &[u8], frame_count: usize) -> ModalReport {
    assert_eq!(keyframes.len(), 64 * frame_count, "expects 8x8 grids × frames");
    let mut per_frame: Vec<Hv> = Vec::with_capacity(frame_count);
    let mut per_frame_cost: u64 = 0;
    for t in 0..frame_count {
        let grid = &keyframes[t * 64..(t + 1) * 64];
        let f = encode_image(grid);
        per_frame_cost += f.bit_ops;
        per_frame.push(f.hv.permute(t as isize));
    }
    let mut hv = Hv::zero(HDC_DIM);
    let refs: Vec<&Hv> = per_frame.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = per_frame_cost
        + (frame_count as u64 * bitops_permute(HDC_DIM))
        + bitops_bundle(HDC_DIM, frame_count);
    ModalReport { hv, bit_ops }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn text_is_deterministic() {
        let a = encode_text("circle of radius one");
        let b = encode_text("circle of radius one");
        assert_eq!(a.hv.0, b.hv.0);
    }

    #[test]
    fn text_similar_phrases_have_higher_cosine_than_unrelated() {
        let a = encode_text("circle radius one");
        let b = encode_text("circle radius two");
        let c = encode_text("the lazy dog ate breakfast");
        let near = a.hv.cosine(&b.hv);
        let far = a.hv.cosine(&c.hv);
        assert!(
            near > far + 0.05,
            "expected near={near} > far={far} + 0.05"
        );
    }

    #[test]
    fn image_is_deterministic_per_grid() {
        let g: Vec<u8> = (0..64).map(|i| (i * 4) as u8).collect();
        let a = encode_image(&g);
        let b = encode_image(&g);
        assert_eq!(a.hv.0, b.hv.0);
    }

    #[test]
    fn audio_distinguishes_tones() {
        // 64-sample sine waves at two different frequencies.
        let mut s1 = vec![0.0f32; 64];
        let mut s2 = vec![0.0f32; 64];
        for k in 0..64 {
            let t = k as f32 / 64.0;
            s1[k] = (2.0 * core::f32::consts::PI * 4.0 * t).sin();
            s2[k] = (2.0 * core::f32::consts::PI * 16.0 * t).sin();
        }
        let a = encode_audio(&s1);
        let b = encode_audio(&s2);
        let c = a.hv.cosine(&b.hv);
        // Different spectral centres → encodings should NOT be near-identical.
        assert!(c < 0.7, "different tones too similar: cosine = {c}");
    }

    #[test]
    fn video_is_order_sensitive() {
        // Two 4-frame "videos" with the same frames in opposite order.
        let mut frames_fwd: Vec<u8> = Vec::with_capacity(64 * 4);
        for t in 0..4 {
            for i in 0..64 {
                frames_fwd.push(((i + t * 8) as u8).wrapping_mul(3));
            }
        }
        let mut frames_rev: Vec<u8> = Vec::with_capacity(64 * 4);
        for t in (0..4).rev() {
            for i in 0..64 {
                frames_rev.push(((i + t * 8) as u8).wrapping_mul(3));
            }
        }
        let a = encode_video(&frames_fwd, 4);
        let b = encode_video(&frames_rev, 4);
        // Temporal binding via permute makes order matter.
        let c = a.hv.cosine(&b.hv);
        assert!(c.abs() < 0.7, "video order should change the hv, cosine = {c}");
    }
}

This is the exact Rust file compiled into the WASM module the page just loaded. Every line that runs on your device is here. The receipt is a function of this code, not a bespoke benchmark.

About this exhibit

Modality is a router problem at the substrate level, not a model property. A deterministic encoder per modality projects every input into a shared HDC hypervector space; the cascade routes from there. The encoders are small, plain, and individually testable.
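
As one illustration of that testability, the level quantisation inside `encode_audio` can be pulled out as a pure function and checked by hand. The arithmetic below is copied from the kernel; the function name `level_bin` and the `Option` return type are hypothetical, introduced only for this sketch.

```rust
/// Quantise one band magnitude against the loudest band.
///
/// Returns `None` when the band sits more than 20 dB below the peak
/// (the kernel skips such bands), otherwise a level bin in 0..=3 where
/// 0 is loudest. `level_bin` is a hypothetical helper, not a kernel API.
fn level_bin(mag: f32, max_mag: f32) -> Option<u32> {
    // Relative level in dB, floored at -60 dB by the 1e-3 ratio clamp.
    let level_db = (20.0 * (mag / max_mag).max(1e-3).log10()).clamp(-60.0, 0.0);
    if level_db < -20.0 {
        return None;
    }
    // Map the kept range [-20 dB, 0 dB] onto four 5 dB bins.
    Some(((-level_db / 20.0) * 4.0).floor().clamp(0.0, 3.0) as u32)
}
```

A band at the peak lands in bin 0, a band at half the peak magnitude (about -6 dB) lands in bin 1, and a band at one hundredth of the peak (-40 dB) is dropped.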

The cosine matrix on the canvas above is recomputed every frame. Diagonal terms are 1.0 (self-similarity), and inputs that come from the same scene (e.g. the text "circle" and an image of a circle) score a higher cosine than inputs from different scenes. The whole encoding ladder costs picojoules.
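
A minimal sketch of how such a 4×4 matrix can be filled from four encoder outputs follows. It assumes bipolar vectors stored as `i8` slices; `cosine` and `cosine_matrix` are hypothetical names and do not reflect the kernel's `Hv` API.

```rust
/// Cosine of two bipolar (+1 / -1) vectors of equal length.
/// Every such vector has norm sqrt(len), so the cosine is dot / len.
fn cosine(a: &[i8], b: &[i8]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: i32 = a.iter().zip(b).map(|(x, y)| i32::from(*x) * i32::from(*y)).sum();
    dot as f32 / a.len() as f32
}

/// Pairwise cosine matrix, assumed order: text, image, audio, video.
fn cosine_matrix(hvs: [&[i8]; 4]) -> [[f32; 4]; 4] {
    let mut m = [[0.0f32; 4]; 4];
    for i in 0..4 {
        // Only the upper triangle is computed; cosine is symmetric.
        for j in i..4 {
            let c = cosine(hvs[i], hvs[j]);
            m[i][j] = c;
            m[j][i] = c;
        }
    }
    m
}
```

The diagonal is 1.0 by construction, so only the six off-diagonal pairs carry information about cross-modal similarity.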

/exhibits/reason showed the cascade walk; this exhibit shows the substrate that feeds it. A cascade routes by modality because the encoders make modality a first-class observable.