Omnimodal — four modalities, one shape-key space
Text, image, audio, and video each ride a deterministic encoder — trigram bag, 8×8 cell bind, FFT-band bag, and per-frame image-bind plus temporal permute-and-bundle for video. All four produce a 10 000-dim bipolar HDC hypervector in the same space. The 4×4 matrix at the bottom is the cosine between every pair of encoder outputs; it is recomputed every frame and is the proof that the cross-modal projection works as a substrate property — no joint training, no shared encoder. The exhibit rotates through three scenes (circle / square / diagonal); each scene supplies one sample per modality.
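To make the shared-space idea concrete, here is a minimal sketch of a trigram bag encoded as a bundle of seeded bipolar atoms, with cosine as the comparison. It is not the kernel: `atom`, `bundle`, `encode_trigrams`, and `cosine` are hypothetical helpers, the seed is an FNV-1a hash standing in for the BLAKE3-derived seed, SplitMix64 stands in for whatever generator `Hv::random` uses, and ties in the majority vote are assumed to resolve to +1. The vectors it produces will therefore not match the exhibit's numerically.

```rust
/// Dimension quoted in the exhibit text.
const DIM: usize = 10_000;

/// Hypothetical stand-in for the kernel's BLAKE3-derived seed: FNV-1a over the key.
fn seed(key: &str) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in key.bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// SplitMix64 step, used here only as a small deterministic bit source.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Deterministic bipolar (+1 / -1) atom for a string key.
fn atom(key: &str) -> Vec<i8> {
    let mut state = seed(key);
    (0..DIM)
        .map(|_| if (splitmix64(&mut state) & 1) == 1 { 1i8 } else { -1i8 })
        .collect()
}

/// Bundle = element-wise majority vote. Ties resolve to +1 (an assumption).
fn bundle(atoms: &[Vec<i8>]) -> Vec<i8> {
    let mut sums = vec![0i32; DIM];
    for a in atoms {
        for (s, &x) in sums.iter_mut().zip(a.iter()) {
            *s += i32::from(x);
        }
    }
    sums.iter().map(|&s| if s >= 0 { 1i8 } else { -1i8 }).collect()
}

/// Trigram bag: one atom per character trigram of the lower-cased text, bundled.
fn encode_trigrams(text: &str) -> Vec<i8> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let atoms: Vec<Vec<i8>> = chars
        .windows(3)
        .map(|w| atom(&format!("t:{}{}{}", w[0], w[1], w[2])))
        .collect();
    bundle(&atoms)
}

/// Cosine of two bipolar vectors of equal length: dot product over dimension.
fn cosine(a: &[i8], b: &[i8]) -> f64 {
    let dot: i64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| i64::from(x) * i64::from(y))
        .sum();
    dot as f64 / a.len() as f64
}
```

Under these assumptions, `encode_trigrams("circle radius one")` and `encode_trigrams("circle radius two")` share most of their trigram atoms and should score a markedly higher cosine than either does against an unrelated sentence, whose cosine should sit near zero.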
Kernel that wrote this receipt: crates/mathground-view/src/modal.rs
//! Modality routers: four deterministic encoders that all project into
//! the same HDC shape-key space.
//!
//! - **Text**: trigram bag → bundle of seeded random hypervectors.
//! - **Image**: 8×8 intensity grid → cell-level (level ⊗ position) bind,
//! then bundle.
//! - **Audio**: log-magnitude FFT spectrum → spectrum-band bag (bind
//! band-id with magnitude level, bundle bands).
//! - **Video**: keyframes → per-frame image-encoding → temporal
//! permute-and-bundle (Plate-style sequence binding).
//!
//! All four produce an `Hv` of dimension `HDC_DIM`. Cross-modal cosine
//! similarity is then a meaningful number: an image of a square and the
//! text "square" produce hypervectors whose cosine is much higher than
//! a random pair, because both projections share atom hypervectors in the
//! same algebraic space.
//!
//! The atom-hypervector seeds are derived from BLAKE3-truncated keys,
//! so the same atom (e.g. "circle", "low-frequency", grid-cell (3,4))
//! always seeds the same hypervector — making the encoders deterministic
//! and the cross-modal alignment a substrate property, not a learned one.
use blake3;
use crate::fft::fft_in_place;
use crate::hdc::{bitops_bind, bitops_bundle, bitops_permute, Hv, HDC_DIM};

/// Stable seed for an atom identified by a string key.
fn seed(key: &str) -> u64 {
    let h = blake3::hash(key.as_bytes());
    let b = h.as_bytes();
    u64::from_le_bytes([b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]])
}

/// Per-encoding report: the resulting hypervector + the bit-ops cost.
pub struct ModalReport {
    pub hv: Hv,
    pub bit_ops: u64,
}

// ── Text ──────────────────────────────────────────────────────────────
/// Trigram-bag encoding. Each trigram seeds a hypervector; all are
/// bundled. Cheap, order-insensitive (deliberately: the order story
/// is on the temporal side for video).
pub fn encode_text(text: &str) -> ModalReport {
    let lower: String = text.to_lowercase();
    // Lower-case, whitespace-collapse, then walk trigrams over characters.
    let normalized: String = lower
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ");
    let chars: Vec<char> = normalized.chars().collect();
    let mut hv = Hv::zero(HDC_DIM);
    let mut atoms: Vec<Hv> = Vec::with_capacity(chars.len().saturating_sub(2));
    for w in chars.windows(3) {
        let key = format!("t:{}{}{}", w[0], w[1], w[2]);
        atoms.push(Hv::random(seed(&key), HDC_DIM));
    }
    let refs: Vec<&Hv> = atoms.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = bitops_bundle(HDC_DIM, atoms.len().max(1));
    ModalReport { hv, bit_ops }
}

// ── Image ─────────────────────────────────────────────────────────────
/// 8×8 intensity-grid encoding. Each (row, col) position is bound with
/// a quantised-intensity atom; bundle across all cells.
pub fn encode_image(grid: &[u8]) -> ModalReport {
    assert_eq!(grid.len(), 64, "encode_image expects an 8x8 = 64-cell grid");
    let mut hv = Hv::zero(HDC_DIM);
    let mut atoms: Vec<Hv> = Vec::with_capacity(64);
    for (i, &v) in grid.iter().enumerate() {
        let r = i / 8;
        let c = i % 8;
        let level = v / 32; // 8 intensity bins
        let pos = Hv::random(seed(&format!("img-pos:{r}:{c}")), HDC_DIM);
        let lvl = Hv::random(seed(&format!("img-lvl:{level}")), HDC_DIM);
        atoms.push(pos.bind(&lvl));
    }
    let refs: Vec<&Hv> = atoms.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = (64u64 * bitops_bind(HDC_DIM)) + bitops_bundle(HDC_DIM, 64);
    ModalReport { hv, bit_ops }
}

// ── Audio ─────────────────────────────────────────────────────────────
/// Spectrum-bag encoding. Run an FFT on the input, log-magnitude into
/// 16 frequency bands, then bind each band's id with a magnitude-level
/// atom and bundle.
pub fn encode_audio(samples: &[f32]) -> ModalReport {
    let n = samples.len().next_power_of_two();
    let mut buf = vec![0.0f32; 2 * n];
    for (k, s) in samples.iter().enumerate().take(n) {
        buf[2 * k] = *s;
        buf[2 * k + 1] = 0.0;
    }
    // Bit-ops from the FFT (returned by fft_in_place).
    let fft_report = fft_in_place(&mut buf).expect("power-of-two");
    // Reduce to 16 band magnitudes (log + quantise).
    let half = n / 2;
    let band_count = 16usize.min(half);
    let bin_per_band = (half + band_count - 1) / band_count;
    let mut bands = vec![0.0f32; band_count];
    for k in 0..half {
        let re = buf[2 * k];
        let im = buf[2 * k + 1];
        let mag = (re * re + im * im).sqrt();
        let b = (k / bin_per_band).min(band_count - 1);
        if mag > bands[b] { bands[b] = mag; }
    }
    let max_mag = bands.iter().cloned().fold(1e-6f32, f32::max);
    let mut atoms: Vec<Hv> = Vec::with_capacity(band_count);
    // Only bundle bands above a noise-floor threshold. A flat spectrum
    // is encoded by the absence of strong bands, not by bundling 16
    // near-silent "this is quiet" atoms, which would collapse different
    // signals onto the same hypervector (most of the bundle would be
    // identical "quiet" atoms).
    for (b, &m) in bands.iter().enumerate() {
        let level_db = (20.0 * (m / max_mag).max(1e-3).log10()).clamp(-60.0, 0.0);
        if level_db < -20.0 {
            continue;
        }
        let level_bin = ((-level_db / 20.0) * 4.0).floor().clamp(0.0, 3.0) as u32;
        let band = Hv::random(seed(&format!("aud-band:{b}")), HDC_DIM);
        let lvl = Hv::random(seed(&format!("aud-lvl:{level_bin}")), HDC_DIM);
        atoms.push(band.bind(&lvl));
    }
    if atoms.is_empty() {
        // Silent input: return a stable "silence" hypervector so cosine is
        // defined but doesn't collide with any real signal.
        atoms.push(Hv::random(seed("aud-silence"), HDC_DIM));
    }
    let mut hv = Hv::zero(HDC_DIM);
    let refs: Vec<&Hv> = atoms.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = fft_report.bit_ops
        + (atoms.len() as u64 * bitops_bind(HDC_DIM))
        + bitops_bundle(HDC_DIM, atoms.len().max(1));
    ModalReport { hv, bit_ops }
}

// ── Video ─────────────────────────────────────────────────────────────
/// Keyframe sequence → per-frame image-encoding → temporal permute-and-
/// bundle (Plate / Kanerva sequence-encoding). Returns one hypervector
/// for the whole clip.
///
/// `keyframes` is a flat array of 8×8 grids (`keyframes.len() == 64 *
/// frame_count`). Frames are processed in order; frame `t` is permuted
/// by `t` positions before bundling, so videos with the same frames in
/// different order produce different hypervectors.
pub fn encode_video(keyframes: &[u8], frame_count: usize) -> ModalReport {
    assert_eq!(keyframes.len(), 64 * frame_count, "expects 8x8 grids × frames");
    let mut per_frame: Vec<Hv> = Vec::with_capacity(frame_count);
    let mut per_frame_cost: u64 = 0;
    for t in 0..frame_count {
        let grid = &keyframes[t * 64..(t + 1) * 64];
        let f = encode_image(grid);
        per_frame_cost += f.bit_ops;
        per_frame.push(f.hv.permute(t as isize));
    }
    let mut hv = Hv::zero(HDC_DIM);
    let refs: Vec<&Hv> = per_frame.iter().collect();
    hv.bundle_into(&refs);
    let bit_ops = per_frame_cost
        + (frame_count as u64 * bitops_permute(HDC_DIM))
        + bitops_bundle(HDC_DIM, frame_count);
    ModalReport { hv, bit_ops }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn text_is_deterministic() {
        let a = encode_text("circle of radius one");
        let b = encode_text("circle of radius one");
        assert_eq!(a.hv.0, b.hv.0);
    }

    #[test]
    fn text_similar_phrases_have_higher_cosine_than_unrelated() {
        let a = encode_text("circle radius one");
        let b = encode_text("circle radius two");
        let c = encode_text("the lazy dog ate breakfast");
        let near = a.hv.cosine(&b.hv);
        let far = a.hv.cosine(&c.hv);
        assert!(
            near > far + 0.05,
            "expected near={near} > far={far} + 0.05"
        );
    }

    #[test]
    fn image_is_deterministic_per_grid() {
        let g: Vec<u8> = (0..64).map(|i| (i * 4) as u8).collect();
        let a = encode_image(&g);
        let b = encode_image(&g);
        assert_eq!(a.hv.0, b.hv.0);
    }

    #[test]
    fn audio_distinguishes_tones() {
        // 64-sample sine waves at two different frequencies.
        let mut s1 = vec![0.0f32; 64];
        let mut s2 = vec![0.0f32; 64];
        for k in 0..64 {
            let t = k as f32 / 64.0;
            s1[k] = (2.0 * core::f32::consts::PI * 4.0 * t).sin();
            s2[k] = (2.0 * core::f32::consts::PI * 16.0 * t).sin();
        }
        let a = encode_audio(&s1);
        let b = encode_audio(&s2);
        let c = a.hv.cosine(&b.hv);
        // Different spectral centres → encodings should NOT be near-identical.
        assert!(c < 0.7, "different tones too similar: cosine = {c}");
    }

    #[test]
    fn video_is_order_sensitive() {
        // Two 4-frame "videos" with the same frames in opposite order.
        let mut frames_fwd: Vec<u8> = Vec::with_capacity(64 * 4);
        for t in 0..4 {
            for i in 0..64 {
                frames_fwd.push(((i + t * 8) as u8).wrapping_mul(3));
            }
        }
        let mut frames_rev: Vec<u8> = Vec::with_capacity(64 * 4);
        for t in (0..4).rev() {
            for i in 0..64 {
                frames_rev.push(((i + t * 8) as u8).wrapping_mul(3));
            }
        }
        let a = encode_video(&frames_fwd, 4);
        let b = encode_video(&frames_rev, 4);
        // Temporal binding via permute makes order matter.
        let c = a.hv.cosine(&b.hv);
        assert!(c.abs() < 0.7, "video order should change the hv, cosine = {c}");
    }
}

This is the exact Rust file compiled into the WASM module the page just loaded. Every line that runs on your device is here. The receipt is a function of this code, not a bespoke benchmark.
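The video encoder's temporal step (permute frame `t` by `t` positions, then bundle) can be sketched in isolation. The version below is an illustration under stated assumptions, not the kernel's `Hv` implementation: `permute` is taken to be a cyclic rotation, `bundle` is a majority vote with ties resolved to +1, and `random_hv` uses SplitMix64 as a stand-in generator. All helper names are hypothetical.

```rust
const DIM: usize = 10_000;

/// SplitMix64 step, used here only as a small deterministic bit source.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Random bipolar vector from a numeric seed (stand-in for a per-frame encoding).
fn random_hv(seed: u64) -> Vec<i8> {
    let mut state = seed;
    (0..DIM)
        .map(|_| if (splitmix64(&mut state) & 1) == 1 { 1i8 } else { -1i8 })
        .collect()
}

/// Cyclic rotation by `shift` positions: an assumed reading of "permuted by t positions".
fn permute(hv: &[i8], shift: usize) -> Vec<i8> {
    let mut out = hv.to_vec();
    if !out.is_empty() {
        let k = shift % out.len();
        out.rotate_right(k);
    }
    out
}

/// Element-wise majority vote over DIM-length vectors; ties resolve to +1 (assumption).
fn bundle(items: &[Vec<i8>]) -> Vec<i8> {
    let mut sums = vec![0i32; DIM];
    for v in items {
        for (s, &x) in sums.iter_mut().zip(v.iter()) {
            *s += i32::from(x);
        }
    }
    sums.iter().map(|&s| if s >= 0 { 1i8 } else { -1i8 }).collect()
}

/// Permute frame `t` by `t`, then bundle, so position in the clip affects the result.
fn encode_sequence(frames: &[Vec<i8>]) -> Vec<i8> {
    let shifted: Vec<Vec<i8>> = frames
        .iter()
        .enumerate()
        .map(|(t, f)| permute(f, t))
        .collect();
    bundle(&shifted)
}

/// Cosine of two bipolar vectors of equal length.
fn cosine(a: &[i8], b: &[i8]) -> f64 {
    let dot: i64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| i64::from(x) * i64::from(y))
        .sum();
    dot as f64 / a.len() as f64
}
```

With an odd number of frames there are no ties, so a plain `bundle` of the same frames is identical regardless of order, while `encode_sequence` of a reordered clip should land near zero cosine with the original as long as no frame keeps its position.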
About this exhibit
Modality is a router problem at the substrate level, not a model property. A deterministic encoder per modality projects every input into a shared HDC hypervector space; the cascade routes from there. The encoders are small, plain, and individually testable.
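As an illustration of "individually testable", the level-quantisation step from `encode_audio` in the listing above can be lifted into a pure function. This is a sketch: `level_bin` is a hypothetical name, and returning `None` for bands below the -20 dB floor restates the `continue` in the original loop. It assumes `max_mag` is positive, which the kernel ensures by folding from `1e-6`.

```rust
/// Map a band magnitude, relative to the loudest band, to one of four level bins.
/// Returns `None` when the band sits below the -20 dB floor and would be skipped.
fn level_bin(mag: f32, max_mag: f32) -> Option<u32> {
    // Relative level in dB, floored at -60 dB by the 1e-3 ratio clamp.
    let level_db = (20.0 * (mag / max_mag).max(1e-3).log10()).clamp(-60.0, 0.0);
    if level_db < -20.0 {
        return None;
    }
    // 0..-20 dB is split into four bins of 5 dB each; the clamp keeps -20 dB in bin 3.
    Some(((-level_db / 20.0) * 4.0).floor().clamp(0.0, 3.0) as u32)
}
```

The four bins are 5 dB wide: 0 to -5 dB maps to bin 0, and -15 to -20 dB maps to bin 3. A band at half the peak magnitude (about -6 dB) therefore lands in bin 1.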
The cosine matrix on the canvas above is recomputed every frame. Diagonal terms are 1.0 (self-similarity), and inputs that come from the same scene (e.g. "circle" + an image of a circle) score a higher cosine than inputs from different scenes. The whole encoding ladder costs picojoules.
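The matrix itself is a plain pairwise cosine. A minimal sketch, assuming each encoder output is available as a slice of `i8` components (the kernel's `Hv` layout is not shown on this page, so `cosine` and `cosine_matrix` below are hypothetical free functions, not its API):

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[i8], b: &[i8]) -> f64 {
    let dot: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| f64::from(x) * f64::from(y))
        .sum();
    let na: f64 = a.iter().map(|&x| f64::from(x).powi(2)).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|&x| f64::from(x).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Pairwise cosine matrix: entry [i][j] compares vector i with vector j.
fn cosine_matrix(hvs: &[Vec<i8>]) -> Vec<Vec<f64>> {
    hvs.iter()
        .map(|a| hvs.iter().map(|b| cosine(a, b)).collect())
        .collect()
}
```

For the exhibit, `hvs` would hold the four encoder outputs in a fixed order, giving the 4×4 grid. The diagonal is 1.0 and the matrix is symmetric by construction.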
/exhibits/reason showed the cascade walk; this exhibit shows the substrate that feeds it. A cascade routes by modality because the encoders make modality a first-class observable.