Splat world — 3-D Gaussians
A 32,768-splat procedural galaxy rendered through a full WebGPU pipeline:
PROJECT_WGSL (per-splat covariance → screen-space ellipse) →
a 6-stage radix sort on 16-bit depth keys → instanced triangle-strip quads with a
Gaussian-falloff fragment shader in premultiplied alpha. The receipt below carries
a GaussianSplat artifact and reports joules-per-splat alongside the usual
impedance term μ. Upload an image to emit a second receipt for the substrate cost of
authoring a new splat shell.
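The sort stage named above can be pictured on the CPU. What follows is a minimal sketch, not the WGSL from `lux_worlds_splat::shaders` (which is not reproduced on this page). It assumes each sort entry is a `(depth_key, splat_index)` pair, as the `array<vec2<u32>>` binding in the paint shader suggests, and that only the low 16 bits of the key matter. The name `counting_sort_u16` is hypothetical, and the three GPU scan stages collapse into one sequential prefix-sum loop here.

```rust
/// Number of buckets for a 16-bit key (matches the 65536-bucket histogram).
const BUCKETS: usize = 1 << 16;

/// CPU sketch of a single-pass counting sort on 16-bit keys.
/// Each entry is (depth_key, splat_index); only the low 16 bits of the key are used.
fn counting_sort_u16(keys_in: &[(u32, u32)]) -> Vec<(u32, u32)> {
    // Clear + histogram: count how many entries land in each bucket.
    let mut histogram = vec![0u32; BUCKETS];
    for &(key, _) in keys_in {
        histogram[(key & 0xFFFF) as usize] += 1;
    }

    // Exclusive prefix scan: histogram[b] becomes the first output slot of bucket b.
    let mut running = 0u32;
    for slot in histogram.iter_mut() {
        let count = *slot;
        *slot = running;
        running += count;
    }

    // Scatter: each entry claims the next free slot in its bucket.
    let mut keys_out = vec![(0u32, 0u32); keys_in.len()];
    for &(key, idx) in keys_in {
        let bucket = (key & 0xFFFF) as usize;
        keys_out[histogram[bucket] as usize] = (key, idx);
        histogram[bucket] += 1;
    }
    keys_out
}

fn main() {
    let input: [(u32, u32); 5] = [(40_000, 0), (7, 1), (40_000, 2), (65_535, 3), (0, 4)];
    let sorted = counting_sort_u16(&input);
    for (key, idx) in &sorted {
        println!("key={:5} splat={}", key, idx);
    }
}
```

Whether ascending key order means back-to-front depends on how the project pass encodes depth into the key, which this sketch does not model. The sequential scatter here is stable; a GPU scatter that claims slots atomically need not be.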
WebGPU is not available in this browser.
The receipt panel still measures the asset cost of the static demo (decode bytes × 8). Open the page in Chrome ≥ 113, Edge ≥ 113, Safari ≥ 18, or Firefox Nightly to see the pipeline render.
View the kernel that wrote this receipt: crates/mathground-splat/src/render.rs
//! WebGPU pipeline for splat rendering.
//!
//! Driven via `js_sys::Reflect` instead of the `wgpu` Rust crate. This keeps
//! the WASM bundle slim and matches the access pattern used by
//! `lux-worlds-web`.
//!
//! Stages (filled in incrementally — see plan P1.3..P1.7):
//! 1. Device acquire (adapter → device → queue → canvas context configure) ← P1.3 ✅
//! 2. Buffer upload (`SplatCloud::to_gpu_buffer()` → storage buffer) ← P1.4 ✅
//! 3. Project compute pipeline (`PROJECT_WGSL`) ← P1.5 ✅
//! 4. Radix sort (`SORT_CLEAR/HISTOGRAM/SCAN_*/SCATTER_WGSL`) ← P1.6 ✅
//! 5. Paint pipeline (instanced-quad Gaussian falloff render) ← P1.7 ✅
use glam::{Mat4, Vec3, Vec4};
use js_sys::{Array, Float32Array, Object, Uint32Array, Uint8Array};
use lux_worlds_splat::{
shaders::{
PROJECT_WGSL, SORT_CLEAR_WGSL, SORT_HISTOGRAM_WGSL, SORT_SCAN_BROADCAST_WGSL,
SORT_SCAN_GLOBAL_WGSL, SORT_SCAN_LOCAL_WGSL, SORT_SCATTER_WGSL,
},
Splat, SplatCloud,
};
use mgai_meter_web::now_ns;
use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;
/// Clear color when no splats have rendered yet: the same desaturated indigo
/// the mathground exhibits use as their canvas backdrop.
const CLEAR_RGBA: [f64; 4] = [0.031, 0.047, 0.078, 1.0];
/// WebGPU `GPUBufferUsage` bit flags.
const USAGE_STORAGE_COPY_DST: f64 = 128.0 + 8.0; // STORAGE | COPY_DST = 136
const USAGE_STORAGE_RW: f64 = 128.0; // STORAGE
const USAGE_UNIFORM_COPY_DST: f64 = 64.0 + 8.0; // UNIFORM | COPY_DST = 72
/// Project-pass uniform layout, std140-padded to 256 bytes.
///
/// Layout (matches `PROJECT_WGSL::Uniforms`):
/// view offset 0 size 64
/// proj offset 64 size 64
/// view_proj offset 128 size 64
/// cam_pos offset 192 size 12
/// time offset 204 size 4 (packs into vec3's 4-byte tail)
/// viewport offset 208 size 8
/// _pad1 offset 216 size 8
/// anchor offset 224 size 16
/// (padding to 256)
const UNIFORM_BYTES: usize = 256;
/// Approximate bit-ops per splat for the project pass: 3 mat4 multiplies,
/// 2 mat3 builds, a 2×2 eigendecomposition, a normalisation, a sqrt and a
/// handful of branches. At f32 grain (~32 bits per scalar op) ≈ 1024.
const PROJECT_BIT_OPS_PER_SPLAT: u64 = 1024;
/// Sort-stage fixed bit-op cost (independent of splat count): histogram
/// clear + workgroup-local scan + global scan + broadcast. ≈ 21M.
const SORT_FIXED_BIT_OPS: u64 = 21_000_000;
/// Per-splat bit-ops added on top of the fixed sort cost: 32 bits for the
/// histogram atomicAdd + 64 bits for the 8-byte SCATTER write.
const SORT_BIT_OPS_PER_SPLAT: u64 = 96;
/// `SortParams` uniform = `count: u32 + 3 × _pad: u32` = 16 bytes.
const SORT_PARAMS_BYTES: usize = 16;
/// Histogram bucket count from the shader header: 65536 atomic<u32> = 256 KB.
const HISTOGRAM_BUCKETS: u64 = 65_536;
/// 256 workgroup totals = 1 KB.
const WG_TOTALS_LEN: u64 = 256;
/// Approximate bit-ops per splat for the paint pass (vertex shader for the
/// 4-vertex quad + fragment shader over the covered pixels with discards
/// outside 3σ). Empirical conservative bound at typical demo splat sizes.
const PAINT_BIT_OPS_PER_SPLAT: u64 = 32_768;
/// `PaintUniforms` = `viewport: vec2<f32> + _pad: vec2<f32>` = 16 bytes.
const PAINT_UNIFORM_BYTES: usize = 16;
/// Screen-space ellipse render shader. Consumes the `PROJECT_WGSL` output
/// (`ProjectedSplat` array indexed via the sorted `keys_out`), draws one
/// 4-vertex triangle-strip quad per visible splat, paints a Gaussian falloff
/// in premultiplied alpha. Quad corners are scaled into σ-units so the
/// fragment's `r² = dot(local, local)` is the Mahalanobis distance squared
/// (the eigenbasis was already applied in the vertex shader).
const PAINT_WGSL: &str = r#"
struct ProjectedSplat {
ndc_x: f32, ndc_y: f32, depth: f32,
axis_a: f32, axis_b: f32, angle: f32,
col_r: f32, col_g: f32, col_b: f32, opacity: f32,
_pad0: f32, _pad1: f32,
};
struct PaintUniforms {
viewport: vec2<f32>,
_pad: vec2<f32>,
};
@group(0) @binding(0) var<storage, read> projected: array<ProjectedSplat>;
@group(0) @binding(1) var<storage, read> keys_out: array<vec2<u32>>;
@group(0) @binding(2) var<uniform> u: PaintUniforms;
struct VertexOutput {
@builtin(position) pos: vec4<f32>,
@location(0) color: vec3<f32>,
@location(1) opacity: f32,
@location(2) local: vec2<f32>,
};
@vertex
fn vs_main(
@builtin(vertex_index) vid: u32,
@builtin(instance_index) iid: u32,
) -> VertexOutput {
var quad: array<vec2<f32>, 4> = array<vec2<f32>, 4>(
vec2<f32>(-1.0, -1.0),
vec2<f32>( 1.0, -1.0),
vec2<f32>(-1.0, 1.0),
vec2<f32>( 1.0, 1.0),
);
var out: VertexOutput;
let splat_idx = keys_out[iid].y;
let s = projected[splat_idx];
if (s.opacity <= 0.0) {
// Cull (PROJECT marked it invalid).
out.pos = vec4<f32>(2.0, 2.0, 2.0, 1.0);
out.color = vec3<f32>(0.0);
out.opacity = 0.0;
out.local = vec2<f32>(0.0);
return out;
}
let corner = quad[vid];
let c = cos(s.angle);
let sn = sin(s.angle);
let major = vec2<f32>(c, sn) * s.axis_a * 3.0;
let minor = vec2<f32>(-sn, c) * s.axis_b * 3.0;
let pixel_offset = corner.x * major + corner.y * minor;
let ndc_offset = pixel_offset / u.viewport * 2.0;
let ndc = vec2<f32>(s.ndc_x, s.ndc_y) + ndc_offset;
out.pos = vec4<f32>(ndc, s.depth, 1.0);
out.color = vec3<f32>(s.col_r, s.col_g, s.col_b);
out.opacity = s.opacity;
out.local = corner * 3.0;
return out;
}
@fragment
fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
let r2 = dot(in.local, in.local);
if (r2 > 9.0) { discard; }
let alpha = in.opacity * exp(-0.5 * r2);
if (alpha < 1.0 / 255.0) { discard; }
return vec4<f32>(in.color * alpha, alpha);
}
"#;
// ── tiny js_sys::Reflect helpers (same shape as lux-worlds-web's) ───────
fn js_set(obj: &JsValue, key: &str, val: &JsValue) {
let _ = js_sys::Reflect::set(obj, &JsValue::from_str(key), val);
}
fn js_get(obj: &JsValue, key: &str) -> JsValue {
js_sys::Reflect::get(obj, &JsValue::from_str(key)).unwrap_or(JsValue::UNDEFINED)
}
fn js_call(obj: &JsValue, method: &str, args: &[JsValue]) -> Result<JsValue, JsValue> {
let func: js_sys::Function = js_get(obj, method).unchecked_into();
let arr = Array::new();
for a in args {
arr.push(a);
}
js_sys::Reflect::apply(&func, obj, &arr)
}
fn write_bytes_to_buffer(queue: &JsValue, buffer: &JsValue, bytes: &[u8]) {
let arr = Uint8Array::from(bytes);
let _ = js_call(
queue,
"writeBuffer",
&[
buffer.clone(),
JsValue::from_f64(0.0),
arr.buffer().into(),
arr.byte_offset().into(),
arr.byte_length().into(),
],
);
}
fn create_buffer(device: &JsValue, size: u64, usage: f64, label: &str) -> JsValue {
let desc = Object::new();
js_set(&desc, "size", &JsValue::from_f64(size as f64));
js_set(&desc, "usage", &JsValue::from_f64(usage));
js_set(&desc, "label", &JsValue::from_str(label));
js_call(device, "createBuffer", &[desc.into()]).unwrap_or(JsValue::NULL)
}
fn create_shader_module(device: &JsValue, code: &str, label: &str) -> JsValue {
let desc = Object::new();
js_set(&desc, "code", &JsValue::from_str(code));
js_set(&desc, "label", &JsValue::from_str(label));
js_call(device, "createShaderModule", &[desc.into()]).unwrap_or(JsValue::NULL)
}
fn create_compute_pipeline(device: &JsValue, module: &JsValue, label: &str) -> JsValue {
let compute = Object::new();
js_set(&compute, "module", module);
js_set(&compute, "entryPoint", &JsValue::from_str("main"));
let desc = Object::new();
js_set(&desc, "label", &JsValue::from_str(label));
js_set(&desc, "layout", &JsValue::from_str("auto"));
js_set(&desc, "compute", &compute.into());
js_call(device, "createComputePipeline", &[desc.into()]).unwrap_or(JsValue::NULL)
}
fn buffer_resource(buffer: &JsValue) -> JsValue {
let r = Object::new();
js_set(&r, "buffer", buffer);
r.into()
}
fn bind_entry(binding: u32, resource: JsValue) -> JsValue {
let e = Object::new();
js_set(&e, "binding", &JsValue::from_f64(binding as f64));
js_set(&e, "resource", &resource);
e.into()
}
// ── Camera ──────────────────────────────────────────────────────────────
/// Slow orbital camera centred on the splat cloud's bounds. P1.5 only needs
/// enough motion to make the per-frame project pass non-degenerate; once
/// paint lands in P1.7 the page will swap in proper navigation.
pub struct Camera {
pub target: Vec3,
pub distance: f32,
pub yaw: f32,
pub pitch: f32,
pub fov: f32,
pub aspect: f32,
}
impl Camera {
pub fn orbital(target: Vec3, distance: f32, aspect: f32) -> Self {
Self {
target,
distance,
yaw: 0.0,
pitch: -0.2,
fov: 0.9,
aspect,
}
}
pub fn eye(&self) -> Vec3 {
let xz = self.distance * self.pitch.cos();
Vec3::new(
self.target.x + xz * self.yaw.cos(),
self.target.y + self.distance * self.pitch.sin(),
self.target.z + xz * self.yaw.sin(),
)
}
pub fn view(&self) -> Mat4 {
Mat4::look_at_rh(self.eye(), self.target, Vec3::Y)
}
pub fn proj(&self) -> Mat4 {
Mat4::perspective_rh(self.fov, self.aspect.max(1e-3), 0.05, 1000.0)
}
}
/// Pack the project-pass uniform into `UNIFORM_BYTES` of std140 layout.
fn pack_uniform(cam: &Camera, time_s: f32, viewport: [f32; 2], anchor: Vec4) -> [u8; UNIFORM_BYTES] {
let mut out = [0u8; UNIFORM_BYTES];
let v = cam.view();
let p = cam.proj();
let vp = p * v;
let write_mat = |out: &mut [u8], offset: usize, m: &Mat4| {
let cols: [[f32; 4]; 4] = m.to_cols_array_2d();
for (i, col) in cols.iter().enumerate() {
for (j, x) in col.iter().enumerate() {
let o = offset + (i * 4 + j) * 4;
out[o..o + 4].copy_from_slice(&x.to_le_bytes());
}
}
};
write_mat(&mut out, 0, &v);
write_mat(&mut out, 64, &p);
write_mat(&mut out, 128, &vp);
let eye = cam.eye();
out[192..196].copy_from_slice(&eye.x.to_le_bytes());
out[196..200].copy_from_slice(&eye.y.to_le_bytes());
out[200..204].copy_from_slice(&eye.z.to_le_bytes());
out[204..208].copy_from_slice(&time_s.to_le_bytes());
out[208..212].copy_from_slice(&viewport[0].to_le_bytes());
out[212..216].copy_from_slice(&viewport[1].to_le_bytes());
out[224..228].copy_from_slice(&anchor.x.to_le_bytes());
out[228..232].copy_from_slice(&anchor.y.to_le_bytes());
out[232..236].copy_from_slice(&anchor.z.to_le_bytes());
out[236..240].copy_from_slice(&anchor.w.to_le_bytes());
out
}
// ── One acquired WebGPU context ─────────────────────────────────────────
pub struct GpuContext {
pub device: JsValue,
pub queue: JsValue,
pub context: JsValue,
pub format: String,
pub width: u32,
pub height: u32,
}
impl GpuContext {
pub async fn acquire(canvas_id: &str) -> Result<Self, JsValue> {
let global = js_sys::global();
let navigator = js_get(&global, "navigator");
let gpu_obj = js_get(&navigator, "gpu");
if gpu_obj.is_undefined() || gpu_obj.is_null() {
return Err(JsValue::from_str("WebGPU not supported"));
}
let adapter_opts = Object::new();
js_set(
&adapter_opts,
"powerPreference",
&JsValue::from_str("high-performance"),
);
let adapter_promise = js_call(&gpu_obj, "requestAdapter", &[adapter_opts.into()])?;
let adapter =
wasm_bindgen_futures::JsFuture::from(js_sys::Promise::from(adapter_promise)).await?;
if adapter.is_null() || adapter.is_undefined() {
return Err(JsValue::from_str("WebGPU requestAdapter returned null"));
}
let limits = js_get(&adapter, "limits");
let max_storage_buf = js_get(&limits, "maxStorageBufferBindingSize")
.as_f64()
.unwrap_or(134_217_728.0);
let max_buffer = js_get(&limits, "maxBufferSize")
.as_f64()
.unwrap_or(268_435_456.0);
let max_storage_per_stage = js_get(&limits, "maxStorageBuffersPerShaderStage")
.as_f64()
.unwrap_or(10.0);
let required_limits = Object::new();
js_set(
&required_limits,
"maxStorageBufferBindingSize",
&JsValue::from_f64(max_storage_buf),
);
js_set(
&required_limits,
"maxBufferSize",
&JsValue::from_f64(max_buffer),
);
js_set(
&required_limits,
"maxStorageBuffersPerShaderStage",
&JsValue::from_f64(max_storage_per_stage),
);
let device_desc = Object::new();
js_set(&device_desc, "requiredLimits", &required_limits.into());
let device_promise = js_call(&adapter, "requestDevice", &[device_desc.into()])?;
let device =
wasm_bindgen_futures::JsFuture::from(js_sys::Promise::from(device_promise)).await?;
let queue = js_get(&device, "queue");
let document = js_get(&global, "document");
let canvas = js_call(
&document,
"getElementById",
&[JsValue::from_str(canvas_id)],
)?;
if canvas.is_null() || canvas.is_undefined() {
return Err(JsValue::from_str(&format!(
"canvas #{canvas_id} not found in document"
)));
}
let context = js_call(&canvas, "getContext", &[JsValue::from_str("webgpu")])?;
if context.is_null() || context.is_undefined() {
return Err(JsValue::from_str(
"canvas.getContext('webgpu') returned null",
));
}
let format = js_call(&gpu_obj, "getPreferredCanvasFormat", &[])?
.as_string()
.unwrap_or_else(|| "bgra8unorm".to_string());
let config = Object::new();
js_set(&config, "device", &device);
js_set(&config, "format", &JsValue::from_str(&format));
js_set(&config, "alphaMode", &JsValue::from_str("opaque"));
js_call(&context, "configure", &[config.into()])?;
let width = js_get(&canvas, "width").as_f64().unwrap_or(800.0) as u32;
let height = js_get(&canvas, "height").as_f64().unwrap_or(600.0) as u32;
Ok(Self {
device,
queue,
context,
format,
width,
height,
})
}
/// Frame render: clear + (optional) splat draw in a single render pass.
/// Returns timing + bit-ops for the entire surface frame: a clear cost
/// (`width × height × 32`) and, when a `Paint` pipeline is supplied,
/// the per-splat paint cost (`N × PAINT_BIT_OPS_PER_SPLAT`).
pub fn render_frame(&self, paint: Option<&Paint>) -> StageTiming {
let t0 = now_ns();
let surface_tex = js_call(&self.context, "getCurrentTexture", &[]).unwrap_or(JsValue::NULL);
let surface_view = js_call(&surface_tex, "createView", &[]).unwrap_or(JsValue::NULL);
let encoder_desc = Object::new();
let encoder = js_call(
&self.device,
"createCommandEncoder",
&[encoder_desc.into()],
)
.unwrap_or(JsValue::NULL);
let color_att = Object::new();
js_set(&color_att, "view", &surface_view);
let clear_val = Array::of4(
&JsValue::from_f64(CLEAR_RGBA[0]),
&JsValue::from_f64(CLEAR_RGBA[1]),
&JsValue::from_f64(CLEAR_RGBA[2]),
&JsValue::from_f64(CLEAR_RGBA[3]),
);
js_set(&color_att, "clearValue", &clear_val.into());
js_set(&color_att, "loadOp", &JsValue::from_str("clear"));
js_set(&color_att, "storeOp", &JsValue::from_str("store"));
let pass_desc = Object::new();
js_set(
&pass_desc,
"colorAttachments",
&Array::of1(&color_att.into()).into(),
);
let pass = js_call(&encoder, "beginRenderPass", &[pass_desc.into()])
.unwrap_or(JsValue::NULL);
let mut paint_bit_ops: u64 = 0;
if let Some(p) = paint {
js_call(&pass, "setPipeline", &[p.pipeline.clone()]).ok();
js_call(
&pass,
"setBindGroup",
&[JsValue::from_f64(0.0), p.bind_group.clone()],
)
.ok();
js_call(
&pass,
"draw",
&[JsValue::from_f64(4.0), JsValue::from_f64(p.instance_count as f64)],
)
.ok();
paint_bit_ops = (p.instance_count as u64).saturating_mul(PAINT_BIT_OPS_PER_SPLAT);
}
js_call(&pass, "end", &[]).ok();
let cmd = js_call(&encoder, "finish", &[]).unwrap_or(JsValue::NULL);
js_call(&self.queue, "submit", &[Array::of1(&cmd).into()]).ok();
let wall_ns = now_ns() - t0;
let clear_bit_ops = (self.width as u64) * (self.height as u64) * 32;
StageTiming {
wall_ns,
bit_ops: clear_bit_ops.saturating_add(paint_bit_ops),
}
}
}
// ── Splat buffer (P1.4) ─────────────────────────────────────────────────
pub struct SplatBuffer {
pub handle: JsValue,
pub gpu_bytes: u64,
pub splat_count: u64,
}
impl SplatBuffer {
pub fn upload(gpu: &GpuContext, cloud: &SplatCloud) -> (Self, f64) {
let splat_count = cloud.splats.len() as u64;
let gpu_bytes = splat_count.saturating_mul(Splat::GPU_SIZE as u64);
let t0 = now_ns();
let flat = cloud.to_gpu_buffer();
let arr = Float32Array::from(&flat[..]);
let handle = create_buffer(
&gpu.device,
gpu_bytes,
USAGE_STORAGE_COPY_DST,
"mathground-splat:splats",
);
let _ = js_call(
&gpu.queue,
"writeBuffer",
&[
handle.clone(),
JsValue::from_f64(0.0),
arr.buffer().into(),
arr.byte_offset().into(),
arr.byte_length().into(),
],
);
let wall_ns = now_ns() - t0;
(
Self {
handle,
gpu_bytes,
splat_count,
},
wall_ns,
)
}
}
// ── Projection pass (P1.5) ──────────────────────────────────────────────
pub struct Projection {
pub pipeline: JsValue,
pub bind_group: JsValue,
pub uniform_buf: JsValue,
pub projected_buf: JsValue,
pub sort_keys_buf: JsValue,
pub sh1_buf: JsValue,
pub workgroups: u32,
}
impl Projection {
pub fn new(gpu: &GpuContext, splat_buf: &SplatBuffer) -> Self {
let n = splat_buf.splat_count.max(1);
let uniform_buf = create_buffer(
&gpu.device,
UNIFORM_BYTES as u64,
USAGE_UNIFORM_COPY_DST,
"mathground-splat:uniform",
);
let projected_buf = create_buffer(
&gpu.device,
n * 48,
USAGE_STORAGE_RW,
"mathground-splat:projected",
);
let sort_keys_buf = create_buffer(
&gpu.device,
n * 8,
USAGE_STORAGE_RW,
"mathground-splat:sort_keys",
);
// No-SH path: a tiny zero buffer so `array<f32>` is non-empty. The
// shader's loop runs but every coefficient is zero → DC-only color,
// matching `SplatCloud::sh_buffer()` when the cloud carries no SH.
// 16 floats is enough to satisfy WGSL's "stride > 0" requirement.
let sh_byte_len = 16 * 4;
let sh1_buf = create_buffer(
&gpu.device,
sh_byte_len as u64,
USAGE_STORAGE_COPY_DST, // STORAGE | COPY_DST so we can zero-init it
"mathground-splat:sh1",
);
let zeros = vec![0u8; sh_byte_len];
write_bytes_to_buffer(&gpu.queue, &sh1_buf, &zeros);
let module = create_shader_module(&gpu.device, PROJECT_WGSL, "mathground-splat:project");
let pipeline = create_compute_pipeline(&gpu.device, &module, "mathground-splat:project");
let bgl = js_call(&pipeline, "getBindGroupLayout", &[JsValue::from_f64(0.0)])
.unwrap_or(JsValue::NULL);
let entries = Array::new();
entries.push(&bind_entry(0, buffer_resource(&uniform_buf)));
entries.push(&bind_entry(1, buffer_resource(&splat_buf.handle)));
entries.push(&bind_entry(2, buffer_resource(&projected_buf)));
entries.push(&bind_entry(3, buffer_resource(&sort_keys_buf)));
entries.push(&bind_entry(4, buffer_resource(&sh1_buf)));
let bg_desc = Object::new();
js_set(&bg_desc, "layout", &bgl);
js_set(&bg_desc, "entries", &entries.into());
let bind_group =
js_call(&gpu.device, "createBindGroup", &[bg_desc.into()]).unwrap_or(JsValue::NULL);
let workgroups = ((n as u32) + 255) / 256;
Self {
pipeline,
bind_group,
uniform_buf,
projected_buf,
sort_keys_buf,
sh1_buf,
workgroups,
}
}
/// Update the uniform buffer and dispatch the project compute pass.
pub fn dispatch(
&self,
gpu: &GpuContext,
cam: &Camera,
time_s: f32,
viewport: [f32; 2],
splat_count: u64,
) -> StageTiming {
let t0 = now_ns();
let bytes = pack_uniform(cam, time_s, viewport, Vec4::new(0.0, 0.0, 0.0, 1.0));
write_bytes_to_buffer(&gpu.queue, &self.uniform_buf, &bytes);
let enc_desc = Object::new();
let encoder = js_call(&gpu.device, "createCommandEncoder", &[enc_desc.into()])
.unwrap_or(JsValue::NULL);
let pass_desc = Object::new();
let pass = js_call(&encoder, "beginComputePass", &[pass_desc.into()])
.unwrap_or(JsValue::NULL);
js_call(&pass, "setPipeline", &[self.pipeline.clone()]).ok();
js_call(
&pass,
"setBindGroup",
&[JsValue::from_f64(0.0), self.bind_group.clone()],
)
.ok();
js_call(
&pass,
"dispatchWorkgroups",
&[JsValue::from_f64(self.workgroups as f64)],
)
.ok();
js_call(&pass, "end", &[]).ok();
let cmd = js_call(&encoder, "finish", &[]).unwrap_or(JsValue::NULL);
js_call(&gpu.queue, "submit", &[Array::of1(&cmd).into()]).ok();
let wall_ns = now_ns() - t0;
let bit_ops = splat_count.saturating_mul(PROJECT_BIT_OPS_PER_SPLAT);
StageTiming { wall_ns, bit_ops }
}
}
// ── Sort pass (P1.6) ────────────────────────────────────────────────────
pub struct Sort {
pub clear_pipeline: JsValue,
pub histogram_pipeline: JsValue,
pub scan_local_pipeline: JsValue,
pub scan_global_pipeline: JsValue,
pub scan_broadcast_pipeline: JsValue,
pub scatter_pipeline: JsValue,
pub clear_bg: JsValue,
pub histogram_bg: JsValue,
pub scan_local_bg: JsValue,
pub scan_global_bg: JsValue,
pub scan_broadcast_bg: JsValue,
pub scatter_bg: JsValue,
pub histogram_buf: JsValue,
pub wg_totals_buf: JsValue,
pub keys_out_buf: JsValue,
pub params_buf: JsValue,
pub scatter_workgroups: u32,
}
impl Sort {
/// Build all 6 sort pipelines + bind groups + auxiliary buffers around
/// the project pass's `sort_keys_buf` output. Writes the static
/// `SortParams { count, _pads }` uniform once — the demo doesn't resize
/// the cloud mid-session.
pub fn new(gpu: &GpuContext, projection: &Projection, splat_count: u64) -> Self {
let n = splat_count.max(1);
let histogram_buf = create_buffer(
&gpu.device,
HISTOGRAM_BUCKETS * 4,
USAGE_STORAGE_RW,
"mathground-splat:sort_histogram",
);
let wg_totals_buf = create_buffer(
&gpu.device,
WG_TOTALS_LEN * 4,
USAGE_STORAGE_RW,
"mathground-splat:sort_wg_totals",
);
let keys_out_buf = create_buffer(
&gpu.device,
n * 8,
USAGE_STORAGE_RW,
"mathground-splat:sort_keys_out",
);
let params_buf = create_buffer(
&gpu.device,
SORT_PARAMS_BYTES as u64,
USAGE_UNIFORM_COPY_DST,
"mathground-splat:sort_params",
);
// SortParams { count, _pad0, _pad1, _pad2 } — static for this mount.
let params = [splat_count as u32, 0u32, 0u32, 0u32];
let arr = Uint32Array::from(&params[..]);
let _ = js_call(
&gpu.queue,
"writeBuffer",
&[
params_buf.clone(),
JsValue::from_f64(0.0),
arr.buffer().into(),
arr.byte_offset().into(),
arr.byte_length().into(),
],
);
// ── Pipelines ──
let clear_pipeline = create_compute_pipeline(
&gpu.device,
&create_shader_module(&gpu.device, SORT_CLEAR_WGSL, "mathground-splat:sort_clear"),
"mathground-splat:sort_clear",
);
let histogram_pipeline = create_compute_pipeline(
&gpu.device,
&create_shader_module(
&gpu.device,
SORT_HISTOGRAM_WGSL,
"mathground-splat:sort_histogram",
),
"mathground-splat:sort_histogram",
);
let scan_local_pipeline = create_compute_pipeline(
&gpu.device,
&create_shader_module(
&gpu.device,
SORT_SCAN_LOCAL_WGSL,
"mathground-splat:sort_scan_local",
),
"mathground-splat:sort_scan_local",
);
let scan_global_pipeline = create_compute_pipeline(
&gpu.device,
&create_shader_module(
&gpu.device,
SORT_SCAN_GLOBAL_WGSL,
"mathground-splat:sort_scan_global",
),
"mathground-splat:sort_scan_global",
);
let scan_broadcast_pipeline = create_compute_pipeline(
&gpu.device,
&create_shader_module(
&gpu.device,
SORT_SCAN_BROADCAST_WGSL,
"mathground-splat:sort_scan_broadcast",
),
"mathground-splat:sort_scan_broadcast",
);
let scatter_pipeline = create_compute_pipeline(
&gpu.device,
&create_shader_module(
&gpu.device,
SORT_SCATTER_WGSL,
"mathground-splat:sort_scatter",
),
"mathground-splat:sort_scatter",
);
// ── Bind groups (one per pipeline, layout derived from shader) ──
let mk_bg = |pipeline: &JsValue, bindings: &[(u32, &JsValue)]| -> JsValue {
let bgl = js_call(pipeline, "getBindGroupLayout", &[JsValue::from_f64(0.0)])
.unwrap_or(JsValue::NULL);
let entries = Array::new();
for (slot, buf) in bindings {
entries.push(&bind_entry(*slot, buffer_resource(buf)));
}
let desc = Object::new();
js_set(&desc, "layout", &bgl);
js_set(&desc, "entries", &entries.into());
js_call(&gpu.device, "createBindGroup", &[desc.into()]).unwrap_or(JsValue::NULL)
};
let clear_bg = mk_bg(&clear_pipeline, &[(0, &histogram_buf)]);
let histogram_bg = mk_bg(
&histogram_pipeline,
&[
(0, &projection.sort_keys_buf),
(1, &histogram_buf),
(2, &params_buf),
],
);
let scan_local_bg = mk_bg(
&scan_local_pipeline,
&[(0, &histogram_buf), (1, &wg_totals_buf)],
);
let scan_global_bg = mk_bg(&scan_global_pipeline, &[(0, &wg_totals_buf)]);
let scan_broadcast_bg = mk_bg(
&scan_broadcast_pipeline,
&[(0, &histogram_buf), (1, &wg_totals_buf)],
);
let scatter_bg = mk_bg(
&scatter_pipeline,
&[
(0, &projection.sort_keys_buf),
(1, &keys_out_buf),
(2, &histogram_buf),
(3, &params_buf),
],
);
let scatter_workgroups = ((n as u32) + 255) / 256;
Self {
clear_pipeline,
histogram_pipeline,
scan_local_pipeline,
scan_global_pipeline,
scan_broadcast_pipeline,
scatter_pipeline,
clear_bg,
histogram_bg,
scan_local_bg,
scan_global_bg,
scan_broadcast_bg,
scatter_bg,
histogram_buf,
wg_totals_buf,
keys_out_buf,
params_buf,
scatter_workgroups,
}
}
/// Run all 6 sort passes in a single command encoder + single submit.
/// Returns measured wall_ns and an honest bit-op estimate that includes
/// the histogram clear, the workgroup scans, and the per-splat
/// scatter.
pub fn dispatch(&self, gpu: &GpuContext, splat_count: u64) -> StageTiming {
let t0 = now_ns();
let enc_desc = Object::new();
let encoder = js_call(&gpu.device, "createCommandEncoder", &[enc_desc.into()])
.unwrap_or(JsValue::NULL);
let pass_desc = Object::new();
let pass = js_call(&encoder, "beginComputePass", &[pass_desc.into()])
.unwrap_or(JsValue::NULL);
let dispatch = |pipeline: &JsValue, bg: &JsValue, workgroups: u32| {
js_call(&pass, "setPipeline", &[pipeline.clone()]).ok();
js_call(
&pass,
"setBindGroup",
&[JsValue::from_f64(0.0), bg.clone()],
)
.ok();
js_call(
&pass,
"dispatchWorkgroups",
&[JsValue::from_f64(workgroups as f64)],
)
.ok();
};
// 1. CLEAR — zero the 65536-bucket histogram.
dispatch(&self.clear_pipeline, &self.clear_bg, 256);
// 2. HISTOGRAM — atomic-count per bucket.
dispatch(
&self.histogram_pipeline,
&self.histogram_bg,
self.scatter_workgroups,
);
// 3. SCAN_LOCAL — per-block Hillis-Steele scan; emit wg_totals[k].
dispatch(&self.scan_local_pipeline, &self.scan_local_bg, 256);
// 4. SCAN_GLOBAL — exclusive scan over the 256 block totals.
dispatch(&self.scan_global_pipeline, &self.scan_global_bg, 1);
// 5. SCAN_BROADCAST — fold block offsets back into per-bucket prefix.
dispatch(&self.scan_broadcast_pipeline, &self.scan_broadcast_bg, 256);
// 6. SCATTER — each splat atomic-claims its slot in keys_out.
dispatch(
&self.scatter_pipeline,
&self.scatter_bg,
self.scatter_workgroups,
);
js_call(&pass, "end", &[]).ok();
let cmd = js_call(&encoder, "finish", &[]).unwrap_or(JsValue::NULL);
js_call(&gpu.queue, "submit", &[Array::of1(&cmd).into()]).ok();
let wall_ns = now_ns() - t0;
let bit_ops = SORT_FIXED_BIT_OPS
.saturating_add(splat_count.saturating_mul(SORT_BIT_OPS_PER_SPLAT));
StageTiming { wall_ns, bit_ops }
}
}
// ── Paint pass (P1.7) ───────────────────────────────────────────────────
pub struct Paint {
pub pipeline: JsValue,
pub bind_group: JsValue,
pub uniform_buf: JsValue,
pub instance_count: u32,
}
impl Paint {
pub fn new(
gpu: &GpuContext,
projection: &Projection,
sort: &Sort,
splat_count: u64,
) -> Self {
let uniform_buf = create_buffer(
&gpu.device,
PAINT_UNIFORM_BYTES as u64,
USAGE_UNIFORM_COPY_DST,
"mathground-splat:paint_uniform",
);
// Viewport is static for this mount; rewrite if the canvas resizes.
let view = [gpu.width as f32, gpu.height as f32, 0.0_f32, 0.0_f32];
let arr = Float32Array::from(&view[..]);
let _ = js_call(
&gpu.queue,
"writeBuffer",
&[
uniform_buf.clone(),
JsValue::from_f64(0.0),
arr.buffer().into(),
arr.byte_offset().into(),
arr.byte_length().into(),
],
);
let module = create_shader_module(&gpu.device, PAINT_WGSL, "mathground-splat:paint");
// ── Render pipeline (triangle-strip, premultiplied alpha blend) ──
let blend_color = Object::new();
js_set(&blend_color, "srcFactor", &JsValue::from_str("one"));
js_set(
&blend_color,
"dstFactor",
&JsValue::from_str("one-minus-src-alpha"),
);
js_set(&blend_color, "operation", &JsValue::from_str("add"));
let blend_alpha = Object::new();
js_set(&blend_alpha, "srcFactor", &JsValue::from_str("one"));
js_set(
&blend_alpha,
"dstFactor",
&JsValue::from_str("one-minus-src-alpha"),
);
js_set(&blend_alpha, "operation", &JsValue::from_str("add"));
let blend = Object::new();
js_set(&blend, "color", &blend_color.into());
js_set(&blend, "alpha", &blend_alpha.into());
let target = Object::new();
js_set(&target, "format", &JsValue::from_str(&gpu.format));
js_set(&target, "blend", &blend.into());
let vertex = Object::new();
js_set(&vertex, "module", &module);
js_set(&vertex, "entryPoint", &JsValue::from_str("vs_main"));
let fragment = Object::new();
js_set(&fragment, "module", &module);
js_set(&fragment, "entryPoint", &JsValue::from_str("fs_main"));
js_set(&fragment, "targets", &Array::of1(&target.into()).into());
let primitive = Object::new();
js_set(&primitive, "topology", &JsValue::from_str("triangle-strip"));
let desc = Object::new();
js_set(&desc, "label", &JsValue::from_str("mathground-splat:paint"));
js_set(&desc, "layout", &JsValue::from_str("auto"));
js_set(&desc, "vertex", &vertex.into());
js_set(&desc, "fragment", &fragment.into());
js_set(&desc, "primitive", &primitive.into());
let pipeline =
js_call(&gpu.device, "createRenderPipeline", &[desc.into()]).unwrap_or(JsValue::NULL);
let bgl = js_call(&pipeline, "getBindGroupLayout", &[JsValue::from_f64(0.0)])
.unwrap_or(JsValue::NULL);
let entries = Array::new();
entries.push(&bind_entry(0, buffer_resource(&projection.projected_buf)));
entries.push(&bind_entry(1, buffer_resource(&sort.keys_out_buf)));
entries.push(&bind_entry(2, buffer_resource(&uniform_buf)));
let bg_desc = Object::new();
js_set(&bg_desc, "layout", &bgl);
js_set(&bg_desc, "entries", &entries.into());
let bind_group =
js_call(&gpu.device, "createBindGroup", &[bg_desc.into()]).unwrap_or(JsValue::NULL);
Self {
pipeline,
bind_group,
uniform_buf,
instance_count: splat_count as u32,
}
}
}
// ── UploadResult (one-shot at mount) ────────────────────────────────────
pub struct UploadResult {
pub gpu_bytes: u64,
pub splat_count: u64,
pub wall_ns: f64,
}
// ── SplatRenderer (composes the pipeline) ───────────────────────────────
/// Estimate a scene-fit camera distance from the cloud's bounding box.
fn frame_target_and_distance(cloud: &SplatCloud) -> (Vec3, f32) {
let (min, max) = cloud.bounds();
let centre = Vec3::new(
0.5 * (min[0] + max[0]),
0.5 * (min[1] + max[1]),
0.5 * (min[2] + max[2]),
);
let extent = Vec3::new(
max[0] - min[0],
max[1] - min[1],
max[2] - min[2],
);
let radius = extent.length().max(1.0);
(centre, radius * 1.6)
}
pub struct SplatRenderer {
pub gpu: Option<GpuContext>,
pub splats: Option<SplatBuffer>,
pub projection: Option<Projection>,
pub sort: Option<Sort>,
pub paint: Option<Paint>,
pub camera: Option<Camera>,
pub time_s: f32,
pub source_bytes: u64,
pub splat_count: u64,
upload: Option<UploadResult>,
}
impl SplatRenderer {
pub async fn mount(
canvas_id: &str,
decoded: crate::decode::DecodedSplat,
) -> Result<Self, JsValue> {
let source_bytes = decoded.source_bytes;
let splat_count = decoded.splat_count;
let cloud = decoded.cloud;
let gpu = match GpuContext::acquire(canvas_id).await {
Ok(ctx) => Some(ctx),
Err(e) => {
web_sys::console::warn_1(&e);
None
}
};
let (splats, upload, projection, sort, paint, camera) = match &gpu {
Some(ctx) => {
let (buf, wall_ns) = SplatBuffer::upload(ctx, &cloud);
let result = UploadResult {
gpu_bytes: buf.gpu_bytes,
splat_count: buf.splat_count,
wall_ns,
};
let proj = Projection::new(ctx, &buf);
let sort = Sort::new(ctx, &proj, buf.splat_count);
let paint = Paint::new(ctx, &proj, &sort, buf.splat_count);
let (centre, dist) = frame_target_and_distance(&cloud);
let aspect = ctx.width as f32 / ctx.height.max(1) as f32;
let cam = Camera::orbital(centre, dist, aspect);
(
Some(buf),
Some(result),
Some(proj),
Some(sort),
Some(paint),
Some(cam),
)
}
None => (None, None, None, None, None, None),
};
Ok(Self {
gpu,
splats,
projection,
sort,
paint,
camera,
time_s: 0.0,
source_bytes,
splat_count,
upload,
})
}
pub fn has_gpu(&self) -> bool {
self.gpu.is_some()
}
pub fn take_upload_result(&mut self) -> Option<UploadResult> {
self.upload.take()
}
/// One frame. Returns accumulated timing + bit-ops across every stage
/// that has been wired so far (P1.3 clear + P1.5 project + …).
pub fn frame(&mut self) -> FrameResult {
let Some(gpu) = self.gpu.as_ref() else {
return FrameResult::zero();
};
let mut wall_ns = 0.0;
let mut bit_ops = 0u64;
// Project pass — runs first so the sorted/painted stages (P1.6, P1.7)
// can fold their own timing into the same FrameResult.
self.time_s += 1.0 / 60.0;
if let Some(c) = self.camera.as_mut() {
// Slow orbit so the per-frame project work isn't trivially
// cache-hit.
c.yaw += 0.004;
}
if let (Some(proj), Some(cam), Some(splat_buf)) =
(&self.projection, &self.camera, &self.splats)
{
let stage = proj.dispatch(
gpu,
cam,
self.time_s,
[gpu.width as f32, gpu.height as f32],
splat_buf.splat_count,
);
wall_ns += stage.wall_ns;
bit_ops = bit_ops.saturating_add(stage.bit_ops);
}
// Sort pass — depth-orders the projected splats so the (P1.7) paint
// pipeline can blend them back-to-front. Queue-submitted after the
// project pass; WebGPU guarantees in-submit-order execution so the
// sort reads the project's writes without an explicit barrier.
if let (Some(sort), Some(splat_buf)) = (&self.sort, &self.splats) {
let stage = sort.dispatch(gpu, splat_buf.splat_count);
wall_ns += stage.wall_ns;
bit_ops = bit_ops.saturating_add(stage.bit_ops);
}
// Render pass: clear + (P1.7) paint pipeline in one render pass.
let render = gpu.render_frame(self.paint.as_ref());
wall_ns += render.wall_ns;
bit_ops = bit_ops.saturating_add(render.bit_ops);
FrameResult { wall_ns, bit_ops }
}
}
pub struct StageTiming {
pub wall_ns: f64,
pub bit_ops: u64,
}
pub struct FrameResult {
pub wall_ns: f64,
pub bit_ops: u64,
}
impl FrameResult {
fn zero() -> Self {
Self {
wall_ns: 0.0,
bit_ops: 0,
}
}
}
This is the exact Rust file compiled into the mathground_splat WASM bundle.
Every WebGPU call — adapter, buffer, shader module, compute pass, render pass — is
in this file, driven via js_sys::Reflect. The receipt above is a function of this
code, not a bespoke benchmark.
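For readers who want to check the paint maths without a GPU, the fragment stage of `PAINT_WGSL` and the blend state set up in `Paint::new` can be mirrored on the CPU. This is a sketch under assumptions: the function names `shade` and `blend_over` are hypothetical, and colours are treated as plain linear floats with no canvas-format conversion.

```rust
/// CPU mirror of `fs_main` in `PAINT_WGSL`: returns premultiplied RGBA,
/// or `None` where the shader would `discard`.
fn shade(local: [f32; 2], color: [f32; 3], opacity: f32) -> Option<[f32; 4]> {
    // `local` is in sigma units, so r2 is the squared Mahalanobis distance.
    let r2 = local[0] * local[0] + local[1] * local[1];
    if r2 > 9.0 {
        return None; // outside the 3-sigma footprint
    }
    let alpha = opacity * (-0.5 * r2).exp();
    if alpha < 1.0 / 255.0 {
        return None; // contribution below 1/255
    }
    Some([color[0] * alpha, color[1] * alpha, color[2] * alpha, alpha])
}

/// Premultiplied "over" blend: srcFactor = one, dstFactor = one-minus-src-alpha,
/// applied to colour and alpha alike.
fn blend_over(src: [f32; 4], dst: [f32; 4]) -> [f32; 4] {
    let k = 1.0 - src[3];
    [
        src[0] + dst[0] * k,
        src[1] + dst[1] * k,
        src[2] + dst[2] * k,
        src[3] + dst[3] * k,
    ]
}

fn main() {
    // Backdrop values taken from CLEAR_RGBA in the listing.
    let backdrop = [0.031, 0.047, 0.078, 1.0];
    // Walk outward from the splat centre in sigma units.
    for step in 0..=6 {
        let x = step as f32 * 0.5;
        match shade([x, 0.0], [1.0, 0.5, 0.25], 0.8) {
            Some(src) => println!("{:.1} sigma -> {:?}", x, blend_over(src, backdrop)),
            None => println!("{:.1} sigma -> discarded", x),
        }
    }
}
```

Because the colour is multiplied by alpha before it leaves the shader, the blend unit only needs `one` as the source factor; using `src-alpha` there would apply alpha twice.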
Notes on the receipt
- three receipt kinds: splat-world-upload (one-shot at mount, bandwidth into GPU memory), splat-world-render (per-frame, project + sort + paint), splat-world-author (one-shot per image upload, model-tier).
- artifact = GaussianSplat: the receipt grammar carries the produced artifact's byte count and splat count, so the panel reports joules-per-byte and joules-per-splat alongside the usual envelope.
- why the J / splat is high at first: the first measurable window includes the project pass alone (~1 kbit-ops / splat); paint adds ~30 kbit-ops / splat once the fragment shader is rasterising. μ usually settles between 10⁹ and 10¹⁰.
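To see how the per-stage constants add up, here is a small sketch that reproduces the bit-op sum from `SplatRenderer::frame`. The four constants are copied from the listing. The function name `frame_bit_ops` is hypothetical, and the 800 × 600 canvas is an assumption: it is only the fallback size in `GpuContext::acquire`, and the real canvas may differ.

```rust
// Constants copied from the render.rs listing.
const PROJECT_BIT_OPS_PER_SPLAT: u64 = 1024;
const SORT_FIXED_BIT_OPS: u64 = 21_000_000;
const SORT_BIT_OPS_PER_SPLAT: u64 = 96;
const PAINT_BIT_OPS_PER_SPLAT: u64 = 32_768;

/// Estimated bit-ops for one frame, as [project, sort, paint, clear].
fn frame_bit_ops(splats: u64, width: u64, height: u64) -> [u64; 4] {
    [
        splats * PROJECT_BIT_OPS_PER_SPLAT,
        SORT_FIXED_BIT_OPS + splats * SORT_BIT_OPS_PER_SPLAT,
        splats * PAINT_BIT_OPS_PER_SPLAT,
        width * height * 32, // clear cost: 32 bits per pixel
    ]
}

fn main() {
    let splats = 32_768;
    // Assumed canvas size (the fallback in GpuContext::acquire).
    let stages = frame_bit_ops(splats, 800, 600);
    let total: u64 = stages.iter().sum();
    for (name, ops) in ["project", "sort", "paint", "clear"].iter().zip(stages.iter()) {
        println!("{:8} {:>14} bit-ops", name, ops);
    }
    println!("{:8} {:>14} bit-ops", "total", total);
    println!("per splat: {:.1}", total as f64 / splats as f64);
}
```

Under these assumptions the paint term dominates the frame, which is consistent with the note that J / splat shifts once the fragment shader starts rasterising.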