Benchmarks

Benchmark evidence for local sparse updates.

HST is benchmarked as an incremental runtime, not as a generic one-shot sparse multiply. The public summary below describes fixture-backed packs and the measurement rules used during technical evaluation.

Methodology

How we benchmark HST fairly.

We compare HST against exact sparse-delta paths a technical team would actually try first: CSC delta, grouped CSC, and hand-tuned tile traversal where appropriate. Every run is interpreted alongside locality, visited nonzero fraction, error, and fallback behavior.
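For readers unfamiliar with the baseline, here is a minimal pure-Python sketch of a CSC exact-delta update. It is not HST code and not the benchmark harness. The `CSC` container and the functions `full_matvec` and `csc_delta_apply` are hypothetical names introduced for illustration. The sketch assumes the maintained quantity is y = A x and that an update changes only a few entries of x. It returns the number of visited nonzeros, so a visited nonzero fraction can be reported next to the result.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CSC:
    """Compressed sparse column storage (hypothetical container for this sketch)."""
    n_rows: int
    indptr: List[int]    # column j owns entries indptr[j] .. indptr[j + 1] - 1
    indices: List[int]   # row index of each stored entry
    data: List[float]    # value of each stored entry


def full_matvec(a: CSC, x: List[float]) -> List[float]:
    """Full recompute of y = A x, used here only as the reference result."""
    y = [0.0] * a.n_rows
    for j in range(len(a.indptr) - 1):
        for k in range(a.indptr[j], a.indptr[j + 1]):
            y[a.indices[k]] += a.data[k] * x[j]
    return y


def csc_delta_apply(a: CSC, y: List[float], delta: Dict[int, float]) -> int:
    """Apply y += A[:, j] * dx_j for every changed column j, in place.

    Returns the number of stored nonzeros visited.
    """
    visited = 0
    for j, dx in delta.items():
        for k in range(a.indptr[j], a.indptr[j + 1]):
            y[a.indices[k]] += a.data[k] * dx
            visited += 1
    return visited


# 3 x 3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSC form.
A = CSC(n_rows=3, indptr=[0, 2, 3, 5], indices=[0, 2, 1, 0, 2],
        data=[1.0, 4.0, 3.0, 2.0, 5.0])
x_old = [1.0, 1.0, 1.0]
y = full_matvec(A, x_old)

# Only column 2 changes, by +0.5.
visited = csc_delta_apply(A, y, {2: 0.5})
visited_fraction = visited / len(A.data)
```

In this toy case the delta path visits 2 of the 5 stored nonzeros and leaves `y` equal to a full recompute with the updated vector.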

HTTP timing is useful for customer workflow demos, but hardware counters and energy measurements target the standalone C++ runtime. Publishable reports pin matrix size, seed, update pattern, active columns, tile size, budget, error target, and hardware.
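One way to pin those parameters is a frozen record that travels with every result file. The sketch below is an assumption about how that could look, not the format HorneSci uses: the class name, field names, field types, and all values are placeholders chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkRunConfig:
    """Hypothetical record of the parameters a publishable report pins."""
    matrix_rows: int
    matrix_cols: int
    seed: int
    update_pattern: str
    active_columns: int
    tile_size: int
    budget: int          # unit and meaning are assumptions in this sketch
    error_target: float
    hardware: str

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that is easy to diff.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values only; these are not the published configuration.
config = BenchmarkRunConfig(
    matrix_rows=4096,
    matrix_cols=4096,
    seed=1234,
    update_pattern="one_local_tile",
    active_columns=32,
    tile_size=64,
    budget=16,
    error_target=1e-6,
    hardware="example-cpu",
)
serialized = config.to_json()
```

Freezing the record means a run cannot silently change a parameter after the configuration has been written out.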

Operating envelope

HST takes the top line when locality repeats.

Production-sized v10 runs show the clean regime: local dirty-tile schedules overtake CSC exact delta almost immediately, and the advantage keeps growing as the same lookup pattern repeats.

Chart legend: CSC parity, 1 dirty tile, 4 dirty tiles, 8 dirty tiles, fallback context.

Top line

2.49x

Best measured local schedule at B=16 versus CSC exact delta.

Payoff

1-2

Repeated lookups needed before the local schedules are ahead after build cost.
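The payoff figure can be read through a simple cost model, sketched below under stated assumptions: a local schedule pays a one-time build cost, then each repeated lookup costs less than the CSC exact-delta lookup it replaces. The function name and the cost model are illustrative, and the build costs in the example are made-up time units, not measured data. Only the 2.49 to 1 per-lookup ratio echoes the top-line figure.

```python
import math
from typing import Optional


def breakeven_lookups(build_cost: float, csc_cost: float,
                      local_cost: float) -> Optional[int]:
    """Smallest n with build_cost + n * local_cost < n * csc_cost.

    Returns None when the local path never gets ahead.
    """
    saving = csc_cost - local_cost
    if saving <= 0:
        return None
    # n must be strictly greater than build_cost / saving.
    return math.floor(build_cost / saving) + 1


# Illustrative units: CSC lookup costs 2.49, local lookup costs 1.0.
cheap_build = breakeven_lookups(build_cost=1.0, csc_cost=2.49, local_cost=1.0)
costly_build = breakeven_lookups(build_cost=2.0, csc_cost=2.49, local_cost=1.0)
never = breakeven_lookups(build_cost=1.0, csc_cost=1.0, local_cost=1.0)
```

Under this model, the breakeven count depends only on the build cost divided by the per-lookup saving, which is why a larger speedup shortens the payoff period.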

Sweet spot

1-8

Dirty column tiles with dense local reuse. Outside that shape, route to CSC.

Workload            Tiles   Breakeven   Peak
one local tile      1       1 call      2.49x
four local tiles    4       1 call      2.36x
eight local tiles   8       1 call      2.41x

Fixture-backed packs

Two benchmark packs cover stateful recovery and locality stress.

phi_fast

Stateful Phi-cache skipped-work recovery with persistent recovered output, tile refresh, stale-cache tracking, and error reporting.

sketch_surrogate

Hybrid router and locality stress test using low-rank skipped-delta tile sketches under a configured error target.

Scattered updates

Scattered updates should route to the exact path. The evaluation question is fallback rate and overhead on representative streams, not a universal no-cost promise.
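A minimal sketch of how fallback rate could be measured on an update stream follows. It assumes the router uses the dirty column tile count as its only signal, with the threshold taken from the 1-8 sweet spot quoted earlier. The names `dirty_tiles`, `route`, and `fallback_rate` are hypothetical, and a real router would presumably also weigh reuse and the error target.

```python
from typing import Iterable, List, Sequence, Set

MAX_DIRTY_TILES = 8  # upper end of the 1-8 sweet spot; an assumed threshold


def dirty_tiles(changed_columns: Iterable[int], tile_size: int) -> Set[int]:
    """Column tiles touched by an update (tile id = column // tile_size)."""
    return {c // tile_size for c in changed_columns}


def route(changed_columns: Iterable[int], tile_size: int,
          max_dirty_tiles: int = MAX_DIRTY_TILES) -> str:
    """Return "local" for clustered updates, "csc" for the exact fallback."""
    n = len(dirty_tiles(changed_columns, tile_size))
    return "local" if n <= max_dirty_tiles else "csc"


def fallback_rate(stream: Sequence[Iterable[int]], tile_size: int) -> float:
    """Fraction of updates in the stream that were routed to the exact path."""
    routes: List[str] = [route(update, tile_size) for update in stream]
    return routes.count("csc") / len(routes)


TILE = 64  # hypothetical tile size
stream = [
    list(range(0, 10)),            # 10 columns inside one tile
    [0, 64, 128, 192],             # 4 columns, one per tile
    list(range(0, 64 * 20, 64)),   # 20 columns scattered across 20 tiles
]
rate = fallback_rate(stream, TILE)
```

On this toy stream one of three updates falls back to CSC. The same counting applied to a representative customer stream gives the fallback rate the evaluation asks about.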

Production context

Real systems rarely scatter uniformly.

Video frames change in localized regions. User activity clusters temporally. Simulation grids perturb around initial conditions. These are the shapes where HST should reduce realized work.

The honest evaluation includes both wins and exits: clustered streams, partly scattered streams, and cases where exact recompute remains the right route. Contact HorneSci for the full benchmark suite, raw CSVs, and a customer-specific evaluation package.

Multi-threading

Large batches parallelize cleanly.

When many state vectors update at once, the delta-apply parallelizes by batch with near-linear speedup, and the threading model is formally verified with TLA⁺.

See the concurrency results and a live in-browser demo →

Adaptive heat demo

A million-cell local-update benchmark.

The heat-diffusion demo uses a 1000 x 1000 grid with a moving localized source. It compares full recompute, CSC exact delta, and HST scheduled delta while reporting latency, touched work, memory traffic estimates, and error against the full path.
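To give a feel for why a moving localized source suits a tile schedule, the sketch below counts the cells and tiles a single source position touches on a 1000 x 1000 grid. It is an illustration, not the demo's code. The square footprint, the radius, and the 50-cell tile size are all assumptions made for this example.

```python
from typing import Set, Tuple

GRID = 1000  # grid side length, as in the demo description

Bounds = Tuple[int, int, int, int]  # (row_min, row_max, col_min, col_max), inclusive


def footprint(center_row: int, center_col: int, radius: int,
              grid: int = GRID) -> Bounds:
    """Square region perturbed by the source, clamped to the grid."""
    r0, r1 = max(0, center_row - radius), min(grid - 1, center_row + radius)
    c0, c1 = max(0, center_col - radius), min(grid - 1, center_col + radius)
    return r0, r1, c0, c1


def dirty_tiles(bounds: Bounds, tile: int) -> Set[Tuple[int, int]]:
    """(row tile, column tile) pairs that overlap the footprint."""
    r0, r1, c0, c1 = bounds
    return {(tr, tc)
            for tr in range(r0 // tile, r1 // tile + 1)
            for tc in range(c0 // tile, c1 // tile + 1)}


def touched_fraction(bounds: Bounds, grid: int = GRID) -> float:
    """Share of grid cells inside the footprint."""
    r0, r1, c0, c1 = bounds
    return (r1 - r0 + 1) * (c1 - c0 + 1) / (grid * grid)


# A source at the grid center with an assumed radius of 10 cells.
center = footprint(500, 500, radius=10)
center_tiles = dirty_tiles(center, tile=50)
center_fraction = touched_fraction(center)
```

With these assumed numbers the source touches 441 of 1,000,000 cells and 4 of 400 tiles, which is the kind of gap between touched work and full recompute that the demo reports.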

Open the adaptive heat demo →