Benchmarks
HST is benchmarked as an incremental runtime, not as a generic one-shot sparse multiply. The public summary below describes fixture-backed packs and the measurement rules used during technical evaluation.
Methodology
We compare HST against exact sparse-delta paths a technical team would actually try first: CSC delta, grouped CSC, and hand-tuned tile traversal where appropriate. Every run is interpreted alongside locality, visited nonzero fraction, error, and fallback behavior.
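For reference, the CSC exact delta baseline can be sketched as below. This is a minimal illustration, assuming the maintained quantity is y = A x and that an update changes a few entries of x. The function name `csc_delta_apply` and the toy matrix are hypothetical and are not part of the HST runtime.

```python
def csc_delta_apply(indptr, indices, data, y, dx):
    """Update y = A @ x in place after x changes by the sparse delta dx.

    A is stored in CSC form (indptr, indices, data). dx maps a column
    index to the change in x[col]. Only nonzeros in the changed columns
    are visited; the number of visited nonzeros is returned.
    """
    visited = 0
    for col, delta in dx.items():
        for k in range(indptr[col], indptr[col + 1]):
            y[indices[k]] += data[k] * delta
            visited += 1
    return visited


# Toy 3x3 matrix, for illustration only:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 4.0, 3.0, 2.0, 5.0]

x = [1.0, 1.0, 1.0]
y = [3.0, 3.0, 9.0]  # A @ x before the update

# x[2] changes from 1.0 to 1.5, so only column 2 is traversed.
visited = csc_delta_apply(indptr, indices, data, y, {2: 0.5})
visited_fraction = visited / len(data)
```

The returned count gives the visited nonzero fraction directly, which is one of the quantities each run is interpreted alongside.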
HTTP timing is useful for customer workflow demos, but hardware counters and energy measurements target the standalone C++ runtime. Publishable reports pin matrix size, seed, update pattern, active columns, tile size, budget, error target, and hardware.
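One way to pin those parameters is an immutable record that is serialized next to every result file. The sketch below is an assumption about how such a record could look: the class name `RunConfig`, the field names, and all values are placeholders, not the HST report format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of everything a publishable run must pin."""
    matrix_rows: int
    matrix_cols: int
    seed: int
    update_pattern: str
    active_columns: int
    tile_size: int
    budget: int
    error_target: float
    hardware: str

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values only; none of these are measured settings.
config = RunConfig(
    matrix_rows=4096,
    matrix_cols=4096,
    seed=1234,
    update_pattern="clustered",
    active_columns=128,
    tile_size=64,
    budget=8,
    error_target=1e-6,
    hardware="placeholder-cpu",
)
record = config.to_json()
```

Freezing the dataclass means a run cannot silently change a parameter after the record is written.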
Operating envelope
Production-sized v10 runs show the clean regime: local dirty-tile schedules overtake CSC within the first one or two repeated lookups, and the advantage keeps compounding as the same lookup pattern repeats.
Top line
2.49x: Best measured local schedule at B=16 versus CSC exact delta.
Payoff
1-2: Repeated lookups needed before the local schedules are ahead after build cost.
Sweet spot
1-8: Dirty column tiles with dense local reuse. Outside that shape, route to CSC.
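The payoff figure is a break-even count: the number of repeated lookups after which a one-time build cost has been repaid by cheaper lookups. A minimal sketch of that arithmetic follows, assuming a fixed build cost and constant per-lookup costs. The function name and the numbers in the example are illustrative, not measurements.

```python
import math


def break_even_lookups(build_cost, local_cost, csc_cost):
    """Smallest n with build_cost + n * local_cost < n * csc_cost.

    Returns None when the local schedule is not cheaper per lookup,
    in which case it never catches up.
    """
    saving = csc_cost - local_cost
    if saving <= 0:
        return None
    return math.floor(build_cost / saving) + 1


# Made-up cost units: build 30, local lookup 10, CSC lookup 25.
n = break_even_lookups(build_cost=30, local_cost=10, csc_cost=25)
never = break_even_lookups(build_cost=30, local_cost=25, csc_cost=25)
```

With these made-up costs the local schedule is ahead from the third lookup on: after two lookups both paths have spent 50 units.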
Fixture-backed packs
Stateful Phi-cache skipped-work recovery with persistent recovered output, tile refresh, stale-cache tracking, and error reporting.
Hybrid router and locality stress test using low-rank skipped-delta tile sketches under a configured error target.
Scattered updates should route to the exact path. The evaluation question is fallback rate and overhead on representative streams, not a universal no-cost promise.
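Fallback rate on a stream can be estimated with a simple routing rule. The sketch below assumes a router that looks only at how many column tiles an update dirties and uses the 1-8 tile sweet spot as its threshold. A real router would also account for local reuse and the error target. The names `route` and `fallback_rate` are hypothetical.

```python
def route(dirty_columns, tile_size, max_dirty_tiles=8):
    """Return 'local' when the update dirties few column tiles, else 'csc'."""
    tiles = {col // tile_size for col in dirty_columns}
    return "local" if 1 <= len(tiles) <= max_dirty_tiles else "csc"


def fallback_rate(stream, tile_size, max_dirty_tiles=8):
    """Fraction of updates in the stream that route to the exact CSC path."""
    if not stream:
        return 0.0
    routes = [route(update, tile_size, max_dirty_tiles) for update in stream]
    return routes.count("csc") / len(routes)


# Illustrative stream: three clustered updates and one scattered update
# that touches 20 different tiles.
stream = [
    [0, 1, 2],
    [64, 65],
    list(range(0, 64 * 20, 64)),
    [5],
]
rate = fallback_rate(stream, tile_size=64)
```

Running the same function over a representative customer stream gives the fallback rate the evaluation asks about, before any timing is collected.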
Production context
Video frames change in localized regions. User activity clusters temporally. Simulation grids perturb around initial conditions. These are the shapes where HST should reduce realized work.
The honest evaluation includes both wins and exits: clustered streams, partly scattered streams, and cases where exact recompute remains the right route. Contact HorneSci for the full benchmark suite, raw CSVs, and customer-specific evaluation package.
Multi-threading
When many state vectors update at once, the delta-apply parallelizes by batch with near-linear speedup, and the threading model is formally verified with TLA+.
Adaptive heat demo
The heat-diffusion demo uses a 1000 x 1000 grid with a moving localized source. It compares full recompute, CSC exact delta, and HST scheduled delta while reporting latency, touched work, memory traffic estimates, and error against the full path.
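The comparison between full recompute and a localized update can be illustrated on a small grid. The sketch below is a simplified stand-in, not the HST scheduler: it assumes an explicit 5-point diffusion step with fixed edges and recomputes only the cells whose stencil touches a changed input cell. It uses a 32 x 32 grid rather than 1000 x 1000 so it runs quickly, and all names are hypothetical.

```python
def stencil(u, i, j, alpha):
    """Explicit diffusion update for one interior cell."""
    lap = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4 * u[i][j]
    return u[i][j] + alpha * lap


def full_step(u, alpha=0.1):
    """One diffusion step over the whole grid; edge cells are held fixed."""
    n = len(u)
    out = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = stencil(u, i, j, alpha)
    return out


def delta_step(u, prev_out, changed, alpha=0.1):
    """Patch prev_out (the step result before the change) for the new input u.

    Only cells whose 5-point stencil reads a changed cell are recomputed.
    Returns the patched output and the number of touched cells.
    """
    n = len(u)
    out = [row[:] for row in prev_out]
    touched = set()
    for i, j in changed:
        for di, dj in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                touched.add((a, b))
    for i, j in touched:
        interior = 0 < i < n - 1 and 0 < j < n - 1
        out[i][j] = stencil(u, i, j, alpha) if interior else u[i][j]
    return out, len(touched)


n = 32
u_old = [[0.0] * n for _ in range(n)]
prev_out = full_step(u_old)

# A localized source appears at one cell.
u_new = [row[:] for row in u_old]
u_new[8][8] = 1.0

patched, touched = delta_step(u_new, prev_out, [(8, 8)])
max_error = max(
    abs(a - b)
    for row_a, row_b in zip(patched, full_step(u_new))
    for a, b in zip(row_a, row_b)
)
```

Here the localized path touches five cells where the full path visits every interior cell, and the error against the full path is zero because the same expression is evaluated for each touched cell.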