Peak working memory, not model size, decides whether inference fits in RAM. It is a property of the schedule and the allocation rather than the model alone, which makes it a compile-time planning problem.
June 11, 2026 in Deep Dives · 9 minutes
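To make the scheduling claim concrete, here is a minimal sketch of how the same graph can have different peak working memory under two valid execution orders. The graph, the tensor sizes, and the names `SIZES`, `INPUTS`, and `peak_memory` are hypothetical and chosen for illustration; they are not taken from the post, and the model assumes each tensor is freed right after its last reader runs.

```python
# Hypothetical graph: x feeds two branches that each expand, then shrink.
SIZES = {"x": 10, "a": 100, "a2": 10, "b": 100, "b2": 10, "out": 10}
INPUTS = {
    "x": [],
    "a": ["x"],
    "a2": ["a"],
    "b": ["x"],
    "b2": ["b"],
    "out": ["a2", "b2"],
}


def peak_memory(order, sizes, inputs):
    """Return the peak number of live units when ops run in `order`.

    Assumes an op's output is allocated while its inputs are still live,
    and that a tensor is freed immediately after its last reader runs.
    """
    # Record the last step at which each tensor is read.
    last_use = {}
    for step, op in enumerate(order):
        for src in inputs[op]:
            last_use[src] = step

    live = 0
    peak = 0
    for step, op in enumerate(order):
        live += sizes[op]  # allocate this op's output
        peak = max(peak, live)
        for src in set(inputs[op]):
            if last_use[src] == step:
                live -= sizes[src]  # no later reader, so release it
    return peak


# Two valid topological orders of the same graph.
BREADTH_FIRST = ["x", "a", "b", "a2", "b2", "out"]
DEPTH_FIRST = ["x", "a", "a2", "b", "b2", "out"]

print(peak_memory(BREADTH_FIRST, SIZES, INPUTS))  # 210
print(peak_memory(DEPTH_FIRST, SIZES, INPUTS))    # 120
```

Both orders compute the same result with the same tensors; only the order changes, and with it how many large intermediates are alive at once.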
The model you chose doesn't fit in SRAM. You consider shrinking it or switching to a different model. TiGrIS tiles the computation instead, so the exact model runs on your target hardware.
April 23, 2026 in Announcements · 12 minutes