Title here
Summary here
Peak working memory, not model size, decides whether inference fits in RAM. It is a property of the schedule and the allocation rather than the model alone, which makes it a compile-time planning problem.
June 11, 2026 in Deep Dives9 minutes