tigris compile
Compile an ONNX model into a binary .tgrs execution plan for deployment on embedded devices.
Usage
tigris compile MODEL [OPTIONS]
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| MODEL | path | yes | ONNX model file (.onnx) |
| -m, --mem | size (multiple) | yes | Memory pools, fast to slow (e.g. -m 256K or -m 256K -m 8M) |
| -o, --output | path | no | Output .tgrs path (default: MODEL.tgrs) |
| -f, --flash | size | no | Flash budget. Warns if plan exceeds this size. |
| -c, --compress | none / lz4 | no | Weight compression (default: none) |
| --xip | flag | no | Execute-in-place: weights read directly from flash at runtime |
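As an illustration of how the size arguments map to bytes, here is a minimal Python sketch. It is not the compiler's own parser: `parse_size` is a hypothetical helper, and it assumes `K` and `M` are 1024-based, which matches the `256.00 KiB` the compiler prints for `-m 256K` and the `4.00 MiB` it prints for `-f 4M`. Accepting a bare number as a byte count is also an assumption.

```python
import re

# Assumption: K and M are binary multiples (1024-based), consistent with the
# "256.00 KiB" and "4.00 MiB" figures in the example output. A bare number is
# treated as bytes, which is an assumption too.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2}


def parse_size(text: str) -> int:
    """Convert a size string such as '256K' or '8M' to a byte count."""
    match = re.fullmatch(r"(\d+)([KM]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


# Pools are listed fast to slow, in the order the -m flags are given.
pools = [parse_size(s) for s in ["232K", "6M"]]
print(pools)
```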
Compilation Pipeline
The compiler runs a 7-stage pipeline:
- Load. Import the ONNX model, fold constants, canonicalize the graph.
- Normalize. Fold QDQ patterns, extract quantization parameters, lower ops to TiGrIS op types.
- Lifetimes. Compute tensor lifetimes from the execution order.
- Memory Timeline. Build the activation memory timeline, compute peak memory.
- Temporal Partition. Split the graph into stages that each fit within the SRAM budget, inserting spill/reload ops at stage boundaries.
- Spatial Partition. For stages that still exceed the budget, compute spatial tiling plans (tile height, halo, receptive field) and detect chain-tileable stage sequences.
- Binary Emit. Serialize to the .tgrs binary format.
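The Lifetimes and Memory Timeline stages can be sketched on a toy graph. This is a simplified illustration, not the TiGrIS implementation: the op list, tensor names, and byte sizes are made up, and a tensor is assumed to be live from the first step that touches it to the last.

```python
# Hypothetical toy graph in execution order; each op lists the tensors it
# reads and writes. Sizes are invented for illustration.
ops = [
    {"name": "conv1", "inputs": ["x"], "outputs": ["a"]},
    {"name": "conv2", "inputs": ["a"], "outputs": ["b"]},
    {"name": "add", "inputs": ["a", "b"], "outputs": ["y"]},
]
sizes = {"x": 4096, "a": 8192, "b": 8192, "y": 8192}  # bytes


def tensor_lifetimes(ops):
    """Map each tensor to (first step that touches it, last step that touches it)."""
    lifetimes = {}
    for step, op in enumerate(ops):
        for name in op["inputs"] + op["outputs"]:
            first, _ = lifetimes.get(name, (step, step))
            lifetimes[name] = (first, step)
    return lifetimes


def memory_timeline(ops, sizes):
    """Total bytes of live tensors at each step of the execution order."""
    lifetimes = tensor_lifetimes(ops)
    return [
        sum(sizes[t] for t, (first, last) in lifetimes.items() if first <= step <= last)
        for step in range(len(ops))
    ]


timeline = memory_timeline(ops, sizes)
peak = max(timeline)
print(timeline, peak)
```

In this toy case the peak falls on the last op, where `a`, `b`, and `y` are all live at once. A peak above the SRAM budget is the situation that temporal and spatial partitioning are there to resolve.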
The output is a single .tgrs file designed for zero-copy, zero-alloc loading on the target device.
XIP (Execute In Place)
When --xip is enabled, the runtime reads weights directly from flash via memory-mapped I/O instead of copying them to RAM. The plan binary format is designed for memory-mapped access, so the C loader returns pointers directly into the mapped buffer. This is the normal mode for embedded deployment.
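The idea of returning pointers into a mapped buffer, rather than copying, can be shown with a host-side analogy in Python. This is only a sketch of the zero-copy pattern: the 16-byte header and the file layout are hypothetical and are not the real .tgrs format, and an OS-level `mmap` stands in for memory-mapped flash on a microcontroller.

```python
import mmap
import os
import tempfile

# Hypothetical layout, NOT the real .tgrs format: a 16-byte header followed
# by raw weight bytes.
HEADER_SIZE = 16
weights = bytes(range(32))

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * HEADER_SIZE + weights)
    path = f.name

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Slicing a memoryview does not copy: the slice refers to the mapped pages,
# much like a C pointer into the mapped plan.
whole = memoryview(mapped)
weight_view = whole[HEADER_SIZE:]

shares_mapping = weight_view.obj is mapped  # the view is backed by the mapping
first_bytes = bytes(weight_view[:4])        # an explicit copy, only for display
view_length = len(weight_view)
print(shares_mapping, first_bytes, view_length)

# Views must be released before the mapping can be closed.
weight_view.release()
whole.release()
mapped.close()
os.unlink(path)
```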
Weight Compression
LZ4 compression reduces plan size on flash at the cost of a small SRAM overhead for decompression at runtime.
tigris compile model.onnx -m 256K -c lz4 -o model.tgrs
When compression is enabled:
- Weights are compressed per stage into individual blocks
- At runtime, the executor decompresses one stage’s weights at a time into a reserved prefix of the SRAM arena
- The decompression overhead is tigris_weight_decompression_overhead(&plan) bytes, which must be added to your fast buffer allocation
- The plan reports both compressed and uncompressed sizes
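The per-stage scheme above can be sketched in Python to show how the buffer arithmetic might work out. Several things here are assumptions: `zlib` stands in for LZ4 (which is not in the Python standard library), the stage weight blobs are invented, and the overhead is assumed to equal the largest stage's uncompressed weight size. On a real target, take the value from `tigris_weight_decompression_overhead(&plan)` rather than computing it yourself.

```python
import zlib

# Hypothetical per-stage weight blobs; real weights come from the model.
stage_weights = [bytes(1000), bytes(range(256)) * 8, bytes(500)]

# One independently compressed block per stage.
# zlib is a stand-in for LZ4 here; the block structure is the point.
blocks = [zlib.compress(w) for w in stage_weights]

# Assumption: only one stage is decompressed at a time, so the reserved
# prefix must hold the largest stage's uncompressed weights.
overhead = max(len(w) for w in stage_weights)

sram_budget = 256 * 1024                   # the -m 256K budget, in bytes
fast_buffer_size = sram_budget + overhead  # what the application allocates


def load_stage(index: int) -> bytes:
    """Decompress a single stage's weights, as the executor would per stage."""
    data = zlib.decompress(blocks[index])
    assert len(data) <= overhead  # always fits in the reserved prefix
    return data


print(overhead, fast_buffer_size)
```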
Examples
Compile with a 256K SRAM budget:
tigris compile ds_cnn.onnx -m 256K -o ds_cnn.tgrs
Output:
Binary plan written to ds_cnn.tgrs
28 ops, 3 stages @ 256.00 KiB budget
plan size: 87.42 KiB
Compile with LZ4 compression and flash budget check:
tigris compile mobilenetv2.onnx -m 256K -c lz4 -f 4M -o mobilenetv2.tgrs
Output:
Binary plan written to mobilenetv2.tgrs (LZ4 compressed)
53 ops, 12 stages @ 256.00 KiB budget
plan size: 2.85 MiB (uncompressed: 3.41 MiB, ratio: 0.84x)
flash 4.00 MiB: fits
Two-pool memory (SRAM + PSRAM):
tigris compile yolov5n.onnx -m 232K -m 6M -o yolov5n.tgrs
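The size figures in the LZ4 example above can be checked by hand. The sketch below assumes the ratio is compressed size divided by uncompressed size and that a plan fits when it does not exceed the flash budget. `flash_report` is a hypothetical helper, not part of the tool.

```python
MIB = 1024 * 1024


def flash_report(plan_bytes: float, uncompressed_bytes: float, flash_bytes: float):
    """Return (compression ratio rounded to 2 places, whether the plan fits)."""
    ratio = plan_bytes / uncompressed_bytes
    fits = plan_bytes <= flash_bytes
    return round(ratio, 2), fits


# Values taken from the mobilenetv2 example output.
ratio, fits = flash_report(2.85 * MIB, 3.41 * MIB, 4 * MIB)
print(ratio, fits)
```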