Integration
Step-by-step guide to integrating the TiGrIS C99 runtime into your embedded application.
1. Add to your build
CMake (as a subdirectory):
add_subdirectory(tigris-runtime)
target_link_libraries(my_app PRIVATE tigris_runtime)

Manual: Copy tigris-runtime/src/ and tigris-runtime/include/ into your project. Add the source files to your build system and set the include path.
2. Include headers
#include "tigris.h"
#include "tigris_loader.h"
#include "tigris_mem.h"
#include "tigris_executor.h"
#include "tigris_kernels_s8.h" /* int8 reference backend */
#include "tigris_kernels.h" /* f32 reference backend */

Pick the kernel header matching your model’s dtype. For accelerated int8 backends, use tigris_kernels_esp_nn.h (ESP32 family) or tigris_kernels_cmsis_nn.h (Cortex-M family) instead of tigris_kernels_s8.h.
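The backend choice can be made once at build time with a preprocessor guard. A sketch under stated assumptions: TIGRIS_HAS_ESP_NN appears elsewhere in this guide, but the TIGRIS_HAS_CMSIS_NN macro name here is illustrative, not confirmed:

```c
/* Build-time backend selection for int8 models.
 * TIGRIS_HAS_CMSIS_NN is an assumed guard name -- check your build config. */
#if defined(TIGRIS_HAS_ESP_NN)
#include "tigris_kernels_esp_nn.h"   /* ESP32 family, SIMD-accelerated */
#elif defined(TIGRIS_HAS_CMSIS_NN)
#include "tigris_kernels_cmsis_nn.h" /* Cortex-M family */
#else
#include "tigris_kernels_s8.h"       /* portable reference backend */
#endif
```

Defining the guard from the build system (e.g. a compile definition on the target) keeps application code backend-agnostic.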
3. Load the plan
tigris_plan_t plan;
tigris_error_t err = tigris_plan_load(plan_buf, plan_buf_len, &plan);

plan_buf is the .tgrs file content, either memory-mapped from flash or loaded into a buffer. The loader is zero-copy and zero-alloc: all pointers in plan refer directly into plan_buf. Keep plan_buf alive for the lifetime of plan.
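On hosts or targets with a filesystem, a small helper can read the .tgrs file into a heap buffer that outlives the plan. A minimal POSIX sketch; the helper name load_file is illustrative, not part of the TiGrIS API:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer; caller frees.
 * Returns NULL on any error. The buffer must stay alive as long
 * as the tigris_plan_t parsed from it. */
static uint8_t *load_file(const char *path, uint32_t *out_len) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);
    uint8_t *buf = malloc((size_t)len);
    if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf) *out_len = (uint32_t)len;
    return buf;
}
```

The returned buffer is then passed to tigris_plan_load() and freed only after the plan is no longer needed.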
4. Initialize memory
Allocate a fast buffer (SRAM), a slow buffer (PSRAM), and a tensor pointer array. Then initialize the memory manager:
tigris_mem_t mem;
void *tensor_ptrs[plan.header->num_tensors];
tigris_mem_error_t merr = tigris_mem_init(
&mem, (void **)tensor_ptrs, plan.header->num_tensors,
fast_buf, fast_size,
    slow_buf, slow_size);

The fast buffer size should be at least plan.header->budget. For compressed plans, add the decompression overhead:

uint32_t fast_size = plan.header->budget
    + tigris_weight_decompression_overhead(&plan);

5. Prepare backend (ESP-NN only)
#ifdef TIGRIS_HAS_ESP_NN
int ret = tigris_esp_nn_prepare(&plan, &mem);
if (ret != 0) { /* arena too small */ }
#endif

This pre-allocates SIMD scratch buffers from the top of the fast arena. Call once after tigris_mem_init(), before inference.
6. Set model inputs
Allocate input tensors in the slow buffer and fill with your data:
for (uint8_t i = 0; i < plan.header->num_model_inputs; i++) {
uint16_t tidx = plan.model_inputs[i];
uint32_t sz = plan.tensors[tidx].size_bytes;
tigris_mem_alloc_slow(&mem, tidx, sz);
int8_t *input = (int8_t *)mem.tensor_ptrs[tidx];
/* fill input with your preprocessed data */
}

7. Run inference
tigris_exec_stats_t stats;
tigris_exec_error_t eerr = tigris_run(
    &plan, &mem, tigris_dispatch_kernel_s8, NULL, &stats);

Pass the dispatch function for your chosen backend. The user_ctx parameter (NULL above) is forwarded to every kernel call.
8. Read outputs
for (uint8_t i = 0; i < plan.header->num_model_outputs; i++) {
uint16_t tidx = plan.model_outputs[i];
int8_t *output = (int8_t *)mem.tensor_ptrs[tidx];
uint32_t size = plan.tensors[tidx].size_bytes;
/* process output */
}

Model outputs are located in the slow buffer after inference completes.
Complete example
Minimal POSIX integration that loads a plan from file, runs inference, and prints the first few output values. This example uses the int8 reference backend; for f32, replace tigris_dispatch_kernel_s8 with tigris_dispatch_kernel:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tigris.h"
#include "tigris_loader.h"
#include "tigris_mem.h"
#include "tigris_executor.h"
#include "tigris_kernels_s8.h"
int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.tgrs>\n", argv[0]);
        return 1;
    }
    /* Load .tgrs file into memory */
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    uint32_t file_size = (uint32_t)ftell(f);
    fseek(f, 0, SEEK_SET);
    uint8_t *plan_buf = malloc(file_size);
    fread(plan_buf, 1, file_size, f);
    fclose(f);
/* Parse plan (zero-copy into plan_buf) */
tigris_plan_t plan;
tigris_error_t err = tigris_plan_load(plan_buf, file_size, &plan);
if (err != TIGRIS_OK) {
fprintf(stderr, "load failed: %s\n", tigris_error_str(err));
return 1;
}
/* Allocate buffers */
uint32_t fast_size = plan.header->budget
+ tigris_weight_decompression_overhead(&plan);
uint32_t slow_size = 512 * 1024;
void *fast_buf = malloc(fast_size);
void *slow_buf = malloc(slow_size);
void **tensor_ptrs = calloc(plan.header->num_tensors, sizeof(void *));
/* Init memory manager */
tigris_mem_t mem;
tigris_mem_init(&mem, tensor_ptrs, plan.header->num_tensors,
fast_buf, fast_size, slow_buf, slow_size);
/* Allocate and fill model input */
uint16_t in_idx = plan.model_inputs[0];
tigris_mem_alloc_slow(&mem, in_idx, plan.tensors[in_idx].size_bytes);
memset(mem.tensor_ptrs[in_idx], 1, plan.tensors[in_idx].size_bytes);
/* Run inference */
tigris_exec_stats_t stats;
tigris_exec_error_t eerr = tigris_run(
&plan, &mem, tigris_dispatch_kernel_s8, NULL, &stats);
if (eerr != TIGRIS_EXEC_OK) {
fprintf(stderr, "inference failed: %s\n", tigris_exec_error_str(eerr));
return 1;
}
/* Read output */
uint16_t out_idx = plan.model_outputs[0];
int8_t *output = (int8_t *)mem.tensor_ptrs[out_idx];
printf("Output[0..4]: %d %d %d %d %d\n",
output[0], output[1], output[2], output[3], output[4]);
free(tensor_ptrs);
free(slow_buf);
free(fast_buf);
free(plan_buf);
return 0;
}

ESP-IDF deployment
On ESP32 targets, store the .tgrs plan on a dedicated flash partition and use esp_partition_mmap() to memory-map it so the loader can reference plan data directly from flash without copying into RAM. Allocate the fast (SRAM) arena with heap_caps_malloc(MALLOC_CAP_INTERNAL) and, if available, allocate the slow buffer from PSRAM with heap_caps_malloc(MALLOC_CAP_SPIRAM). Set the main task stack to at least 64 KB (CONFIG_ESP_MAIN_TASK_STACK_SIZE=65536) because the inference loop and kernel dispatch can be deeply nested.
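The steps above can be sketched for ESP-IDF v5 as follows. This is a fragment, not a complete app; the partition label "model" is an assumption that must match your partition table, and fast_size/slow_size are computed as in steps 4 above:

```c
#include "esp_heap_caps.h"
#include "esp_partition.h"

/* Memory-map the plan partition so tigris_plan_load() can reference
 * it directly from flash (label "model" is an assumption). */
const esp_partition_t *part = esp_partition_find_first(
    ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_ANY, "model");
const void *plan_buf = NULL;
esp_partition_mmap_handle_t mmap_handle;
ESP_ERROR_CHECK(esp_partition_mmap(part, 0, part->size,
                                   ESP_PARTITION_MMAP_DATA,
                                   &plan_buf, &mmap_handle));

/* Fast arena from internal SRAM; slow arena from PSRAM when available. */
void *fast_buf = heap_caps_malloc(fast_size,
                                  MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
void *slow_buf = heap_caps_malloc(slow_size, MALLOC_CAP_SPIRAM);
```

esp_partition_munmap(mmap_handle) releases the mapping, but only after the plan is no longer in use, since the loader is zero-copy.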
Error handling
Check every return code. All API functions return typed error enums:
Loader errors (tigris_error_t): See API Reference for the full error enum.
Memory errors (tigris_mem_error_t): See API Reference for the full error enum.
Executor errors (tigris_exec_error_t): See API Reference for the full error enum.
Use tigris_error_str(), tigris_mem_error_str(), and tigris_exec_error_str() to convert error codes to human-readable strings.
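A small convenience macro can keep per-call checks readable. This is a generic sketch, not part of the TiGrIS API; it assumes each error enum's success value is 0 and pairs each call with its matching *_error_str() converter (demonstrated below with a stub converter in place of the real ones):

```c
#include <stdio.h>

/* Evaluate a TiGrIS call; on failure, print a readable message and
 * bail out of the enclosing function. Assumes success == 0. */
#define TIGRIS_CHECK(expr, err_str_fn)                               \
    do {                                                             \
        int _rc = (int)(expr);                                       \
        if (_rc != 0) {                                              \
            fprintf(stderr, "%s: %s\n", #expr, err_str_fn(_rc));     \
            return -1;                                               \
        }                                                            \
    } while (0)

/* Stub error-string function standing in for tigris_error_str() etc. */
static const char *demo_err_str(int e) { return e ? "failure" : "ok"; }

/* Demo: a function that checks a call result via the macro. */
static int demo(int rc) {
    TIGRIS_CHECK(rc, demo_err_str);
    return 0;
}
```

In application code the pattern would look like TIGRIS_CHECK(tigris_plan_load(plan_buf, len, &plan), tigris_error_str), with the matching converter for memory and executor calls.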