Integration

Step-by-step guide to integrating the TiGrIS C99 runtime into your embedded application.

1. Add to your build

CMake (as a subdirectory):

add_subdirectory(tigris-runtime)
target_link_libraries(my_app PRIVATE tigris_runtime)

Manual: Copy tigris-runtime/src/ and tigris-runtime/include/ into your project. Add the source files to your build system and set the include path.
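For the manual route, a direct compiler invocation can look roughly like this (a sketch: my_app.c stands in for your own sources, and the glob assumes the runtime sources sit flat under src/):

```
cc -std=c99 -O2 -Itigris-runtime/include \
   my_app.c tigris-runtime/src/*.c -o my_app
```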

2. Include headers

#include "tigris.h"
#include "tigris_loader.h"
#include "tigris_mem.h"
#include "tigris_executor.h"
#include "tigris_kernels_s8.h"       /* int8 reference backend */
#include "tigris_kernels.h"          /* f32 reference backend */

Pick the kernel header matching your model’s dtype. For accelerated int8 backends, use tigris_kernels_esp_nn.h (ESP32 family) or tigris_kernels_cmsis_nn.h (Cortex-M family) instead of tigris_kernels_s8.h.
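Backend selection can be sketched with the preprocessor. The TIGRIS_USE_* guard macros below are illustrative, not part of the runtime; wire them to whatever defines your build system sets:

```c
/* Illustrative guard macros -- define them from your build system. */
#if defined(TIGRIS_USE_ESP_NN)
#  include "tigris_kernels_esp_nn.h"    /* ESP32 family, accelerated int8 */
#elif defined(TIGRIS_USE_CMSIS_NN)
#  include "tigris_kernels_cmsis_nn.h"  /* Cortex-M family, accelerated int8 */
#else
#  include "tigris_kernels_s8.h"        /* portable int8 reference */
#endif
```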

3. Load the plan

tigris_plan_t plan;
tigris_error_t err = tigris_plan_load(plan_buf, plan_buf_len, &plan);
if (err != TIGRIS_OK) { /* see tigris_error_str(err) */ }

plan_buf is the .tgrs file content, either memory-mapped from flash or loaded into a buffer. The loader is zero-copy and zero-alloc: all pointers in plan refer directly into plan_buf. Keep plan_buf alive for the lifetime of plan.

4. Initialize memory

Allocate a fast buffer (SRAM), a slow buffer (PSRAM), and a tensor pointer array. Then initialize the memory manager:

tigris_mem_t mem;
void *tensor_ptrs[plan.header->num_tensors];

tigris_mem_error_t merr = tigris_mem_init(
    &mem, (void **)tensor_ptrs, plan.header->num_tensors,
    fast_buf, fast_size,
    slow_buf, slow_size);

The fast buffer size should be at least plan.header->budget. For compressed plans, add the decompression overhead:

uint32_t fast_size = plan.header->budget
                   + tigris_weight_decompression_overhead(&plan);

5. Prepare backend (ESP-NN only)

#ifdef TIGRIS_HAS_ESP_NN
int ret = tigris_esp_nn_prepare(&plan, &mem);
if (ret != 0) { /* arena too small */ }
#endif

This pre-allocates SIMD scratch buffers from the top of the fast arena. Call once after tigris_mem_init(), before inference.

6. Set model inputs

Allocate input tensors in the slow buffer and fill with your data:

for (uint8_t i = 0; i < plan.header->num_model_inputs; i++) {
    uint16_t tidx = plan.model_inputs[i];
    uint32_t sz = plan.tensors[tidx].size_bytes;
    tigris_mem_error_t merr = tigris_mem_alloc_slow(&mem, tidx, sz);
    if (merr != 0) { /* slow buffer too small; see tigris_mem_error_str(merr) */ }

    int8_t *input = (int8_t *)mem.tensor_ptrs[tidx];
    /* fill input with your preprocessed data */
}
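What "fill" means depends on your model's quantization. For int8 models you typically quantize float features with the input tensor's scale and zero point; how those are stored in the plan is not shown here, so they appear as plain parameters in this sketch:

```c
#include <stdint.h>

/* Affine quantization: q = round(x / scale) + zero_point, clamped to the
   int8 range. Rounds half away from zero to avoid pulling in libm. */
int8_t quantize_s8(float x, float scale, int32_t zero_point) {
    float scaled = x / scale;
    int32_t q = (int32_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f)) + zero_point;
    if (q < -128) q = -128;
    if (q > 127)  q = 127;
    return (int8_t)q;
}
```

Apply it element-wise over your preprocessed float buffer before copying the result into mem.tensor_ptrs[tidx].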

7. Run inference

tigris_exec_stats_t stats;
tigris_exec_error_t eerr = tigris_run(
    &plan, &mem, tigris_dispatch_kernel_s8, NULL, &stats);

Pass the dispatch function for your chosen backend. The fourth argument (user_ctx, NULL above) is forwarded unchanged to every kernel call.

8. Read outputs

for (uint8_t i = 0; i < plan.header->num_model_outputs; i++) {
    uint16_t tidx = plan.model_outputs[i];
    int8_t *output = (int8_t *)mem.tensor_ptrs[tidx];
    uint32_t size = plan.tensors[tidx].size_bytes;
    /* process output */
}

Model outputs are located in the slow buffer after inference completes.
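What "process output" means is model-specific. Two common post-processing helpers are sketched below under the usual affine int8 scheme; scale and zero point come from your model's quantization parameters, not from this runtime:

```c
#include <stdint.h>
#include <stddef.h>

/* Index of the maximum element -- e.g. the predicted class of a classifier. */
size_t argmax_s8(const int8_t *data, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (data[i] > data[best]) best = i;
    return best;
}

/* Affine dequantization back to float: x = (q - zero_point) * scale. */
float dequantize_s8(int8_t q, float scale, int32_t zero_point) {
    return (float)((int32_t)q - zero_point) * scale;
}
```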

Complete example

Minimal POSIX integration that loads a plan from a file, runs inference, and prints the first few output values. This example uses the int8 reference backend; for f32 models, replace tigris_dispatch_kernel_s8 with tigris_dispatch_kernel:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>   /* uint8_t, uint32_t */
#include "tigris.h"
#include "tigris_loader.h"
#include "tigris_mem.h"
#include "tigris_executor.h"
#include "tigris_kernels_s8.h"

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.tgrs\n", argv[0]);
        return 1;
    }

    /* Load .tgrs file into memory */
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    uint32_t file_size = (uint32_t)ftell(f);
    fseek(f, 0, SEEK_SET);
    uint8_t *plan_buf = malloc(file_size);
    if (!plan_buf || fread(plan_buf, 1, file_size, f) != file_size) {
        fprintf(stderr, "failed to read %s\n", argv[1]);
        return 1;
    }
    fclose(f);

    /* Parse plan (zero-copy into plan_buf) */
    tigris_plan_t plan;
    tigris_error_t err = tigris_plan_load(plan_buf, file_size, &plan);
    if (err != TIGRIS_OK) {
        fprintf(stderr, "load failed: %s\n", tigris_error_str(err));
        return 1;
    }

    /* Allocate buffers */
    uint32_t fast_size = plan.header->budget
                       + tigris_weight_decompression_overhead(&plan);
    uint32_t slow_size = 512 * 1024;
    void *fast_buf = malloc(fast_size);
    void *slow_buf = malloc(slow_size);
    void **tensor_ptrs = calloc(plan.header->num_tensors, sizeof(void *));

    /* Init memory manager */
    tigris_mem_t mem;
    tigris_mem_init(&mem, tensor_ptrs, plan.header->num_tensors,
                    fast_buf, fast_size, slow_buf, slow_size);

    /* Allocate and fill model input */
    uint16_t in_idx = plan.model_inputs[0];
    tigris_mem_alloc_slow(&mem, in_idx, plan.tensors[in_idx].size_bytes);
    memset(mem.tensor_ptrs[in_idx], 1, plan.tensors[in_idx].size_bytes);

    /* Run inference */
    tigris_exec_stats_t stats;
    tigris_exec_error_t eerr = tigris_run(
        &plan, &mem, tigris_dispatch_kernel_s8, NULL, &stats);
    if (eerr != TIGRIS_EXEC_OK) {
        fprintf(stderr, "inference failed: %s\n", tigris_exec_error_str(eerr));
        return 1;
    }

    /* Read output */
    uint16_t out_idx = plan.model_outputs[0];
    int8_t *output = (int8_t *)mem.tensor_ptrs[out_idx];
    printf("Output[0..4]: %d %d %d %d %d\n",
           output[0], output[1], output[2], output[3], output[4]);

    free(tensor_ptrs);
    free(slow_buf);
    free(fast_buf);
    free(plan_buf);
    return 0;
}

ESP-IDF deployment

On ESP32 targets, store the .tgrs plan on a dedicated flash partition and use esp_partition_mmap() to memory-map it so the loader can reference plan data directly from flash without copying into RAM. Allocate the fast (SRAM) arena with heap_caps_malloc(MALLOC_CAP_INTERNAL) and, if available, allocate the slow buffer from PSRAM with heap_caps_malloc(MALLOC_CAP_SPIRAM). Set the main task stack to at least 64 KB (CONFIG_ESP_MAIN_TASK_STACK_SIZE=65536) because the inference loop and kernel dispatch can be deeply nested.
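Put together, the setup might look like the following sketch (ESP-IDF v5 API; the partition name "model" is an assumption for illustration, and fast_size/slow_size are computed as in steps 4 above):

```c
#include "esp_partition.h"
#include "esp_heap_caps.h"

/* Memory-map the plan partition straight from flash (zero-copy). */
const esp_partition_t *part = esp_partition_find_first(
    ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_ANY, "model");
const void *plan_buf = NULL;
esp_partition_mmap_handle_t map_handle;
ESP_ERROR_CHECK(esp_partition_mmap(part, 0, part->size,
                                   ESP_PARTITION_MMAP_DATA,
                                   &plan_buf, &map_handle));

/* Fast arena in internal SRAM; slow arena in PSRAM when available. */
void *fast_buf = heap_caps_malloc(fast_size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
void *slow_buf = heap_caps_malloc(slow_size, MALLOC_CAP_SPIRAM);
```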

Error handling

Check every return code. All API functions return typed error enums:

Loader errors: tigris_error_t
Memory errors: tigris_mem_error_t
Executor errors: tigris_exec_error_t

See the API Reference for the full error enums. Use tigris_error_str(), tigris_mem_error_str(), and tigris_exec_error_str() to convert error codes to human-readable strings.