Use AI models on-device with Ionic, React, and ONNX Runtime
This is a short recap of how the TopVault iOS, Android, and web (SPA, PWA) applications run AI models locally, also known as running them on-device. The examples included in this post are pseudocode abstracted from the actual application to help with demonstration and discussion.
Why is AI needed on-device?
The reason an AI model was included on-device for TopVault was to preserve user privacy. TopVault allows collectors to add collectibles based on a photo. Those photos should remain on users' devices because they might contain sensitive and private data. Performing the work on-device also means the TopVault backend requires fewer resources and scales better as concurrent photo match requests grow.
The specific AI model needed was a text extraction model, or optical character recognition (OCR). Initially it supports English, and as TopVault grows to support more languages the model should be able to keep up. For this TopVault uses PaddleOCR, which supports 80+ languages. I had been using Tesseract but found the quality and accuracy of Paddle to be much higher. In another post I will cover how the image matching works end to end.
The important point is that the models are small enough for on-device usage: in this case the detection inference model is 4.5MB and the recognition model is 10MB.
Install dependencies and setup
Instead of using the model directly with ONNX Runtime, we use a convenience wrapper dependency that also makes it straightforward to use the model on a backend system. This is helpful in TopVault's case for generating the detection and match data a priori.
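As an aside, here is a rough sketch of what that backend usage could look like with the Node variant of the wrapper. The package name @gutenye/ocr-node and the exact create/detect call shapes are assumptions based on the wrapper's documentation, so treat this as illustrative rather than TopVault's actual backend code:

import Ocr from '@gutenye/ocr-node';

// Hypothetical backend helper that precomputes OCR results for an uploaded photo.
// The wrapper bundles detection and recognition, so one detect() call returns text lines.
async function extractTextLines(imagePath: string) {
  const ocr = await Ocr.create();
  const lines = await ocr.detect(imagePath);
  return lines; // detected text lines, ready to be stored as match data
}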
Here are the dependencies needed (and versions used):
$ cat package.json
{
// [...]
"dependencies": {
// [...]
"@gutenye/ocr-browser": "1.4.8",
"onnxruntime-common": "1.22.0",
"onnxruntime-web": "1.22.0",
// [...]
}
}
Note that ONNX Runtime version 1.22.0 is used specifically for NodeJS performance and support. At the time this article was published, this version was also the latest.
The Vite configuration, vite.config.js, is also updated:
$ cat vite.config.js
// [...]
const viteConfig = defineConfig({
  // [...]
  // Treat .onnx model files as static assets.
  assetsInclude: ["**/*.onnx"],
  optimizeDeps: {
    // Skip Vite's dependency pre-bundling for onnxruntime-web so its WASM assets load correctly.
    exclude: ["onnxruntime-web"],
  },
  // [...]
});
The specific .onnx models are extracted from the dependency packages and placed into the Ionic application's public assets directory:
$ mkdir public/assets/ocr
$ cp node_modules/@gutenye/ocr-models/assets/*.onnx public/assets/ocr
This makes it possible to test the application as you normally would, and build in production mode for distribution without any changes to the build process.
Using ONNX Runtime from TypeScript
Assume that the application has @capacitor/camera set up and is handling permission requests and access with a nice user experience.
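For reference, a minimal sketch of obtaining a photo as a data URI with @capacitor/camera is shown below; the option values are illustrative and permission handling is omitted:

import { Camera, CameraResultType, CameraSource } from '@capacitor/camera';

// Prompt the user for a photo (camera or gallery) and return it as a base64 data URI.
async function pickPhotoDataUri(): Promise<string | undefined> {
  const photo = await Camera.getPhoto({
    resultType: CameraResultType.DataUrl,
    source: CameraSource.Prompt,
    quality: 90,
  });
  return photo.dataUrl;
}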
In the case of @gutenye/ocr-browser, it hides some of this ONNX complexity; when using ONNX Runtime directly, the model needs to be initialized and the inputs converted into a Tensor.
In pseudocode this is:
import { InferenceSession } from 'onnxruntime-web';
import { Tensor } from 'onnxruntime-common';

// Load a model file once and reuse the returned session for every inference.
async function create(modelPath: string): Promise<InferenceSession> {
  return InferenceSession.create(modelPath);
}

// Run a single inference and return the first output tensor.
async function run(model: InferenceSession, inputData, onnxOptions) {
  const input = prepareInput(inputData);
  const outputs = await model.run({
    [model.inputNames[0]]: input,
  }, onnxOptions);
  return outputs[model.outputNames[0]];
}

// Wrap decoded pixel data ({ data, width, height }) in a Tensor with
// NCHW layout: [batch, channels, height, width].
function prepareInput(inputData) {
  const input = Float32Array.from(inputData.data);
  return new Tensor('float32', input, [1, 3, inputData.height, inputData.width]);
}
Note that initializing the models (for OCR in TopVault this means multiple models) is not instant. On older devices without optimizations this initialization can take over 10 seconds. When using Ionic and React we need to incorporate this loading time into the user experience.
Set up a React hook for initializing and using the model
A few setup notes:
1. The asset path requires a leading / in web mode.
2. The WASM settings need threads disabled.
3. An exclusive initialization mutex is used for added protection.
A hook is used to simplify usage and expose ocrLoaded, ocrError, and infer. Here is another pseudocode example of implementing this hook:
// [...]
const ASSETS_PATH = isWebMode() ? '/assets/ocr' : 'assets/ocr';
// Web and NodeJS WASM only support single-threaded execution.
ort.env.wasm.numThreads = 1;
// Setting the WASM proxy does not work on mobile devices.
// ort.env.wasm.proxy = true;
const ocrLoadedAtom = atom(false);
const ocrErrorAtom = atom<Error | null>(null);
const ocrRuntimeAtom = atom<InferenceSession | null>(null);
// Only support a single concurrent call into each library function.
const initializeMutex = new Mutex();
const detectMutex = new Mutex();
export function useOcr() {
const [ocrError, setOcrError] = useAtom(ocrErrorAtom);
const [ocrLoaded, setOcrLoaded] = useAtom(ocrLoadedAtom);
const [ocrRuntime, setOcrRuntime] = useAtom(ocrRuntimeAtom);
const waitForOcr = useCallback(async () => {
await initializeMutex.runExclusive(async () => {
if (ocrRuntime) {
return;
}
try {
        // Only the detection model is shown here; the recognition model is loaded the same way.
        const model = await create(`${ASSETS_PATH}/ch_PP-OCRv4_det_infer.onnx`);
setOcrLoaded(true);
setOcrRuntime(model);
} catch (e: unknown) {
const error = e as Error;
setOcrError(error);
}
});
}, [ocrRuntime, setOcrError, setOcrLoaded, setOcrRuntime]);
useEffect(() => {
void waitForOcr();
}, [waitForOcr]);
const infer = useCallback(
async (dataUri: string) => {
if (!ocrRuntime) {
return [];
}
return detectMutex.runExclusive(async () => {
return run(ocrRuntime, dataUri, {});
});
},
[ocrRuntime]
);
return { ocrLoaded, ocrError, infer };
}
Another note about (3) above: this mutex may not be needed, but it is a practice I follow when dealing with complex initialization inside library methods.
Using the model in a component
In the React components we can make good use of Ionic components based on the fields exposed by the hook above. Consider the example here:
// [...]
export function FindByPhotoModal(props) {
const { ocrLoaded, ocrError, infer } = useOcr();
  const onPickPhoto = async (/* ... */) => {
// Open Camera prompt to select photo.
// Extract the dataUri from what was selected.
await infer(dataUri);
};
return (
<IonPage>
<IonHeader>
<IonToolbar />
<IonProgressBar
style={{ visibility: ocrLoaded || ocrError ? 'hidden' : '' }}
type="indeterminate"
/>
</IonHeader>
<IonContent>
<IonButton onClick={onPickPhoto} />
</IonContent>
</IonPage>
);
}
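One small refinement of this component, sketched below with an illustrative button label and toast message, is to disable the button until the model is ready and surface initialization failures with a toast:

<>
  <IonButton disabled={!ocrLoaded} onClick={onPickPhoto}>
    Find by photo
  </IonButton>
  <IonToast
    isOpen={ocrError !== null}
    message="Text detection could not be initialized"
    duration={3000}
  />
</>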
The final result integrates easily with hooks that check for camera and photo selection permissions, and reacts when the model finishes parsing. This creates a user experience where (a) the model loading, (b) the model running, and (c) the backend matching are all clearly communicated steps.
Here is a short screen capture of putting all the steps together:
